SuperCowPowers / sageworks

SageWorks: An easy to use Python API for creating and deploying AWS SageMaker Models
https://www.supercowpowers.com
MIT License

tSNE crash on heavy dns data #284

Open brifordwylie opened 1 year ago

brifordwylie commented 1 year ago

When computing outliers on the heavy dns data, we get a crash in tSNE:

  File "/Users/briford/work/sageworks/src/sageworks/artifacts/data_sources/athena_source.py", line 392, in outliers
    projection = TSNE(perplexity=perplexity).fit_transform(outlier_df[outlier_features])
  File "/Users/briford/.pyenv/versions/py310/lib/python3.10/site-packages/sklearn/manifold/_t_sne.py", line 1119, in fit_transform
    embedding = self._fit(X)
  File "/Users/briford/.pyenv/versions/py310/lib/python3.10/site-packages/sklearn/manifold/_t_sne.py", line 995, in _fit
    X_embedded = pca.fit_transform(X).astype(np.float32, copy=False)
  File "/Users/briford/.pyenv/versions/py310/lib/python3.10/site-packages/sklearn/utils/_set_output.py", line 140, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "/Users/briford/.pyenv/versions/py310/lib/python3.10/site-packages/sklearn/decomposition/_pca.py", line 462, in fit_transform
    U, S, Vt = self._fit(X)
  File "/Users/briford/.pyenv/versions/py310/lib/python3.10/site-packages/sklearn/decomposition/_pca.py", line 514, in _fit
    return self._fit_truncated(X, n_components, self._fit_svd_solver)
  File "/Users/briford/.pyenv/versions/py310/lib/python3.10/site-packages/sklearn/decomposition/_pca.py", line 587, in _fit_truncated
    raise ValueError(
ValueError: n_components=2 must be between 1 and min(n_samples, n_features)=1 with svd_solver='randomized'

brifordwylie commented 1 year ago

Okay, so we can work around this by calling ds.outliers(project=False)... I'm going to toss this back into the backlog for now.
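
The ValueError above reports min(n_samples, n_features)=1, so the frame handed to TSNE had either a single row or a single feature column, and the PCA initialization cannot produce 2 components from it. Until a proper fix lands, a guard along these lines might skip the projection in that case. This is only a sketch: can_project and project_or_none are hypothetical helper names, not part of SageWorks, and the perplexity check assumes a recent scikit-learn that rejects perplexity >= n_samples.

```python
def can_project(n_samples, n_features, n_components=2, perplexity=30.0):
    """Return True if a PCA-initialized tSNE can run on data of this shape.

    Hypothetical helper, not part of SageWorks.
    """
    # PCA init needs n_components <= min(n_samples, n_features)
    enough_dims = min(n_samples, n_features) >= n_components
    # Assumption: recent scikit-learn requires perplexity < n_samples
    enough_rows = perplexity < n_samples
    return enough_dims and enough_rows


def project_or_none(X, perplexity=30.0, n_components=2):
    """Run tSNE on X (anything with a 2D .shape), or return None if X is too small."""
    n_samples, n_features = X.shape
    if not can_project(n_samples, n_features, n_components, perplexity):
        return None  # caller treats this like project=False
    # Imported lazily so the shape check above works without scikit-learn
    from sklearn.manifold import TSNE
    return TSNE(n_components=n_components, perplexity=perplexity).fit_transform(X)
```

In outliers() this would stand in for the direct TSNE(...).fit_transform(...) call, with a None result handled the same way as project=False.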