When computing outliers on the heavy DNS data, we get a crash in t-SNE:
  File "/Users/briford/work/sageworks/src/sageworks/artifacts/data_sources/athena_source.py", line 392, in outliers
    projection = TSNE(perplexity=perplexity).fit_transform(outlier_df[outlier_features])
  File "/Users/briford/.pyenv/versions/py310/lib/python3.10/site-packages/sklearn/manifold/_t_sne.py", line 1119, in fit_transform
    embedding = self._fit(X)
  File "/Users/briford/.pyenv/versions/py310/lib/python3.10/site-packages/sklearn/manifold/_t_sne.py", line 995, in _fit
    X_embedded = pca.fit_transform(X).astype(np.float32, copy=False)
  File "/Users/briford/.pyenv/versions/py310/lib/python3.10/site-packages/sklearn/utils/_set_output.py", line 140, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "/Users/briford/.pyenv/versions/py310/lib/python3.10/site-packages/sklearn/decomposition/_pca.py", line 462, in fit_transform
    U, S, Vt = self._fit(X)
  File "/Users/briford/.pyenv/versions/py310/lib/python3.10/site-packages/sklearn/decomposition/_pca.py", line 514, in _fit
    return self._fit_truncated(X, n_components, self._fit_svd_solver)
  File "/Users/briford/.pyenv/versions/py310/lib/python3.10/site-packages/sklearn/decomposition/_pca.py", line 587, in _fit_truncated
    raise ValueError(
ValueError: n_components=2 must be between 1 and min(n_samples, n_features)=1 with svd_solver='randomized'
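Okay, so we can work around this by calling ds.outliers(project=False). The ValueError says min(n_samples, n_features)=1, so the outlier DataFrame handed to TSNE has either a single row or a single feature column, and the PCA step in the traceback can't produce 2 components from that. I'm going to toss this back into the backlog for now.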
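For whenever this comes back off the backlog, here is a rough sketch of a shape guard that could sit in front of the TSNE call. It is not the SageWorks code: the helper names can_project and safe_projection are hypothetical, and it assumes the input is a list of rows or a 2D array (for a DataFrame, something like outlier_df[outlier_features].to_numpy()). The check mirrors the condition in the PCA error message above.

```python
def can_project(n_samples, n_features, n_components=2):
    """Return True when PCA can produce n_components from data of this shape."""
    return min(n_samples, n_features) >= n_components


def safe_projection(rows, perplexity=30.0, n_components=2):
    """Hypothetical helper: return a t-SNE embedding, or None if the data is too small.

    rows is a list of rows or a 2D array (assumption for this sketch).
    """
    n_samples = len(rows)
    n_features = len(rows[0]) if n_samples else 0
    if not can_project(n_samples, n_features, n_components):
        return None  # caller falls back to the unprojected outliers

    # Imported lazily so the guard itself has no heavy dependencies
    import numpy as np
    from sklearn.manifold import TSNE

    # scikit-learn requires perplexity to be strictly less than n_samples
    perplexity = min(perplexity, n_samples - 1)
    tsne = TSNE(n_components=n_components, perplexity=perplexity)
    return tsne.fit_transform(np.asarray(rows, dtype=float))
```

With a guard like this, a single-row or single-feature outlier set would return None instead of raising, which gives the same behavior as project=False for those cases without the caller having to know in advance.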