Jayman391 / lnlp

MIT License

TypeError: Cannot use scipy.linalg.eigh for sparse A with k >= N. Use scipy.linalg.eigh(A.toarray()) or reduce k. #26

Open Jayman391 opened 5 months ago

Jayman391 commented 5 months ago

(nllp) MacBook-Air-3:lnlp user$ python main.py tests/test_data/data.csv

Welcome to the NLLP CLI!
Loaded data from tests/test_data/data.csv

1. Run a Topic Model
2. Run an Optimization routine for a Topic Model (GPU reccomended)
3. Run a Classification Model
4. Load Global Configuration Files
5. Exit
Choose an option: 1

1. Select LLM to generate Embeddings
2. Select Dimensionality Reduction Technique
3. Select Clustering Technique
4. Fine Tuning
5. Plotting
6. Save Topic Model Configuration
7. Load Topic Model Configuration
8. Save Session Data
9. Run Topic Model
10. Back
11. Exit
Choose an option: 8
Please enter the path of the directory to save this sessions data: tests

1. Select LLM to generate Embeddings
2. Select Dimensionality Reduction Technique
3. Select Clustering Technique
4. Fine Tuning
5. Plotting
6. Save Topic Model Configuration
7. Load Topic Model Configuration
8. Save Session Data
9. Run Topic Model
10. Back
11. Exit
Choose an option: 9

2024-03-25 15:44:48,448 - BERTopic - Embedding - Transforming documents to embeddings.
Batches: 100%|██████████| 63/63 [00:43<00:00, 1.45it/s]
2024-03-25 15:45:31,909 - BERTopic - Embedding - Completed ✓
2024-03-25 15:45:31,909 - BERTopic - Dimensionality - Fitting the dimensionality reduction algorithm
UMAP(verbose=True)
Mon Mar 25 15:45:31 2024 Construct fuzzy simplicial set
Mon Mar 25 15:45:33 2024 Finding Nearest Neighbors
Mon Mar 25 15:45:34 2024 Finished Nearest Neighbor Search
Mon Mar 25 15:45:34 2024 Construct embedding
Epochs completed: 0%| 0/500 [00:00]
completed 0 / 500 epochs
Epochs completed: 0%| ▎ 1/500 [00:00]
completed 50 / 500 epochs
Epochs completed: 19%| ██ 94/500 [00:00]
completed 100 / 500 epochs
completed 150 / 500 epochs
Epochs completed: 37%| ████ 186/500 [00:00]
completed 200 / 500 epochs
completed 250 / 500 epochs
Epochs completed: 55%| ██████ 276/500 [00:01]
completed 300 / 500 epochs
completed 350 / 500 epochs
Epochs completed: 73%| ███████ 367/500 [00:01]
completed 400 / 500 epochs
completed 450 / 500 epochs
Epochs completed: 100%| ██████████ 500/500 [00:01]
Mon Mar 25 15:45:36 2024 Finished embedding
2024-03-25 15:45:36,318 - BERTopic - Dimensionality - Completed ✓
2024-03-25 15:45:36,318 - BERTopic - Cluster - Start clustering the reduced embeddings
2024-03-25 15:45:36,343 - BERTopic - Cluster - Completed ✓
2024-03-25 15:45:36,346 - BERTopic - Representation - Extracting topics from clusters using representation models.
2024-03-25 15:45:36,454 - BERTopic - Representation - Completed ✓
No plotting options selected. Visualizing all topics, documents, and terms.
/Users/user/anaconda3/envs/nllp/lib/python3.9/site-packages/umap_learn-0.5.5-py3.9.egg/umap/spectral.py:521: RuntimeWarning: k >= N for N * N square matrix. Attempting to use scipy.linalg.eigh instead.
  eigenvalues, eigenvectors = scipy.sparse.linalg.eigsh(
Cannot use scipy.linalg.eigh for sparse A with k >= N. Use scipy.linalg.eigh(A.toarray()) or reduce k.
An error occurred. Please try again.
Would you like to see the error trace? (y/n): y
Traceback (most recent call last):
  File "/Users/user/Desktop/Spring_2024/Research/lnlp/src/drivers/_driver.py", line 162, in _visualize_topics
    topic_viz = model.visualize_topics()
  File "/Users/user/anaconda3/envs/nllp/lib/python3.9/site-packages/bertopic-0.16.0-py3.9.egg/bertopic/_bertopic.py", line 2249, in visualize_topics
  File "/Users/user/anaconda3/envs/nllp/lib/python3.9/site-packages/bertopic-0.16.0-py3.9.egg/bertopic/plotting/_topics.py", line 79, in visualize_topics
  File "/Users/user/anaconda3/envs/nllp/lib/python3.9/site-packages/umap_learn-0.5.5-py3.9.egg/umap/umap_.py", line 2887, in fit_transform
    self.fit(X, y, force_all_finite)
  File "/Users/user/anaconda3/envs/nllp/lib/python3.9/site-packages/umap_learn-0.5.5-py3.9.egg/umap/umap_.py", line 2780, in fit
    self.embedding_, aux_data = self._fit_embed_data(
  File "/Users/user/anaconda3/envs/nllp/lib/python3.9/site-packages/umap_learn-0.5.5-py3.9.egg/umap/umap_.py", line 2826, in _fit_embed_data
    return simplicial_set_embedding(
  File "/Users/user/anaconda3/envs/nllp/lib/python3.9/site-packages/umap_learn-0.5.5-py3.9.egg/umap/umap_.py", line 1106, in simplicial_set_embedding
    embedding = spectral_layout(
  File "/Users/user/anaconda3/envs/nllp/lib/python3.9/site-packages/umap_learn-0.5.5-py3.9.egg/umap/spectral.py", line 304, in spectral_layout
    return _spectral_layout(
  File "/Users/user/anaconda3/envs/nllp/lib/python3.9/site-packages/umap_learn-0.5.5-py3.9.egg/umap/spectral.py", line 521, in _spectral_layout
    eigenvalues, eigenvectors = scipy.sparse.linalg.eigsh(
  File "/Users/user/anaconda3/envs/nllp/lib/python3.9/site-packages/scipy/sparse/linalg/_eigen/arpack/arpack.py", line 1608, in eigsh
    raise TypeError("Cannot use scipy.linalg.eigh for sparse A with "
TypeError: Cannot use scipy.linalg.eigh for sparse A with k >= N. Use scipy.linalg.eigh(A.toarray()) or reduce k.

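For context, the TypeError at the bottom of the trace is raised by `scipy.sparse.linalg.eigsh` itself, so it can be reproduced without BERTopic or UMAP. The sketch below is not the lnlp code: the 3 x 3 matrix and `k = 3` are made-up values chosen only so that k >= N. Reading the trace, N would be the number of points handed to UMAP by `visualize_topics`, but that interpretation is an assumption. The second half shows the dense route that the error message itself suggests.

```python
import warnings

import numpy as np
import scipy.linalg
import scipy.sparse
import scipy.sparse.linalg

# Hypothetical stand-in for the sparse matrix UMAP builds: N = 3.
A = scipy.sparse.csr_matrix(np.diag([1.0, 2.0, 3.0]))
k = 3  # asking for as many eigenpairs as the matrix has rows

err = None
try:
    with warnings.catch_warnings():
        # eigsh first emits the "k >= N" RuntimeWarning seen in the log.
        warnings.simplefilter("ignore", RuntimeWarning)
        scipy.sparse.linalg.eigsh(A, k=k)
except TypeError as exc:
    err = str(exc)

print(err)

# Dense route named in the error message: convert, then use scipy.linalg.eigh.
vals, vecs = scipy.linalg.eigh(A.toarray())
print(vals)
```

The dense call is only practical here because the matrix is tiny, which is exactly the situation in which k >= N occurs.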
Jayman391 commented 5 months ago

I'm pretty sure this can happen whenever the number of features is less than the number of data points (so basically at any point?). To fix it I would either:

- re-embed the data and try again, or
- run PCA on the embeddings and then do UMAP.
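A minimal sketch of the PCA step from the second option, using only NumPy. Everything specific in it is an assumption for illustration: the helper name `pca_reduce`, the random 200 x 384 array standing in for the real embeddings, and the choice of 50 components. Whether this actually avoids the eigsh error has not been tested here.

```python
import numpy as np


def pca_reduce(embeddings, n_components):
    """Project row-vector embeddings onto their top principal components."""
    X = np.asarray(embeddings, dtype=float)
    X_centered = X - X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing
    # singular value (i.e. decreasing explained variance).
    _, _, vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ vt[:n_components].T


# Hypothetical stand-in for document embeddings: 200 documents, 384 features.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 384))

reduced = pca_reduce(embeddings, n_components=50)
print(reduced.shape)
```

The `reduced` array would then be passed to UMAP in place of the raw embeddings.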