MaartenGr / Concept

Concept Modeling: Topic Modeling on Images and Text
https://maartengr.github.io/Concept/
MIT License

TypeError: Cannot use scipy.linalg.eigh for sparse A with k >= N. Use scipy.linalg.eigh(A.toarray()) or reduce k. #23

Open alifarhan357 opened 5 months ago

alifarhan357 commented 5 months ago

from bertopic import BERTopic

# Define seed words for topics
seed_words = [
    ['software', 'programming', 'Python', 'Java', 'machine learning', 'data visualization'],
    ['project management', 'leadership'],
    ['healthcare', 'medical research', 'patient care', 'disease prevention'],
]

# Sample documents (text data)
documents = [
    "This is about software development and programming languages like Python and Java.",
    "Finance and banking are important topics in the economy.",
    "Project management and leadership skills are essential for success.",
    "Healthcare and medical research focus on patient care and disease prevention.",
]

# Initialize BERTopic model with seed_topic_list
model = BERTopic(seed_topic_list=seed_words)

# Fit and transform documents to obtain topics and probabilities
topics, probabilities = model.fit_transform(documents)

# Display the assigned topics for each document
for i, (doc, topic) in enumerate(zip(documents, topics)):
    print(f"Document {i+1}: Topic {topic} - '{doc}'")

Error

/usr/local/lib/python3.10/dist-packages/scipy/sparse/linalg/_eigen/arpack/arpack.py:1600: RuntimeWarning: k >= N for N * N square matrix. Attempting to use scipy.linalg.eigh instead.
  warnings.warn("k >= N for N * N square matrix. "
/usr/local/lib/python3.10/dist-packages/scipy/sparse/linalg/_eigen/arpack/arpack.py:1600: RuntimeWarning: k >= N for N * N square matrix. Attempting to use scipy.linalg.eigh instead.
  warnings.warn("k >= N for N * N square matrix. "

TypeError                                 Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/bertopic/_bertopic.py in _reduce_dimensionality(self, embeddings, y, partial_fit)
   3471             y = np.array(y) if y is not None else None
-> 3472             self.umap_model.fit(embeddings, y=y)
   3473         except TypeError:

14 frames
TypeError: Cannot use scipy.linalg.eigh for sparse A with k >= N. Use scipy.linalg.eigh(A.toarray()) or reduce k.

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/scipy/sparse/linalg/_eigen/arpack/arpack.py in eigsh(A, k, M, sigma, which, v0, ncv, maxiter, tol, return_eigenvectors, Minv, OPinv, mode)
   1603
   1604         if issparse(A):
-> 1605             raise TypeError("Cannot use scipy.linalg.eigh for sparse A with "
   1606                             "k >= N. Use scipy.linalg.eigh(A.toarray()) or"
   1607                             " reduce k.")

TypeError: Cannot use scipy.linalg.eigh for sparse A with k >= N. Use scipy.linalg.eigh(A.toarray()) or reduce k.
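
For context, the traceback shows `eigsh` rejecting a sparse N x N matrix because the requested number of eigenvectors `k` is not smaller than N. Here N is most likely the number of documents (4). The sketch below mirrors that size check in plain Python. It rests on two assumptions not confirmed in this thread: that UMAP's spectral initialization asks for `n_components + 1` eigenvectors, and that BERTopic's default UMAP model uses `n_components=5`. The helper `spectral_init_is_safe` is hypothetical and is not part of BERTopic, UMAP, or SciPy.

```python
def spectral_init_is_safe(n_docs: int, n_components: int = 5) -> bool:
    """Return True if the eigensolver's `k < N` requirement would hold.

    Assumption: UMAP's spectral initialization requests k = n_components + 1
    eigenvectors of an N x N graph Laplacian, where N is the number of documents.
    Assumption: n_components defaults to 5 in BERTopic's UMAP model.
    """
    k = n_components + 1  # eigenvectors requested (assumed)
    return k < n_docs     # eigsh raises for sparse input when k >= N


if __name__ == "__main__":
    # Compare the 4-document corpus from this report with larger corpora.
    for n_docs in (4, 7, 100):
        status = "ok" if spectral_init_is_safe(n_docs) else "too few documents"
        print(f"{n_docs} documents: {status}")
```

Under these assumptions the four documents in the report fail the check (k = 6, N = 4), which is consistent with the `k >= N` message above. A noticeably larger corpus would pass it.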

MaartenGr commented 5 months ago

Thank you for sharing. This is a BERTopic issue and not a Concept issue, so I would advise you to check the issues page of BERTopic instead. I believe you can also find some temporary solutions there until a fix is released.