MasslessAI / narratelab


hdbscan clustering error with Invalid shape in axis 0: 0 #4

Closed: timothywangdev closed this issue 3 years ago

timothywangdev commented 3 years ago

Traceback (most recent call last):
  File "test.py", line 187, in <module>
    topics, probabilities = topic_model.fit_transform(_docs)
  File "/home/hehe/projects/BERTopic/bertopic/_bertopic.py", line 296, in fit_transform
    documents = self._reduce_topics(documents)
  File "/home/hehe/projects/BERTopic/bertopic/_bertopic.py", line 1514, in _reduce_topics
    documents = self._auto_reduce_topics(documents)
  File "/home/hehe/projects/BERTopic/bertopic/_bertopic.py", line 1597, in _auto_reduce_topics
    predictions = hdbscan.HDBSCAN(min_cluster_size=2,
  File "/home/hehe/projects/BERTopic/env/lib/python3.8/site-packages/hdbscan/hdbscan_.py", line 941, in fit_predict
    self.fit(X)
  File "/home/hehe/projects/BERTopic/env/lib/python3.8/site-packages/hdbscan/hdbscan_.py", line 919, in fit
    self._min_spanning_tree) = hdbscan(X, **kwargs)
  File "/home/hehe/projects/BERTopic/env/lib/python3.8/site-packages/hdbscan/hdbscan_.py", line 605, in hdbscan
    (single_linkage_tree, result_min_span_tree) = memory.cache(
  File "/home/hehe/projects/BERTopic/env/lib/python3.8/site-packages/joblib/memory.py", line 352, in __call__
    return self.func(*args, **kwargs)
  File "/home/hehe/projects/BERTopic/env/lib/python3.8/site-packages/hdbscan/hdbscan_.py", line 206, in _hdbscan_prims_kdtree
    min_spanning_tree = mst_linkage_core_vector(X, core_distances, dist_metric,
  File "hdbscan/_hdbscan_linkage.pyx", line 55, in hdbscan._hdbscan_linkage.mst_linkage_core_vector
  File "hdbscan/_hdbscan_linkage.pyx", line 100, in hdbscan._hdbscan_linkage.mst_linkage_core_vector
  File "stringsource", line 251, in View.MemoryView.array_cwrapper
  File "stringsource", line 153, in View.MemoryView.array.__cinit__
ValueError: Invalid shape in axis 0: 0.

timothywangdev commented 3 years ago

Fixed in commit https://github.com/MasslessAI/BERTopic/commit/fc2cd06bfbac4ed3487afc0ce0e69030c380c6b5

timothywangdev commented 3 years ago

When the number of topics is <= 2, try reducing min_cluster_size to 4.
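
For anyone hitting the same traceback: the failure happens when HDBSCAN is called from _auto_reduce_topics on an input with a zero-length axis. Below is a minimal sketch of one way to guard that kind of call. It is not the code from the linked commit. The helper name reduce_topics_safely, the min_topics threshold of 3 (chosen to match the "<= 2 topics" case above), and the stand-in clusterer are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence


def reduce_topics_safely(
    topic_vectors: Sequence[Sequence[float]],
    cluster_fn: Callable[[Sequence[Sequence[float]]], Sequence[int]],
    min_topics: int = 3,
) -> List[int]:
    """Cluster topic vectors, skipping the clusterer when there are too few.

    Hypothetical helper: not part of BERTopic or hdbscan.
    """
    n_topics = len(topic_vectors)
    if n_topics < min_topics:
        # Too few topics to merge: give each topic its own label instead of
        # handing a tiny or empty array to the clusterer.
        return list(range(n_topics))
    return [int(label) for label in cluster_fn(topic_vectors)]


# Stand-in clusterer for illustration only. With hdbscan installed, this
# could be replaced by something like:
#   lambda X: hdbscan.HDBSCAN(min_cluster_size=2).fit_predict(X)
def fake_clusterer(vectors):
    return [0 for _ in vectors]


print(reduce_topics_safely([], fake_clusterer))                      # []
print(reduce_topics_safely([[0.1], [0.9]], fake_clusterer))          # [0, 1]
print(reduce_topics_safely([[0.1], [0.2], [0.9]], fake_clusterer))   # [0, 0, 0]
```

The guard only avoids the crash; it does not change how many topics are found in the first place. Lowering min_cluster_size, as suggested above, addresses that side of the problem.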