MaartenGr / BERTopic

Leveraging BERT and c-TF-IDF to create easily interpretable topics.
https://maartengr.github.io/BERTopic/
MIT License

providing hdbscan_clusterer caused exception #621

Closed vldbnc closed 2 years ago

vldbnc commented 2 years ago

bertopic == 0.11.0-py2.py3-none-any.whl

hp = {'algorithm': 'generic', 'epsilon': 0.09, 'min_samples': 1, 'min_cluster_size': 5}

clustering_model = hdbscan.HDBSCAN(algorithm=hp['algorithm'], cluster_selection_epsilon=hp['epsilon'], core_dist_n_jobs=-1, min_samples=hp['min_samples'], prediction_data=True, min_cluster_size=hp['min_cluster_size'])

sentence_model = SentenceTransformer('bert-base-uncased')
sentence_model.max_seq_length = 256

topics_model = BERTopic(embedding_model=sentence_model, hdbscan_model=clustering_model)

`Traceback (most recent call last):
  File "./get_hdbscan_clustering_withtopics.py", line 121, in <module>
    , _ = topics_model.fit_transform(df_text['MSG_CLEAN'].tolist())
  File "/opt/conda/lib/python3.8/site-packages/bertopic/_bertopic.py", line 316, in fit_transform
    documents, probabilities = self._cluster_embeddings(umap_embeddings, documents)
  File "/opt/conda/lib/python3.8/site-packages/bertopic/_bertopic.py", line 2090, in _cluster_embeddings
    self.hdbscan_model.fit(umap_embeddings)
  File "/opt/conda/lib/python3.8/site-packages/hdbscan/hdbscan_.py", line 1189, in fit
    ) = hdbscan(clean_data, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/hdbscan/hdbscan_.py", line 740, in hdbscan
    (single_linkage_tree, result_min_span_tree) = memory.cache(
  File "/opt/conda/lib/python3.8/site-packages/joblib/memory.py", line 349, in __call__
    return self.func(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/hdbscan/hdbscan_.py", line 117, in _hdbscan_generic
    min_spanning_tree = mst_linkage_core(mutual_reachability_)
  File "hdbscan/_hdbscan_linkage.pyx", line 15, in hdbscan._hdbscan_linkage.mst_linkage_core
ValueError: Buffer dtype mismatch, expected 'double_t' but got 'float'`

vldbnc commented 2 years ago

The issue was resolved by changing hp['algorithm'] from 'generic' to 'best'.
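For anyone hitting the same error, here is a minimal sketch of the setup from the report with that one-line change applied. The helper names `build_hdbscan_kwargs` and `build_topic_model` are hypothetical and exist only to keep the example self-contained. The probable cause is an assumption, not something confirmed in this thread: the 'generic' code path appears to hand the reduced embeddings to a Cython routine that expects float64, while the embeddings arrive as float32.

```python
# Hyperparameters from the report, with 'algorithm' switched from 'generic' to 'best'.
hp = {'algorithm': 'best', 'epsilon': 0.09, 'min_samples': 1, 'min_cluster_size': 5}


def build_hdbscan_kwargs(hp):
    """Map the hp dict onto hdbscan.HDBSCAN keyword arguments (hypothetical helper)."""
    return {
        'algorithm': hp['algorithm'],
        'cluster_selection_epsilon': hp['epsilon'],
        'core_dist_n_jobs': -1,
        'min_samples': hp['min_samples'],
        'prediction_data': True,
        'min_cluster_size': hp['min_cluster_size'],
    }


def build_topic_model(hp):
    """Build the BERTopic model from the report (hypothetical helper).

    The heavy third-party imports are deferred so that the module can be
    imported and the kwargs inspected without hdbscan or bertopic installed.
    """
    import hdbscan
    from bertopic import BERTopic
    from sentence_transformers import SentenceTransformer

    clustering_model = hdbscan.HDBSCAN(**build_hdbscan_kwargs(hp))
    sentence_model = SentenceTransformer('bert-base-uncased')
    sentence_model.max_seq_length = 256
    return BERTopic(embedding_model=sentence_model, hdbscan_model=clustering_model)
```

Calling `build_topic_model(hp).fit_transform(docs)` with a list of strings then follows the same path as the original script. Whether 'best' avoids the dtype problem on other versions of hdbscan has not been checked here.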

MaartenGr commented 2 years ago

Glad to hear that you got the issue resolved! If you run into any other questions, let me know and I'll make sure to help out wherever I can.

vldbnc commented 2 years ago

thanks @MaartenGr for your support!