Closed: vldbnc closed this issue 2 years ago.
The issue was resolved by changing `hp['algorithm']` from `'generic'` to `'best'`.
Glad to hear that you got the issue resolved! If you run into any other questions, let me know and I'll make sure to help out wherever I can.
thanks @MaartenGr for your support!
bertopic == 0.11.0-py2.py3-none-any.whl
import hdbscan
from bertopic import BERTopic
from sentence_transformers import SentenceTransformer
hp = {'algorithm': 'generic', 'epsilon': 0.09, 'min_samples': 1, 'min_cluster_size': 5}
clustering_model = hdbscan.HDBSCAN(algorithm=hp['algorithm'], cluster_selection_epsilon=hp['epsilon'], core_dist_n_jobs=-1, min_samples=hp['min_samples'], prediction_data=True, min_cluster_size=hp['min_cluster_size'])
sentence_model = SentenceTransformer('bert-base-uncased')
sentence_model.max_seq_length = 256
topics_model = BERTopic(embedding_model=sentence_model, hdbscan_model=clustering_model)
```
Traceback (most recent call last):
  File "./get_hdbscan_clustering_withtopics.py", line 121, in <module>
    _, _ = topics_model.fit_transform(df_text['MSG_CLEAN'].tolist())
  File "/opt/conda/lib/python3.8/site-packages/bertopic/_bertopic.py", line 316, in fit_transform
    documents, probabilities = self._cluster_embeddings(umap_embeddings, documents)
  File "/opt/conda/lib/python3.8/site-packages/bertopic/_bertopic.py", line 2090, in _cluster_embeddings
    self.hdbscan_model.fit(umap_embeddings)
  File "/opt/conda/lib/python3.8/site-packages/hdbscan/hdbscan_.py", line 1189, in fit
    ) = hdbscan(clean_data, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/hdbscan/hdbscan_.py", line 740, in hdbscan
    (single_linkage_tree, result_min_span_tree) = memory.cache(
  File "/opt/conda/lib/python3.8/site-packages/joblib/memory.py", line 349, in __call__
    return self.func(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/hdbscan/hdbscan_.py", line 117, in _hdbscan_generic
    min_spanning_tree = mst_linkage_core(mutual_reachability_)
  File "hdbscan/_hdbscan_linkage.pyx", line 15, in hdbscan._hdbscan_linkage.mst_linkage_core
ValueError: Buffer dtype mismatch, expected 'double_t' but got 'float'
```
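For reference, applying the fix from the resolution comment above is a one-line change to the setup quoted in this report; a minimal sketch, assuming the rest of the configuration stays the same. With `algorithm='best'`, hdbscan selects a tree-based implementation instead of the brute-force pairwise-distance path used by `'generic'` (`_hdbscan_generic` in the traceback), which is where the dtype mismatch was raised:

```python
import hdbscan

# Same hyperparameters as the original report, with only the algorithm changed
# from 'generic' to 'best' (the change that resolved the ValueError above).
hp = {'algorithm': 'best', 'epsilon': 0.09, 'min_samples': 1, 'min_cluster_size': 5}

clustering_model = hdbscan.HDBSCAN(
    algorithm=hp['algorithm'],
    cluster_selection_epsilon=hp['epsilon'],
    core_dist_n_jobs=-1,
    min_samples=hp['min_samples'],
    prediction_data=True,
    min_cluster_size=hp['min_cluster_size'],
)
```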