MaartenGr / BERTopic

Leveraging BERT and c-TF-IDF to create easily interpretable topics.
https://maartengr.github.io/BERTopic/
MIT License

Can't reproduce same results when using cuml version of UMAP and HDBSCAN #1994

Open chentitus opened 2 months ago

chentitus commented 2 months ago

Hi,

I understand that when I import UMAP from umap and HDBSCAN from hdbscan, I can reproduce topic modeling results by setting random_state in UMAP.
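For reference, a minimal sketch of the CPU setup described here. The parameter values and the seed 42 are assumptions chosen for illustration (they are not taken from this issue), and `build_cpu_topic_model` is a hypothetical helper name:

```python
# Assumed settings for illustration; the seed value 42 is arbitrary.
UMAP_KWARGS = dict(
    n_neighbors=15,
    n_components=5,
    min_dist=0.0,
    metric="cosine",
    random_state=42,  # fixing this seed is what makes the CPU UMAP repeatable
)
HDBSCAN_KWARGS = dict(
    min_cluster_size=10,
    metric="euclidean",
    cluster_selection_method="eom",
    prediction_data=True,
)


def build_cpu_topic_model():
    """Build a BERTopic model with a seeded CPU UMAP and a CPU HDBSCAN."""
    # Imported lazily so the settings above can be inspected
    # without the packages installed.
    from bertopic import BERTopic
    from hdbscan import HDBSCAN
    from umap import UMAP

    return BERTopic(
        umap_model=UMAP(**UMAP_KWARGS),
        hdbscan_model=HDBSCAN(**HDBSCAN_KWARGS),
    )
```

With a list of strings `docs`, calling `build_cpu_topic_model().fit_transform(docs)` returns the topic assignments and probabilities for that run.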

But I found that if I import HDBSCAN from cuml.cluster and UMAP from cuml.manifold, the topic modeling results can no longer be reproduced, even when I set random_state in UMAP.
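One way to make the difference concrete is to fit two fresh models on the same documents and compare the topic assignments. The sketch below is only an illustration: `labels_match`, `is_reproducible`, and `build_gpu_topic_model` are hypothetical helper names, and the cuML parameter values are assumptions rather than the settings used in this issue.

```python
from typing import Any, Callable, Sequence


def labels_match(first: Sequence[int], second: Sequence[int]) -> bool:
    """Return True when two runs gave every document the same topic id."""
    return len(first) == len(second) and all(a == b for a, b in zip(first, second))


def is_reproducible(build_model: Callable[[], Any], docs: Sequence[str]) -> bool:
    """Fit two freshly built models on the same documents and compare topics."""
    topics_a, _ = build_model().fit_transform(docs)
    topics_b, _ = build_model().fit_transform(docs)
    return labels_match(topics_a, topics_b)


def build_gpu_topic_model():
    """Hypothetical factory for the cuML setup (needs a GPU and cuML installed)."""
    # Imported lazily so the comparison helpers above work without cuML.
    from bertopic import BERTopic
    from cuml.cluster import HDBSCAN
    from cuml.manifold import UMAP

    return BERTopic(
        # Assumed values for illustration; random_state is set as in the report.
        umap_model=UMAP(n_components=5, n_neighbors=15, min_dist=0.0, random_state=42),
        hdbscan_model=HDBSCAN(min_samples=10, gen_min_span_tree=True, prediction_data=True),
    )
```

Calling `is_reproducible(build_gpu_topic_model, docs)` with a list of strings `docs` would then report whether two runs agree. The comparison is strict: it requires identical topic ids for every document.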

This was done on Colab, after upgrading BERTopic to 0.16.2.

Any ideas on how I can reproduce topic modeling results using cuml? Thanks much!

MaartenGr commented 2 months ago

Thanks for reaching out! I'm not entirely sure how to do that with those models. I would advise posting an issue on the cuml board. They are generally quick to respond and eager to help out!

chentitus commented 2 months ago

@MaartenGr Do you mean that I post a question at https://github.com/rapidsai/cuml?

Thanks much for the advice!

MaartenGr commented 2 months ago

> @MaartenGr Do you mean that I post a question at https://github.com/rapidsai/cuml?

Yes! They have much more expertise than I have on this subject.