MaartenGr / BERTopic

Leveraging BERT and c-TF-IDF to create easily interpretable topics.
https://maartengr.github.io/BERTopic/
MIT License

Scikit-learn's HDBSCAN Implementation #2031

Open MaartenGr opened 4 weeks ago

MaartenGr commented 4 weeks ago

A recent version of scikit-learn (v1.3, I believe) added an HDBSCAN implementation with base functionality. Since scikit-learn is already a requirement of BERTopic, it stands to reason to use that implementation instead of the original one, as scikit-learn has more contributors. It might also alleviate the common installation issues related to HDBSCAN.

There are a couple of issues worth mentioning:

For those reading this, I'm interested to hear what you all think about this suggested change!