MaartenGr / BERTopic

Leveraging BERT and c-TF-IDF to create easily interpretable topics.
https://maartengr.github.io/BERTopic/
MIT License
6.07k stars, 757 forks

Is there a way to find emerging topics on regular basis? #1514

Open imprateekagarwal opened 1 year ago

imprateekagarwal commented 1 year ago

I have a year of text data and want to find new (emerging) topics every month that were not present the month before. I would also like to track the trend of these emerging topics in the following month.

MaartenGr commented 1 year ago

Generally, you would use online topic modeling for that. This method can consistently find new topics as they appear.

If you want to use HDBSCAN instead, you can do the following:

benearnthof commented 11 months ago

Hi, I'm interested in this issue. Could you elaborate on the steps you outlined above? I'm currently trying to compare three approaches for telling old topics from newly assigned ones: an online model, a dynamic model, and a manual model. However, I don't quite get how I could validate the performance or "correctness" of these models. Is there a way to compare such approaches?

MaartenGr commented 11 months ago

> but I don't quite get how I could validate the performance or "correctness" of these models. Is there a way to compare such approaches?

With respect to validation, there is no single method that works across all use cases. What you define as "performance" or "correctness" can be quite subjective due to the nature of topic modeling. There are metrics you can use, such as topic coherence and cluster-specific metrics, but which one fits depends on exactly what you are trying to achieve.
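
To make the coherence idea concrete, here is a minimal sketch of one common variant, NPMI coherence, computed from document-level co-occurrence. It is not part of BERTopic; the function name `npmi_coherence`, the whitespace tokenization, and the handling of edge cases are all assumptions made for illustration.

```python
import math
from itertools import combinations


def npmi_coherence(topic_words, documents):
    """Mean NPMI over all word pairs of one topic (document co-occurrence)."""
    if len(topic_words) < 2:
        raise ValueError("need at least two topic words")

    # Assumption: naive lowercase whitespace tokenization.
    doc_sets = [set(doc.lower().split()) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents containing all the given words.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(topic_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0:
            scores.append(-1.0)  # never co-occur: lowest NPMI by convention
        elif p12 == 1:
            scores.append(0.0)   # pair is in every document: no signal (a choice)
        else:
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))
    return sum(scores) / len(scores)


docs = [
    "cat dog pet",
    "cat dog vet",
    "stock market trade",
    "stock bond market",
]
print(npmi_coherence(["cat", "dog"], docs))    # words that always appear together
print(npmi_coherence(["cat", "stock"], docs))  # words that never appear together
```

Scores range from -1 (words never co-occur) to 1 (words only ever appear together), so a higher mean suggests a more coherent topic. Such a score only compares models on the same corpus and says nothing about whether the topics are useful for your goal.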

Another approach, besides the ones mentioned above, is to merge different topic models to detect the potential appearance of new topics. You can find more about that in https://github.com/MaartenGr/BERTopic/pull/1516, which is integrated in the main branch and will be officially released in a couple of weeks.
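
The general idea of flagging new topics by comparing two models can be sketched as follows. This is a simplified illustration, not the implementation from the pull request. The function `find_new_topics`, the `min_similarity` threshold, and the toy three-dimensional topic vectors are all assumptions. Each topic from the newer month is compared against every topic from the older month, and it counts as emerging when even its best match is below the threshold.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def find_new_topics(old_topics, new_topics, min_similarity=0.7):
    """Return names of new topics whose best match among old topics is too weak."""
    emerging = []
    for name, vec in new_topics.items():
        best = max((cosine(vec, old) for old in old_topics.values()), default=0.0)
        if best < min_similarity:
            emerging.append(name)
    return emerging


# Hypothetical topic embeddings for two consecutive months.
last_month = {"sports": [0.9, 0.1, 0.0], "finance": [0.1, 0.9, 0.1]}
this_month = {"football": [0.8, 0.2, 0.0], "vaccines": [0.0, 0.1, 0.95]}

print(find_new_topics(last_month, this_month))  # ['vaccines']
```

Here "football" is close enough to last month's "sports" topic to be treated as the same topic, while "vaccines" has no close match and is reported as new. The threshold controls how aggressively topics are declared new and needs tuning per dataset.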

lukasmackin commented 11 months ago

> Another method than the ones mentioned above is merging different topic models together to detect the potential appearance of new topics. You can find more about that in https://github.com/MaartenGr/BERTopic/pull/1516, which is integrated in the main branch and will be officially released in a couple of weeks.

@MaartenGr: I'm very interested in using this! Do you have a firmer idea of when this may be officially released?

MaartenGr commented 11 months ago

@Macdaddy24 I can't give you a specific date, but I intend to release the new version this month. As the package grows, so does its complexity, and I want to make sure there is sufficient documentation to cover the many use cases.

lukasmackin commented 11 months ago

@MaartenGr: Sounds good! I appreciate your dedication to detail.