Open imprateekagarwal opened 1 year ago
Generally, you would use online topic modeling for that. This method can consistently find new topics as they appear.
If you want to use HDBSCAN instead, you can do the following:
Hi, I'm interested in this issue. Could you elaborate on the steps you outlined above? I'm currently trying three approaches for comparing old and newly assigned topics (an online model, a dynamic model, and a manual model), but I don't quite get how I could validate the performance or "correctness" of these models. Is there a way to compare such approaches?
With respect to validation, no single method works across all use cases. What you define as "performance" or "correctness" can be quite subjective due to the nature of topic modeling. Metrics such as topic coherence and cluster-specific metrics exist, but which one is appropriate depends on exactly what you are trying to achieve.
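To make the coherence idea a bit more concrete, here is a minimal sketch of one common variant, NPMI coherence based on document co-occurrence. This is plain Python rather than any BERTopic API, and the function name `npmi_coherence`, the whitespace tokenization, and the toy documents are all assumptions made for illustration. A real comparison would use your own corpus and the top words of each topic from each model.

```python
import math
from itertools import combinations


def npmi_coherence(top_words, documents):
    """Average NPMI over all pairs of a topic's top words.

    Probabilities are estimated from document-level co-occurrence.
    Hypothetical helper for illustration, not part of any library.
    """
    if len(top_words) < 2:
        raise ValueError("need at least two top words")

    # Naive tokenization: lowercase and split on whitespace (assumption).
    doc_sets = [set(doc.lower().split()) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents containing all of the given words.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(top_words, 2):
        p12 = prob(w1, w2)
        if p12 == 0:
            scores.append(-1.0)  # the words never co-occur
        elif p12 == 1:
            scores.append(1.0)  # always co-occur; avoids dividing by log(1) = 0
        else:
            pmi = math.log(p12 / (prob(w1) * prob(w2)))
            scores.append(pmi / -math.log(p12))
    return sum(scores) / len(scores)


docs = [
    "the cat sat on the mat",
    "a cat and a dog played",
    "the dog chased the cat",
    "stocks fell as markets closed",
    "markets rallied and stocks rose",
]
coherent = npmi_coherence(["cat", "dog"], docs)
incoherent = npmi_coherence(["cat", "stocks"], docs)
```

Scores range from -1 (the words never appear together) to 1 (they always appear together), so averaging this over all topics of each model gives one number per model to compare. It only captures one notion of quality, which is why it should be combined with whatever matters for your use case.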
Besides the methods mentioned above, you can merge different topic models to detect the potential appearance of new topics. You can find more about that in https://github.com/MaartenGr/BERTopic/pull/1516, which is integrated in the main branch and will be officially released in a couple of weeks.
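The general idea of spotting new topics by comparing two fitted models can be sketched without any library. The snippet below is only a keyword-overlap stand-in and makes no claim about how the merging in the linked PR is implemented. The helper names `jaccard` and `find_new_topics`, the `min_similarity` threshold of 0.5, and the toy topics are all assumptions for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of words."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def find_new_topics(previous_topics, current_topics, min_similarity=0.5):
    """Return ids of current topics with no sufficiently similar previous topic.

    Both arguments map a topic id to its list of top words.
    Hypothetical helper for illustration only.
    """
    new_ids = []
    for topic_id, words in current_topics.items():
        # Best match of this topic against every topic from the earlier model.
        best = max(
            (jaccard(words, prev) for prev in previous_topics.values()),
            default=0.0,
        )
        if best < min_similarity:
            new_ids.append(topic_id)
    return new_ids


# Toy top words per topic for two consecutive months (made-up data).
january = {
    0: ["price", "market", "stock", "trade"],
    1: ["team", "game", "season", "coach"],
}
february = {
    0: ["market", "stock", "price", "shares"],
    1: ["vaccine", "virus", "health", "dose"],
    2: ["game", "team", "coach", "player"],
}
new_in_february = find_new_topics(january, february)
print(new_in_february)  # [1]
```

Here only the February topic about vaccines has no counterpart in January, so it is flagged as new. In practice, comparing topic embeddings is likely more robust than raw keyword overlap, and the threshold needs tuning on your own data.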
@MaartenGr: I'm very interested in using this! Do you have a firmer idea on when this may be officially released?
@Macdaddy24 I can't give you a specific date but I intend to release the new version this month. As the package grows, so does the complexity and I want to make sure there is also sufficient documentation to account for the many use cases.
@MaartenGr: Sounds good! I appreciate your dedication to detail.
I have yearly text data and want to find new (emerging) topics every month that were not there the previous month. I would also like to check the trend of these emerging topics in the following month.