I used a correlated topic model on a 4,500-document corpus to learn the type and frequency of topics. The results were very good, but unfortunately one of the topics (#14) has an impossible count, more than double the number of documents:

![image](https://github.com/bab2min/tomotopy/assets/133242553/4a5154ab-282e-4585-8840-6cbf493d44db)
This library is easy to use and very fast, and I feel lucky to have found it, but I can't use the results when a known-to-be-common topic has an impossible count.
I tried HDPModel and got a similar result, where one topic (#6) had a count of almost 4x the number of documents:

![image](https://github.com/bab2min/tomotopy/assets/133242553/bb4dd989-c5b2-4e41-9b90-1603f4169a47)
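For reference, here is a simplified sketch of what my training code looks like. `tokenized_docs` stands in for my actual preprocessing (a list of token lists, one per document), the `k` and iteration values are placeholders, and the counts in the screenshots come from `get_count_by_topics()`:

```python
import tomotopy as tp

# tokenized_docs: list of token lists, one per document (~4,500 documents);
# produced by my own preprocessing, omitted here for brevity.
mdl = tp.CTModel(k=20, seed=42)
for tokens in tokenized_docs:
    mdl.add_doc(tokens)

# Train with Gibbs sampling for a fixed number of iterations.
mdl.train(1000)

# These are the per-topic counts shown in the screenshots above.
for topic_id, count in enumerate(mdl.get_count_by_topics()):
    print(topic_id, count)
```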
What caused the large counts? Did I make a mistake? Is there a way for me to get the topic distributions for each individual document?
Thank you!