Open · daianacric95 opened this issue 11 months ago
That is a result of how MMR works! In practice, it diversifies a larger set of candidate words, for instance 30, down to a smaller set, for instance 10. This means that when you set top_n_words to 10, only 10 keywords are given to MMR, and diversifying 10 keywords into 10 keywords does nothing. In other words, set top_n_words in BERTopic to a higher value, such as 30, so that MMR can actually diversify that larger set into a smaller one.
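A minimal sketch of this fix, assuming the MaximalMarginalRelevance representation model from bertopic.representation (the diversity value and word counts are illustrative):

```python
from bertopic import BERTopic
from bertopic.representation import MaximalMarginalRelevance

# Hand 30 candidate keywords per topic to MMR, which then selects a
# smaller, more diverse subset to represent each topic.
representation_model = MaximalMarginalRelevance(diversity=0.5, top_n_words=10)

topic_model = BERTopic(
    top_n_words=30,  # candidates passed to the representation model
    representation_model=representation_model,
)
```

The point is that top_n_words in BERTopic stays larger than the number of words MMR keeps, so there is actually room to diversify.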
Hi Maarten,
Thank you once again for this amazing package. I used it for my master's thesis and several projects for my job at the university, and it's a lifesaver compared to other topic modeling techniques I tried.
That being said, I have run the model on about 200k tweets, and many topics contain a lot of repeating words. I used the following code to add a representation model with different diversity values (from 0.3 to 0.9), but the results are still the same.
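For context, a minimal sketch of the kind of setup described, assuming BERTopic's MaximalMarginalRelevance representation model (the diversity value shown is illustrative):

```python
from bertopic import BERTopic
from bertopic.representation import MaximalMarginalRelevance

# Re-rank each topic's words with MMR; diversity was varied from 0.3 to 0.9
representation_model = MaximalMarginalRelevance(diversity=0.3)

topic_model = BERTopic(representation_model=representation_model)
topics, probs = topic_model.fit_transform(tweets)  # tweets: ~200k strings
```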
Here are a couple of examples:
Not only are the results the same, but plurals such as australia and australians are not merged. Could you please guide me on how to move forward?