Open damosuzuki opened 1 year ago
That is indeed quite the difference! I had updated the underlying algorithm of nr_topics in order to prevent any topics from being merged into the outliers and was quite happy with the results, but this seems to show something entirely different. I will test this in more detail to see if the same thing happens with other datasets. If so, it might be a bug, or I might simply revert to the old algorithm.
I have noticed a reduction in the quality of topic modeling in 0.14.0 when specifying the nr_topics parameter.
Here is my test script:
With bertopic==0.13.0:
And with bertopic==0.14.0: