Open fishfree opened 4 months ago
To get more topics, you would need to decrease the value of `min_topic_size`. The higher the minimum topic size, the fewer topics it can create. I would suggest reading through the best practices for more on this, or using a different clustering model like k-Means that allows you to manually select the number of topics.
I recommend using k-Means as the clustering algorithm. When I do Chinese topic modeling, k-Means works much better.
@rap8 Thank you for sharing. Could you please share your code snippet?
It produces only 3 topics, as shown below, far fewer than the Mallet tool.
I tested almost every hyperparameter here and found that `n_neighbors` in the `UMAP` function has the most noticeable effect. However, even changing it from 15 to 50 adds only 1 new topic, as shown below:
It seems BERTopic needs more tuning parameters for Chinese or even CJK texts. Can anyone share some experience, please?