Open Garren87 opened 5 months ago
Well, I have tested LDAModel on the same corpus, and all the coherence measures work well. What's more, even the pyLDAvis result becomes clear and meaningful. Does this mean that my corpus is not suitable for DTM, or is there still some other problem?
This is a great project which helps a lot. I am using DTM on a set of abstracts of English scientific papers (about 60000, spanning 2000 to 2024) on a single topic: Electrochemical Energy Storage. I am trying to decide the optimal number of topics K based on common indicators like coherence and perplexity. However, it seems that all the coherence measures provided by `tp.coherence.Coherence().get_score()` are negative, including c_npmi, c_uci and u_mass. c_v seems to be working, but other users have mentioned that there are also problems with c_v. The results I got with pyLDAvis were not good either, with a large overlap between topics. I have tried many changes, including different k from 2 to 100 and different parameter settings such as timepoint, rm_top and min_df, but the results did not improve. Does this mean that there is a problem with my corpus?
P.S. There is an error with DTM training when k=1. I got: Process finished with exit code -1073741819 (0xC0000005)
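For context on the negative scores, here is a minimal sketch of how NPMI can be computed for a single word pair. It is not tomotopy's implementation: the function name `npmi`, the toy corpus, the `eps` smoothing value and the use of whole-document co-occurrence instead of a sliding window are all assumptions made for illustration. It only shows that NPMI lies in [-1, 1] by construction, so a negative value can come from the data and does not by itself indicate a bug.

```python
import math

def npmi(docs, w1, w2, eps=1e-12):
    """NPMI of two words from document-level co-occurrence.

    Hypothetical helper for illustration; `docs` is a list of sets of words,
    and both words are assumed to occur at least once in the corpus.
    """
    n = len(docs)
    p1 = sum(w1 in d for d in docs) / n                 # P(w1)
    p2 = sum(w2 in d for d in docs) / n                 # P(w2)
    p12 = sum(w1 in d and w2 in d for d in docs) / n    # P(w1, w2)
    pmi = math.log((p12 + eps) / (p1 * p2))             # pointwise mutual information
    return pmi / -math.log(p12 + eps)                   # normalise into [-1, 1]

# Toy corpus (made up): each document is the set of words it contains.
docs = [
    {"battery", "anode"},
    {"battery", "anode"},
    {"battery", "cathode"},
    {"capacitor", "electrode"},
]

together = npmi(docs, "battery", "anode")   # co-occur more often than chance
apart = npmi(docs, "anode", "cathode")      # never co-occur in this corpus
print(together > 0, apart < 0)
```

In this toy corpus the pair that appears together scores above zero and the pair that never appears together scores close to -1. A topic whose top words rarely share a document would therefore average to a negative score under this definition.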