Closed ValeriiBaidin closed 4 years ago
Hi Valerii,
So if I understand your question correctly, you're asking why the topics don't contain all the numbers.
Each topic contains a ranking of all the terms in the vocabulary. The code showtopics(model, cols=2, 3) means that you are only showing the top three terms for each topic. Since your vocabulary contains seven terms, you can view the full term ranking for each topic by writing
showtopics(model, cols=2, 7)
As for comparisons with other topic modeling packages, I have not made any; however, the implementations are standard coordinate-ascent variational inference. The original algorithms may be found in the bibliography.
From the data, there are 2 topics: (1,2,3) and (4,5,6). I don't understand why topic 2 is (7,6,5).
Ah I see what you're asking. So strictly speaking your corpus has three topics, not two: (1,2,3), (4,5,6) and (7).
So topic 2 ends up having to merge the (4,5,6) and (7) topics together, and (7) ends up above (4,5,6), probably because 7 occurs most frequently.
If you try setting model = LDA(c, 3), you may obtain more sensible results. However, even with three topics, the algorithm may get trapped in a poor local optimum, depending on how the topic weights are randomly initialized.
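To make the suggestion above concrete, here is a minimal sketch of the rerun with three topics. It assumes the TopicModelsVB.jl-style API already used in this thread; the train! call and its keyword argument are assumptions taken from the package's typical usage and may differ in your version.

```julia
# Hedged sketch: `train!` and its `iter` keyword are assumed, not verified
# against your package version; `c` is your existing corpus of terms 1–7.
model = LDA(c, 3)             # three topics: (1,2,3), (4,5,6) and (7)
train!(model, iter=150)       # coordinate-ascent variational inference
showtopics(model, cols=3, 7)  # full 7-term ranking for all three topics
```

Because the topic weights are randomly initialized, rerunning this a few times and keeping the most sensible fit is a cheap guard against poor local optima.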
I am sorry to bother you, but I've just checked a very simple example, and the result seems strange.
The result is:
It is strange that topic 2 doesn't contain 4 but does contain 7.
Would you check whether this is correct?
Thank you in advance.
P.S. Have you compared your results with other implementations?
P.P.S. Thank you so much for your code.