Closed dkltimon closed 4 years ago
The behavior of NPMI is not exactly defined for this case. The classical computation would give you NaN. However, in Palmetto, we have an additional check on the probabilities before we calculate NPMI. If one of the two words has a probability of 0.0, we set the NPMI to 0.
This behavior can be changed by passing -1 to the constructor of the NPMI calculation. However, it is debatable whether this is better than 0. At least from my point of view, 0 reflects the actual situation (i.e., the system simply has no information about the terms of the topic) better than -1 does.
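To make the difference between the two fallback values concrete, here is a minimal Python sketch. It is not Palmetto's actual (Java) code: the names `npmi`, `topic_npmi` and `undefined_value` are made up for illustration, averaging over all word pairs is just one assumed way of aggregating to a topic score, and returning -1 or 1 for a joint probability of 0 or 1 is my own choice (the mathematical limits), not something taken from Palmetto.

```python
import math
from itertools import combinations


def npmi(p_joint, p_w1, p_w2, undefined_value=0.0):
    """NPMI of one word pair (hypothetical helper, not Palmetto's API).

    `undefined_value` is returned when one of the words has probability 0,
    i.e. when the classical formula would end up as NaN.
    """
    if p_w1 <= 0.0 or p_w2 <= 0.0:
        # No information about at least one word: use the fallback value.
        return undefined_value
    if p_joint <= 0.0:
        # Assumption: both words occur, but never together, so use the limit -1.
        return -1.0
    if p_joint >= 1.0:
        # Assumption: both words occur in every document, so use the limit 1.
        return 1.0
    pmi = math.log(p_joint / (p_w1 * p_w2))
    return pmi / -math.log(p_joint)


def topic_npmi(words, prob, joint_prob, undefined_value=0.0):
    """Mean NPMI over all unordered word pairs of a topic (assumed aggregation)."""
    pairs = list(combinations(words, 2))
    scores = [
        npmi(joint_prob(a, b), prob(a), prob(b), undefined_value)
        for a, b in pairs
    ]
    return sum(scores) / len(scores)


# A topic of five words that never occur in the reference corpus.
rare_topic = ["w1", "w2", "w3", "w4", "w5"]
never_seen = lambda w: 0.0
never_together = lambda a, b: 0.0

default_score = topic_npmi(rare_topic, never_seen, never_together)
adapted_score = topic_npmi(rare_topic, never_seen, never_together, undefined_value=-1.0)
```

With the default fallback the topic of unseen words scores 0.0; with `undefined_value=-1.0` it scores -1.0, which makes it indistinguishable from a topic whose words are all known but never co-occur.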
If you adapt it (in your local Palmetto instance), I would suggest documenting this in your later publication, report or whatever you may use the numbers for :wink:
Thank you very much for your help!
Hi,
I have a question regarding the calculation of topic coherence (for example NPMI).
Let's say I have a topic of five very rare words. None of them occurs in my reference corpus (Wikipedia). What result will I get? 0? If the result is 0, it doesn't reflect the true interpretability of this topic, does it? After all, NPMI can take negative values, which indicate that a topic is not very interpretable.