adjidieng / ETM

Topic Modeling in Embedding Spaces
MIT License

Negative coherence on short texts #29

Open elbadma opened 3 years ago

elbadma commented 3 years ago

Hi, I saw that DETM can be used on short texts. I tried ETM on short texts (each text contains only one sentence), and it seemed to work. However, the coherence score became negative. How should I interpret this? Does lower coherence always mean worse, or do scores closer to 0 mean worse? Whenever I run ETM on normal-length texts (more than one sentence each), the coherence is always positive, so I assume the negative coherence is caused by the short length.

silviatti commented 3 years ago

Hi! Coherence is computed as normalized pointwise mutual information (NPMI), which ranges from -1 to 1, so scores below 0 are fine. Higher is still better: 1 means a topic's top words always co-occur, 0 means they co-occur only as often as chance would predict, and -1 means they never co-occur. Negative scores are common with short-text documents because those documents are much sparser, so words co-occur less frequently. Just make sure not to compare coherence scores computed on two different datasets.
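To illustrate why splitting text into single sentences pushes the score below 0, here is a minimal sketch of NPMI topic coherence, $\mathrm{NPMI}(w_i, w_j) = \log \frac{P(w_i, w_j)}{P(w_i)P(w_j)} \big/ \left(-\log P(w_i, w_j)\right)$, averaged over all pairs of a topic's top words. It assumes probabilities are estimated from document-level co-occurrence counts and that pairs which never co-occur get -1 (a common convention). It is not the ETM repo's own coherence code; the function names and the toy corpus are hypothetical.

```python
import math
from itertools import combinations


def npmi(docs, w_i, w_j):
    """NPMI of a word pair, estimated from document co-occurrence.

    `docs` is a list of sets of tokens. Assumed convention: pairs that
    never appear in the same document get the lower bound, -1.
    """
    n_docs = len(docs)
    d_i = sum(1 for d in docs if w_i in d)
    d_j = sum(1 for d in docs if w_j in d)
    d_ij = sum(1 for d in docs if w_i in d and w_j in d)
    if d_ij == 0:
        return -1.0
    p_i, p_j, p_ij = d_i / n_docs, d_j / n_docs, d_ij / n_docs
    if p_ij == 1.0:
        # Both words appear in every document: perfect co-occurrence.
        return 1.0
    return math.log(p_ij / (p_i * p_j)) / -math.log(p_ij)


def topic_coherence(docs, top_words):
    """Average NPMI over all pairs of a topic's top words."""
    doc_sets = [set(d) for d in docs]
    pairs = list(combinations(top_words, 2))
    return sum(npmi(doc_sets, a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy corpus: each short document is a single sentence.
short_docs = [
    ["my", "cat", "is", "a", "good", "pet"],
    ["the", "dog", "barks"],
    ["the", "cat", "sleeps"],
    ["stocks", "fell", "today"],
    ["markets", "rallied"],
    ["rates", "rose"],
]
# The same text grouped into two multi-sentence documents.
long_docs = [
    short_docs[0] + short_docs[1] + short_docs[2],
    short_docs[3] + short_docs[4] + short_docs[5],
]

topic = ["cat", "dog", "pet"]
print(topic_coherence(long_docs, topic))   # positive
print(topic_coherence(short_docs, topic))  # negative
```

The same topic on the same underlying text scores very differently depending only on how the text is split into documents: in the one-sentence version, "dog" never shares a document with "cat" or "pet", so those pairs contribute -1. This is also why coherence scores from two different datasets (or two different segmentations of one dataset) are not directly comparable.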