MIND-Lab / OCTIS

OCTIS: Comparing Topic Models is Simple! A python package to optimize and evaluate topic models (accepted at EACL2021 demo track)
MIT License

Bug OOV words in WECoherenceCentroid #45

Closed rsimd closed 2 years ago

rsimd commented 3 years ago

OCTIS version: 1.10.0
Python version: 3.9.7
Operating System: Ubuntu 20.04.2 LTS (DISTRIB_ID=Ubuntu, DISTRIB_RELEASE=20.04, DISTRIB_CODENAME=focal)

Description

On this line (https://github.com/MIND-Lab/OCTIS/blob/master/octis/evaluation_metrics/coherence_metrics.py#L180), topic[0] is a word. If that word is out of vocabulary (not present in self._wv), looking it up raises a KeyError.

Since Gensim's KeyedVectors class has a vector_size attribute, I think this code should be rewritten to size the zero vector from vector_size instead of from a lookup of topic[0].

#t = [0] * len(self._wv.__getitem__(topic[0]))
t = np.zeros(self._wv.vector_size)
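
To illustrate why sizing the vector from vector_size avoids the lookup, here is a minimal sketch. It is not the OCTIS implementation: TinyVectors is a hypothetical stand-in for Gensim's KeyedVectors that only mimics vector_size, membership tests, and the KeyError on missing keys, and topic_centroid is a hypothetical helper. Skipping OOV words when averaging is an assumption made for the example, not something the metric is confirmed to do.

```python
import numpy as np


class TinyVectors:
    """Hypothetical stand-in for gensim's KeyedVectors (illustration only)."""

    def __init__(self, vectors):
        self._vectors = {k: np.asarray(v, dtype=float) for k, v in vectors.items()}
        # Like KeyedVectors, expose the embedding dimensionality directly.
        self.vector_size = len(next(iter(self._vectors.values())))

    def __contains__(self, key):
        return key in self._vectors

    def __getitem__(self, key):
        # Mirrors the failure in the traceback: unknown keys raise KeyError.
        if key not in self._vectors:
            raise KeyError(f"Key '{key}' not present")
        return self._vectors[key]


def topic_centroid(wv, topic):
    """Average the vectors of in-vocabulary words; OOV words are skipped (assumption)."""
    # Sized from the model, so no lookup of topic[0] is needed.
    centroid = np.zeros(wv.vector_size)
    known = [word for word in topic if word in wv]
    for word in known:
        centroid += wv[word]
    if known:
        centroid /= len(known)
    return centroid


wv = TinyVectors({"topic": [1.0, 0.0, 0.0], "model": [0.0, 1.0, 0.0]})
# "elsevi" is out of vocabulary, as in the reported error, and sits at index 0.
centroid = topic_centroid(wv, ["elsevi", "topic", "model"])
```

With the original line, the same topic list would fail immediately because topic[0] is the OOV word; here the zero vector is created without touching the vocabulary at all.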

Example error message

  File "/root/.cache/pypoetry/virtualenvs/sktopic-L2WRRFYm-py3.9/lib/python3.9/site-packages/octis-1.10.0-py3.9.egg/octis/evaluation_metrics/coherence_metrics.py", line 180, in score
    t = [0] * len(self._wv.__getitem__(topic[0]))
  File "/root/.cache/pypoetry/virtualenvs/sktopic-L2WRRFYm-py3.9/lib/python3.9/site-packages/gensim/models/keyedvectors.py", line 395, in __getitem__
    return self.get_vector(key_or_keys)
  File "/root/.cache/pypoetry/virtualenvs/sktopic-L2WRRFYm-py3.9/lib/python3.9/site-packages/gensim/models/keyedvectors.py", line 438, in get_vector
    index = self.get_index(key)
  File "/root/.cache/pypoetry/virtualenvs/sktopic-L2WRRFYm-py3.9/lib/python3.9/site-packages/gensim/models/keyedvectors.py", line 412, in get_index
    raise KeyError(f"Key '{key}' not present")
KeyError: "Key 'elsevi' not present"

silviatti commented 2 years ago

Hello, thanks for reporting this issue and for your patience. I'm working on it and I will fix this by tomorrow.

Silvia