MIND-Lab / OCTIS

OCTIS: Comparing Topic Models is Simple! A python package to optimize and evaluate topic models (accepted at EACL2021 demo track)
MIT License

The `python` and `scipy` version-compatibility, and KLDivergence() needs attention! #125

Open prikarsartam opened 5 months ago

prikarsartam commented 5 months ago

I am not an expert in package management, so I do not fully understand all the details. octis installs properly in Google Colab, but installing it in Kaggle requires `pip install octis --use-pep517`.

Installing locally on my system is my main concern, and there I ran into the following issues, both with `pip install octis` and with `pip install -e .` from the downloaded repository.

Description

  1. Installing with the latest python3.12 on my Linux system does not succeed in any case, as far as I can tell because `zipimport` functionality that has been deprecated since Python 3.10 is no longer available.
  2. This repo requires gensim==4.2.0, whose gensim/matutils.py uses `triu` from scipy (shown as an image in the original post), but to the best of my knowledge `triu` is no longer available from scipy==1.13.0 onwards.
  3. The `KLDivergence` metric in `octis.evaluation_metrics.diversity_metrics` raises `RuntimeWarning: invalid value encountered in log` at the line `divergence = np.sum(P*np.log(P/Q))`.

What I Did

I created a conda virtual environment with python3.10 and downgraded to scipy==1.12, which solves problems 1 and 2.
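For anyone who wants to check whether their own environment is affected, here is a minimal sketch. The helper name `check_environment` is hypothetical, and I am assuming that the import gensim needs is `triu` from `scipy.linalg`; it only reports what it finds and does not fix anything.

```python
import sys
from importlib import metadata


def check_environment():
    """Report the Python version, the installed scipy version (if any),
    and whether `triu` can still be imported from scipy.linalg."""
    report = {"python": "{}.{}".format(*sys.version_info[:2])}

    # Installed scipy version, or None when scipy is not installed.
    try:
        report["scipy"] = metadata.version("scipy")
    except metadata.PackageNotFoundError:
        report["scipy"] = None

    # Assumption: this is the import that fails with newer scipy releases.
    try:
        from scipy.linalg import triu  # noqa: F401
        report["scipy_linalg_triu"] = True
    except ImportError:
        report["scipy_linalg_triu"] = False

    return report


report = check_environment()
print(report)
```

If `scipy_linalg_triu` comes back `False`, downgrading scipy as described above is the workaround that worked for me.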

As for problem 3: the `model_output['topic-word-matrix']` for ProdLDA is not normalized to [0,1], so its rows cannot be interpreted as probabilities. The matrix contains negative entries, which lead to `nan` in `np.log()`.
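One possible workaround is to map each row to a proper probability distribution before computing the divergence. The sketch below uses a softmax for that, which is my assumption and not necessarily what OCTIS intends; the helper names `to_distribution` and `kl_divergence` and the two example rows are made up for illustration.

```python
import numpy as np


def to_distribution(scores, eps=1e-12):
    """Turn arbitrary real-valued scores into a strictly positive
    probability distribution (softmax, then a small floor)."""
    scores = np.asarray(scores, dtype=float)
    exp = np.exp(scores - scores.max())  # shift by the max for numerical stability
    probs = exp / exp.sum()
    probs = np.clip(probs, eps, None)    # avoid exact zeros so the log stays finite
    return probs / probs.sum()


def kl_divergence(p, q):
    """KL(P || Q) for two strictly positive distributions of equal length."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))


# Hypothetical unnormalized topic rows with negative entries,
# similar in spirit to what I see in the ProdLDA output.
topic_a = np.array([0.8, -0.3, 1.5, -1.2])
topic_b = np.array([-0.5, 0.9, 0.1, 0.4])

p = to_distribution(topic_a)
q = to_distribution(topic_b)
divergence = kl_divergence(p, q)
print(divergence)
```

With the rows normalized this way, every entry passed to `np.log()` is positive, so the warning no longer appears. Whether softmax is the right normalization for ProdLDA's matrix is something the maintainers would need to confirm.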

nicepool6 commented 3 months ago

You can try the new topic modeling toolkit TopMost.