NLeSC / litstudy

LitStudy: Using the power of Python to automate scientific literature analysis from the comfort of a Jupyter notebook
https://nlesc.github.io/litstudy/
Apache License 2.0
155 stars 48 forks

train_lda_model() fails to access gensim #90

Open LeonardWilleke opened 5 months ago

LeonardWilleke commented 5 months ago

Hi, first of all thanks for developing this clean and handy tool.

When I call nlp.train_lda_model() I get the following error:

ModuleNotFoundError: No module named 'gensim.models.lda'

This makes sense, because I am using gensim 4.2 which doesn't have this module but instead a module called gensim.models.ldamodel. As I understand, litstudy.nlp should detect which version of gensim I am running here:

https://github.com/NLeSC/litstudy/blob/fcb82e860c6c5bc6bf4573b2207e194ba6f6f0b1/litstudy/nlp.py#L323C5-L336C67

Unfortunately, this doesn't seem to work.
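One way to avoid depending on a version check at all is to try the candidate module paths in order and use the first one that imports. The following is only a minimal sketch of that idea, not the code litstudy uses; the helper name `import_first_available` is hypothetical, and the demonstration uses the standard library so it runs without gensim installed.

```python
import importlib


def import_first_available(candidates, attribute):
    """Return `attribute` from the first module in `candidates` that imports.

    Hypothetical helper for illustration; not part of litstudy or gensim.
    """
    errors = []
    for module_name in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError as exc:
            # ModuleNotFoundError is a subclass of ImportError.
            errors.append(f"{module_name}: {exc}")
            continue
        if hasattr(module, attribute):
            return getattr(module, attribute)
        errors.append(f"{module_name}: no attribute {attribute!r}")
    raise ImportError("; ".join(errors))


# Demonstration with the standard library: the first candidate does not
# exist, so the helper falls through to `json`.
loads = import_first_available(["no_such_module_xyz", "json"], "loads")

# For this issue the call might look like this (assumes gensim is installed):
# LdaModel = import_first_available(
#     ["gensim.models.ldamodel", "gensim.models.lda"], "LdaModel")
```

Because the fallback is driven by what actually imports, it keeps working even if the detected version number and the installed package disagree.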

With gensim 3.x, I get errors from other functions instead. I tried gensim 3.0 to 3.5 and 4.0 to 4.2.

Best regards, Leonard

LeonardWilleke commented 5 months ago

from importlib.metadata import version

gensim_mayor = int(version("gensim").split(".")[0])

So far I used Python==3.7 as stated in the README, but the module importlib.metadata seems to be only available for Python >= 3.8: https://docs.python.org/3/library/importlib.metadata.html#module-importlib.metadata

However, with Python 3.8 some other packages fail; I probably need to reinstall them.
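For Python 3.7, a common pattern is to fall back to the `importlib_metadata` backport from PyPI when the standard-library module is missing. The sketch below assumes that backport is installed on 3.7; the helper `major_version` is a hypothetical name introduced here for illustration.

```python
try:
    # Standard library on Python >= 3.8.
    from importlib.metadata import version
except ImportError:
    # Assumption: the `importlib-metadata` backport is installed on Python 3.7.
    from importlib_metadata import version


def major_version(version_string):
    """Return the leading integer of a dotted version string."""
    return int(version_string.split(".")[0])


# With gensim installed, the check from the issue would then be:
# gensim_major = major_version(version("gensim"))
```

Keeping the string parsing in its own function makes it easy to check separately from whichever package happens to be installed.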

LeonardWilleke commented 5 months ago

The original error returned after re-installing the other packages. I found that importlib.metadata and gensim work as intended. This simple script

import gensim
from importlib.metadata import version

gensim_mayor = int(version("gensim").split(".")[0])
print(gensim_mayor)

prints 4 as expected. I'm puzzled.
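One thing that may be worth ruling out (an assumption, not a confirmed cause) is that the notebook kernel runs a different interpreter or environment than the script. A small diagnostic like the following, run in the same notebook where the error occurs, shows which interpreter is active and which of the two module paths it can find. The helper `module_available` is a hypothetical name.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if `name` can be located by the current interpreter."""
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # Raised when a parent package (e.g. `gensim`) is itself missing.
        return False


print("interpreter:", sys.executable)
for name in ("gensim.models.ldamodel", "gensim.models.lda"):
    print(name, "->", module_available(name))
```

If the interpreter path differs between the script and the notebook, the two are likely using different installations of gensim.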

Tim0th1 commented 4 months ago

LitStudy is great. I can also confirm the bug as reported.

As a temporary workaround, the NMF topic model is a working substitute and does not depend on the gensim_mayor version check.