dalgu90 / icd-coding-benchmark

Automatic ICD coding benchmark based on the MIMIC dataset
MIT License

Getting an issue with importing gensim #20

Closed: dalgu90 closed this issue 2 years ago

dalgu90 commented 2 years ago

I'm getting an error when importing packages with the versions currently specified in the requirements.txt file. It seems those versions are incompatible with each other. This may only affect some runtimes (I have just recreated my conda env with Python 3.9).

juyongk@gautama /usr1/juyongk/workspace/icd-coding-benchmark   base_trainer ⇡1 ✱ 4 ?11
❯❯❯ python run_preprocessing.py
Traceback (most recent call last):
  File "/usr1/juyongk/workspace/icd-coding-benchmark/run_preprocessing.py", line 1, in <module>
    from src.modules.preprocessing_pipelines import *
  File "/usr1/juyongk/workspace/icd-coding-benchmark/src/modules/preprocessing_pipelines.py", line 7, in <module>
    from src.modules.embeddings import *
  File "/usr1/juyongk/workspace/icd-coding-benchmark/src/modules/embeddings.py", line 5, in <module>
    import gensim
  File "/home/juyongk/.miniconda3/envs/icd/lib/python3.9/site-packages/gensim/__init__.py", line 11, in <module>
    from gensim import parsing, corpora, matutils, interfaces, models, similarities, utils  # noqa:F401
  File "/home/juyongk/.miniconda3/envs/icd/lib/python3.9/site-packages/gensim/corpora/__init__.py", line 6, in <module>
    from .indexedcorpus import IndexedCorpus  # noqa:F401 must appear before the other classes
  File "/home/juyongk/.miniconda3/envs/icd/lib/python3.9/site-packages/gensim/corpora/indexedcorpus.py", line 14, in <module>
    from gensim import interfaces, utils
  File "/home/juyongk/.miniconda3/envs/icd/lib/python3.9/site-packages/gensim/interfaces.py", line 19, in <module>
    from gensim import utils, matutils
  File "/home/juyongk/.miniconda3/envs/icd/lib/python3.9/site-packages/gensim/matutils.py", line 1024, in <module>
    from gensim._matutils import logsumexp, mean_absolute_difference, dirichlet_expectation
  File "gensim/_matutils.pyx", line 1, in init gensim._matutils
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject

juyongk@gautama /usr1/juyongk/workspace/icd-coding-benchmark   base_trainer ⇡1 ✱ 4 ?11
❯❯❯ python
Python 3.9.7 (default, Sep 16 2021, 13:09:58)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import gensim
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/juyongk/.miniconda3/envs/icd/lib/python3.9/site-packages/gensim/__init__.py", line 11, in <module>
    from gensim import parsing, corpora, matutils, interfaces, models, similarities, utils  # noqa:F401
  File "/home/juyongk/.miniconda3/envs/icd/lib/python3.9/site-packages/gensim/corpora/__init__.py", line 6, in <module>
    from .indexedcorpus import IndexedCorpus  # noqa:F401 must appear before the other classes
  File "/home/juyongk/.miniconda3/envs/icd/lib/python3.9/site-packages/gensim/corpora/indexedcorpus.py", line 14, in <module>
    from gensim import interfaces, utils
  File "/home/juyongk/.miniconda3/envs/icd/lib/python3.9/site-packages/gensim/interfaces.py", line 19, in <module>
    from gensim import utils, matutils
  File "/home/juyongk/.miniconda3/envs/icd/lib/python3.9/site-packages/gensim/matutils.py", line 1024, in <module>
    from gensim._matutils import logsumexp, mean_absolute_difference, dirichlet_expectation
  File "gensim/_matutils.pyx", line 1, in init gensim._matutils
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
>>>

The problem does not happen with the latest version of numpy, so can we change the versions in the requirements.txt?

juyongk@gautama /usr1/juyongk/workspace/icd-coding-benchmark   base_trainer ⇡1 ✱ 4 ?11   ✔   icd
❯❯❯ python
Python 3.9.7 (default, Sep 16 2021, 13:09:58)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> numpy.__version__
'1.22.2'
>>> import gensim
/home/juyongk/.miniconda3/envs/icd/lib/python3.9/site-packages/gensim/similarities/__init__.py:15: UserWarning: The gensim.similarities.levenshtein submodule is disabled, because the optional Levenshtein package <https://pypi.org/project/python-Levenshtein/> is unavailable. Install Levenhstein (e.g. `pip install python-Levenshtein`) to suppress this warning.
  warnings.warn(msg)
>>>
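
Since `import gensim` itself raises in the failing environment, one way to see which versions are installed is to read the package metadata instead of importing. This error usually indicates that a compiled extension was built against a different numpy than the one present at runtime, so the installed numpy and gensim versions are the first thing to compare. A minimal sketch, assuming Python 3.8+ (for `importlib.metadata`); the helper name `installed_versions` is hypothetical and not part of this repo:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {name: version string or None} without importing the packages."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(["numpy", "gensim"]).items():
        print(f"{name}: {ver or 'not installed'}")
```

Running this in both the failing and the working environment should show which numpy version each one has, which can then be compared against requirements.txt.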

dalgu90 commented 2 years ago

Also, we are missing the tensorboard package, which is needed by the base_trainer class.
(This is not directly related to the gensim error, but it is still a packaging issue.)

SuhasShanbhogue commented 2 years ago

Also, tensorboard needs to be >= 1.15 for logging.
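
A minimum-version requirement like this can be checked before training starts. The sketch below is only an illustration: the helper names `release_tuple`, `at_least`, and `meets_minimum` are hypothetical, and the comparison looks only at the leading numeric part of the version string (it ignores pre-release suffixes such as `rc1`), which is a simplification rather than full version parsing.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def release_tuple(ver):
    """Leading numeric components of a version string as a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", ver)
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


def at_least(installed, minimum):
    """True if `installed` is >= `minimum`, comparing numeric components only."""
    have, need = release_tuple(installed), release_tuple(minimum)
    width = max(len(have), len(need))
    # Pad the shorter tuple with zeros so that "1.15" equals "1.15.0".
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


def meets_minimum(package, minimum):
    """True if `package` is installed and its version is >= `minimum`."""
    try:
        return at_least(version(package), minimum)
    except PackageNotFoundError:
        return False


if __name__ == "__main__":
    print("tensorboard >= 1.15:", meets_minimum("tensorboard", "1.15"))
```

In requirements.txt itself, the same constraint would be written as `tensorboard>=1.15`.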