amir-zeldes / HebPipe

An NLP pipeline for Hebrew

No module named 'transformers.modeling_bert' following fresh pip install #19

Closed cjer closed 2 years ago

cjer commented 2 years ago

I installed hebpipe with pip install hebpipe in a clean env (python=3.8.13), then ran python -m hebpipe example_in.txt. The models are downloaded, but then I get a ModuleNotFoundError:

$ python -m hebpipe example_in.txt
! You selected no processing options
! Assuming you want all processing steps

Running tasks:
====================
o Automatic sentence splitting (neural)
o Whitespace tokenization
o Morphological segmentation
o POS tagging
o Lemmatization
o Morphological analysis
o Dependency parsing
o Entity recognition
o Coreference resolution

! Model file heb.sm3 missing in ./models/
! Model file heb.xrm missing in ./models/
! Model file heb.flair missing in ./models/
! Model file heb.morph missing in ./models/
! Model file heb.sent missing in ./models/
! Model file heb.diaparser missing in ./models/
! Model file stanza/he_lemmatizer.pt missing in ./models/
! Model file stanza/he_htb.pretrain.pt missing in ./models/
! You are missing required software:
 - Tagging, lemmatization and morphological analysis require models
 - Model files in models/ are missing
Attempt to download missing files? [Y/N]
Y
o Downloading from http://corpling.uis.georgetown.edu/amir/download/heb_models_v2/heb.sm3
o Downloading from http://corpling.uis.georgetown.edu/amir/download/heb_models_v2/heb.diaparser
o Downloading from http://corpling.uis.georgetown.edu/amir/download/heb_models_v2/heb.sent
o Downloading from http://corpling.uis.georgetown.edu/amir/download/heb_models_v2/heb.xrm
o Downloading from http://corpling.uis.georgetown.edu/amir/download/heb_models_v2/heb.flair
o Downloading from http://corpling.uis.georgetown.edu/amir/download/heb_models_v2/heb.morph
o Downloading from http://corpling.uis.georgetown.edu/amir/download/heb_models_v2/he_htb.pretrain.pt
o Downloading from http://corpling.uis.georgetown.edu/amir/download/heb_models_v2/he_lemmatizer.pt

Traceback (most recent call last):
  File "/home/anaconda3/envs/hebpipe/lib/python3.8/runpy.py", line 185,
 in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/home/anaconda3/envs/hebpipe/lib/python3.8/runpy.py", line 144,
 in _get_module_details
    return _get_module_details(pkg_main_name, error)
  File "/home/anaconda3/envs/hebpipe/lib/python3.8/runpy.py", line 111,
 in _get_module_details
    __import__(pkg_name)
  File "/home/anaconda3/envs/hebpipe/lib/python3.8/site-packages/hebpip
e/__init__.py", line 2, in <module>
    run_hebpipe()
  File "/home/anaconda3/envs/hebpipe/lib/python3.8/site-packages/hebpipe/heb_pipe.py", line 867, in run_hebpipe
    tagger = FlairTagger()
  File "/home/anaconda3/envs/hebpipe/lib/python3.8/site-packages/hebpipe/lib/flair_pos_tagger.py", line 30, in __init__
    self.model = SequenceTagger.load(model_dir + "heb.flair")
  File "/home/anaconda3/envs/hebpipe/lib/python3.8/site-packages/flair/nn.py", line 88, in load
    state = torch.load(f, map_location='cpu')
  File "/home/anaconda3/envs/hebpipe/lib/python3.8/site-packages/torch/serialization.py", line 712, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/home/anaconda3/envs/hebpipe/lib/python3.8/site-packages/torch/serialization.py", line 1046, in _load
    result = unpickler.load()
  File "/home/anaconda3/envs/hebpipe/lib/python3.8/site-packages/torch/serialization.py", line 1039, in find_class
    return super().find_class(mod_name, name)
ModuleNotFoundError: No module named 'transformers.modeling_bert'
Elapsed time: 0:00:00.821
========================================

pip freeze:

attrs==21.4.0
beautifulsoup4==4.10.0
bpemb==0.3.3
certifi==2021.10.8
charset-normalizer==2.0.12
click==8.1.0
cloudpickle==2.0.0
conllu==4.4.1
cycler==0.11.0
depedit==3.2.1.0
Deprecated==1.2.13
diaparser==1.1.2
emoji==1.7.0
filelock==3.6.0
flair==0.6.1
fonttools==4.31.2
ftfy==6.1.1
future==0.18.2
gdown==4.4.0
gensim==4.1.2
hebpipe==2.0.0.1
huggingface-hub==0.4.0
hyperopt==0.2.7
idna==3.3
importlib-metadata==3.10.1
iniconfig==1.1.1
Janome==0.4.2
joblib==1.1.0
kiwisolver==1.4.2
konoha==4.6.5
langdetect==1.0.9
lxml==4.8.0
matplotlib==3.5.1
mpld3==0.3
networkx==2.7.1
nltk==3.7
numpy==1.19.4
overrides==3.1.0
packaging==21.3
pandas==1.4.1
Pillow==9.0.1
pluggy==1.0.0
protobuf==3.19.4
py==1.11.0
py4j==0.10.9.5
pyparsing==3.0.7
PySocks==1.7.1
pytest==7.1.1
python-dateutil==2.8.2
pytz==2022.1
PyYAML==6.0
regex==2022.3.15
requests==2.27.1
rftokenizer==2.0.1
sacremoses==0.0.49
scikit-learn==1.0.2
scipy==1.8.0
segtok==1.5.11
sentencepiece==0.1.96
six==1.16.0
smart-open==5.2.1
soupsieve==2.3.1
sqlitedict==2.0.0
stanza==1.3.0
tabulate==0.8.9
threadpoolctl==3.1.0
tokenizers==0.11.6
tomli==2.0.1
torch==1.11.0
tqdm==4.63.1
transformers==4.17.0
typing_extensions==4.1.1
urllib3==1.26.9
wcwidth==0.2.5
wrapt==1.14.0
xgboost==0.81
xmltodict==0.12.0
zipp==3.7.0

Any idea what might be causing this? transformers is actually installed.

cjer commented 2 years ago

(Looks like the downloaded flair model is a pickle that uses some incompatible version of transformers)
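
(For background: transformers 4.x moved the BERT code from transformers.modeling_bert to transformers.models.bert.modeling_bert, so a flair model pickled against transformers 3.x references a module path that no longer exists. If downgrading is not an option, a rough, unofficial workaround sketch is to alias the old path before anything tries to unpickle the model; it can still fail if the pickle references classes that were removed in 4.x:

import sys
import transformers.models.bert.modeling_bert as modeling_bert

# Make the legacy module path resolvable so the pickled SequenceTagger can load.
sys.modules["transformers.modeling_bert"] = modeling_bert

Downgrading transformers, as suggested below, is the cleaner fix.)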

amir-zeldes commented 2 years ago

Hm, thanks for reporting. It looks like flair 0.6.1 doesn't actually pin the transformers version it depends on. Can you try with transformers==3.5.1? If that works, the pickled model probably means that version needs to be specified in requirements.

cjer commented 2 years ago

Thanks! This indeed solved that error, but then there was another import error for something torch-related:

ImportError: cannot import name 'SAVE_STATE_WARNING' from 'torch.optim.lr_scheduler'

I downgraded to torch==1.6.0, and it seems to have solved this one.
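
(For anyone following along: applying both pins to an existing environment would be something like pip install "transformers==3.5.1" "torch==1.6.0"; those are simply the versions reported in this thread, so adjust if your setup differs.)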

I think this is it for import issues, but now I get a stanza.pipeline.core.ResourcesFileNotFoundError after downloading the models:

...
o Downloading from http://corpling.uis.georgetown.edu/amir/download/heb_models_v2/heb.morph
o Downloading from http://corpling.uis.georgetown.edu/amir/download/heb_models_v2/he_htb.pretrain.pt
o Downloading from http://corpling.uis.georgetown.edu/amir/download/heb_models_v2/he_lemmatizer.pt

Downloading: 100%|██████████████████████████████| 565/565 [00:00<00:00, 379kB/s]
Downloading: 100%|████████████████████████████| 545k/545k [00:04<00:00, 135kB/s]
Downloading: 100%|█████████████████████████████| 112/112 [00:00<00:00, 79.5kB/s]
Downloading: 100%|██████████████████████████████| 288/288 [00:00<00:00, 214kB/s]
Traceback (most recent call last):
  File "/home/anaconda3/envs/hebpipe/lib/python3.8/runpy.py", line 185,
 in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/home/anaconda3/envs/hebpipe/lib/python3.8/runpy.py", line 144, in _get_module_details
    return _get_module_details(pkg_main_name, error)
  File "/home/anaconda3/envs/hebpipe/lib/python3.8/runpy.py", line 111, in _get_module_details
    __import__(pkg_name)
  File "/home/anaconda3/envs/hebpipe/lib/python3.8/site-packages/hebpipe-2.0.0.1-py3.8.egg/hebpipe/__init__.py", line 2, in <module>
    run_hebpipe()
  File "/home/anaconda3/envs/hebpipe/lib/python3.8/site-packages/hebpipe-2.0.0.1-py3.8.egg/hebpipe/heb_pipe.py", line 864, in run_hebpipe
    lemmatizer = init_lemmatizer(cpu=opts.cpu, no_post_process=opts.disable_lex)
  File "/home/anaconda3/envs/hebpipe/lib/python3.8/site-packages/hebpipe-2.0.0.1-py3.8.egg/hebpipe/heb_pipe.py", line 71, in init_lemmatizer
    lemmatizer = stanza.Pipeline("he", package="htb", processors="lemma", tokenize_no_ssplit=True,
  File "/home/anaconda3/envs/hebpipe/lib/python3.8/site-packages/stanza/pipeline/core.py", line 90, in __init__
    raise ResourcesFileNotFoundError(resources_filepath)
stanza.pipeline.core.ResourcesFileNotFoundError: Resources file not found at: /home/stanza_resources/resources.json  Try to download the model again.
Elapsed time: 0:00:29.195
========================================

cjer commented 2 years ago

OK so I used stanza to download its Hebrew model and it solved the problem:

import stanza
stanza.download('he')

I'm guessing this should probably be added to the model download phase.

Thanks for the help :)

amir-zeldes commented 2 years ago

Thanks for verifying this, I'll change requirements.txt to reflect the versions. The Stanza trick will work, but note we actually have better stanza models that should have been auto-downloaded from here:

https://corpling.uis.georgetown.edu/amir/download/heb_models_v2/he_lemmatizer.pt https://corpling.uis.georgetown.edu/amir/download/heb_models_v2/he_htb.pretrain.pt

Your output looks like they are getting downloaded, but somehow not found at runtime. I noticed you're using Anaconda, which I'm not, so there is a small chance that's related; otherwise it may be some strange path or environment variable issue. Can you see whether he_lemmatizer.pt actually got downloaded and placed into hebpipe/models/stanza/? If not, then the reason stanza.download('he') is working is that stanza is falling back to the default Hebrew model in site-packages/stanza (that model is not terrible, but substantially worse for lemmatization than the one in the download, see here).
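
A quick way to check, as a sketch: locate the installed package without importing it (importing hebpipe kicks off the whole pipeline, as the traceback above shows) and list its stanza model directory. The models/stanza subpath is the one mentioned above; the layout may differ for an egg install like the one in the later traceback.

import os
from importlib.util import find_spec

# Find where pip put the hebpipe package, then look at its bundled stanza models.
pkg_dir = os.path.dirname(find_spec("hebpipe").origin)
stanza_dir = os.path.join(pkg_dir, "models", "stanza")
print(stanza_dir)
print(sorted(os.listdir(stanza_dir)) if os.path.isdir(stanza_dir) else "directory missing")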

cjer commented 2 years ago

Thanks for the help :)

Yes, both models exist in hebpipe/models/stanza/. The error is raised because the resources.json file doesn't exist. This is a global file stanza uses for mapping between all the languages and their corresponding models; I think that in a fresh env/machine, this file is created only once you run stanza.download(...) for the first time. I don't think this should have an effect on the pretrained model that is used by hebpipe (since you pass it explicitly to the stanza pipeline init).
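
For reference, a rough sketch of what that explicit-path init presumably looks like, based on the call visible in the traceback (its trailing arguments are cut off there); lemma_model_path is stanza's generic per-processor override, and the exact path hebpipe passes is an assumption:

import stanza

# resources.json still has to exist (stanza.download('he') creates it), but the
# lemmatizer weights actually used come from the explicitly passed model file.
# The real hebpipe call passes further arguments not visible in the traceback.
lemmatizer = stanza.Pipeline(
    "he",
    package="htb",
    processors="lemma",
    tokenize_no_ssplit=True,
    lemma_model_path="models/stanza/he_lemmatizer.pt",  # assumed relative path
)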

btw, there is also a version conflict warning in runtime now:

/home/anaconda3/envs/hebpipe/lib/python3.8/site-packages/scikit_learn-1.0.2-py3.8-linux-x86_64.egg/sklearn/base.py:329: UserWarning: Trying to unpickle estimator LabelEncoder from version 0.23.2 when using version 1.0.2. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/modules/model_persistence.html#security-maintainability-limitations

It might be best to do pip freeze > requirements.txt in an environment that you know has no conflicts, and use that for the package's setup.py and requirements.txt files.

tamarjohn commented 2 years ago

I also received the same error, and when I pip installed transformers==3.5.1 I got this error message:

[Screenshot 2022-03-30 at 12 45 03]

Any idea how this can be solved? I tried pip install --upgrade sentencepiece==0.1.91 but it didn't work:

[Screenshot 2022-03-30 at 12 48 15]

cjer commented 2 years ago

This worked for me in setup.py in a clean env:

install_requires = ['requests', 'numpy==1.19.4', 'pandas', 'scipy', 'joblib', 'xgboost==0.81', 'depedit', 'xmltodict',
                    'torch==1.6.0', 'sentencepiece==0.1.91', 'transformers==3.5.1', 'flair==0.6.1', 'diaparser==1.1.2',
                    'rftokenizer', 'stanza', 'conllu'],

tamarjohn commented 2 years ago

Thanks! After setting up Python 3.8 in the environment (I had Python 3.9), I followed your solution and it works.