cjer closed 2 years ago
(Looks like the downloaded flair model is a pickle that uses some incompatible version of transformers)
Hm, thanks for reporting. It looks like flair 0.6.1 doesn't actually pin the transformers version it depends on. Can you try with transformers==3.5.1? If that works, then the pickled model probably means that version needs to be specified in requirements.
Thanks! This did indeed solve that error, but then there was another import error, this time torch-related:
ImportError: cannot import name 'SAVE_STATE_WARNING' from 'torch.optim.lr_scheduler'
I downgraded to torch=1.6.0, and it seems to have solved this one.
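A quick way to tell whether the installed torch still exposes the name from the error above, before launching the full pipeline, is a small import probe. This is only a sketch: the helper name can_import_name is made up for illustration, and it only covers plain names defined in a module, not submodules.

```python
import importlib


def can_import_name(module_name, attr):
    """Return True if the module imports cleanly and exposes `attr`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Module (or one of its parents) is missing or fails to import.
        return False
    return hasattr(module, attr)


if __name__ == "__main__":
    # The name taken from the ImportError reported in this thread.
    print(can_import_name("torch.optim.lr_scheduler", "SAVE_STATE_WARNING"))
```

If this prints False in the environment where hebpipe is installed, the torch version there is likely not the one the pickled model expects.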
I think this is it for import issues, but now I get a stanza.pipeline.core.ResourcesFileNotFoundError after downloading the models:
...
o Downloading from http://corpling.uis.georgetown.edu/amir/download/heb_models_v2/heb.morph
o Downloading from http://corpling.uis.georgetown.edu/amir/download/heb_models_v2/he_htb.pretrain.pt
o Downloading from http://corpling.uis.georgetown.edu/amir/download/heb_models_v2/he_lemmatizer.pt
Downloading: 100%|██████████████████████████████| 565/565 [00:00<00:00, 379kB/s]
Downloading: 100%|████████████████████████████| 545k/545k [00:04<00:00, 135kB/s]
Downloading: 100%|█████████████████████████████| 112/112 [00:00<00:00, 79.5kB/s]
Downloading: 100%|██████████████████████████████| 288/288 [00:00<00:00, 214kB/s]
Traceback (most recent call last):
  File "/home/anaconda3/envs/hebpipe/lib/python3.8/runpy.py", line 185, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/home/anaconda3/envs/hebpipe/lib/python3.8/runpy.py", line 144, in _get_module_details
    return _get_module_details(pkg_main_name, error)
  File "/home/anaconda3/envs/hebpipe/lib/python3.8/runpy.py", line 111, in _get_module_details
    __import__(pkg_name)
  File "/home/anaconda3/envs/hebpipe/lib/python3.8/site-packages/hebpipe-2.0.0.1-py3.8.egg/hebpipe/__init__.py", line 2, in <module>
    run_hebpipe()
  File "/home/anaconda3/envs/hebpipe/lib/python3.8/site-packages/hebpipe-2.0.0.1-py3.8.egg/hebpipe/heb_pipe.py", line 864, in run_hebpipe
    lemmatizer = init_lemmatizer(cpu=opts.cpu, no_post_process=opts.disable_lex)
  File "/home/anaconda3/envs/hebpipe/lib/python3.8/site-packages/hebpipe-2.0.0.1-py3.8.egg/hebpipe/heb_pipe.py", line 71, in init_lemmatizer
    lemmatizer = stanza.Pipeline("he", package="htb", processors="lemma", tokenize_no_ssplit=True,
  File "/home/anaconda3/envs/hebpipe/lib/python3.8/site-packages/stanza/pipeline/core.py", line 90, in __init__
    raise ResourcesFileNotFoundError(resources_filepath)
stanza.pipeline.core.ResourcesFileNotFoundError: Resources file not found at: /home/stanza_resources/resources.json Try to download the model again.
Elapsed time: 0:00:29.195
========================================
OK so I used stanza to download its Hebrew model and it solved the problem:
import stanza
stanza.download('he')
I'm guessing this should probably be added to the model download phase.
Thanks for the help :)
Thanks for verifying this, I'll change requirements.txt to reflect the versions. The Stanza trick will work, but note we actually have better stanza models that should have been auto-downloaded from here:
https://corpling.uis.georgetown.edu/amir/download/heb_models_v2/he_lemmatizer.pt https://corpling.uis.georgetown.edu/amir/download/heb_models_v2/he_htb.pretrain.pt
Your output looks like they are getting downloaded, but somehow not found at runtime. I noticed you're using anaconda, which I'm not using, so there is a small chance that is related; otherwise it may be some strange path or environment variable issue. Can you check whether he_lemmatizer.pt actually got downloaded and placed into hebpipe/models/stanza/? If not, then the reason stanza.download('he') is working is that stanza is falling back to the default Hebrew model in site-packages/stanza (this model is not terrible, but substantially worse for lemmatization than the one in the download, see here).
Thanks for the help :)
Yes, both models exist in hebpipe/models/stanza/. The error is raised because the resources.json file doesn't exist. This is a global file that stanza uses to map each language to its corresponding models. I think that in a fresh env/machine, this file is created only once you run stanza.download(...) for the first time. I don't think this should have an effect on the pretrained model that is used by hebpipe (since you pass it explicitly to the stanza pipeline init).
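If that is what is happening, the model download phase could guard against it with something like the sketch below. It is not hebpipe's actual code: ensure_stanza_resources and resources_index_path are hypothetical helpers, and the default location (~/stanza_resources, overridable through the STANZA_RESOURCES_DIR environment variable) is an assumption based on the path in the traceback.

```python
import os
from pathlib import Path


def resources_index_path(resources_dir=None):
    """Return the path where stanza is expected to keep resources.json."""
    if resources_dir is None:
        # Assumption: stanza's default directory, overridable via env var.
        resources_dir = os.environ.get(
            "STANZA_RESOURCES_DIR", Path.home() / "stanza_resources"
        )
    return Path(resources_dir) / "resources.json"


def ensure_stanza_resources(lang="he", resources_dir=None, downloader=None):
    """Run the downloader once if resources.json is missing.

    Returns True if a download was triggered, False if the file was there.
    """
    if resources_index_path(resources_dir).exists():
        return False
    if downloader is None:
        # Imported lazily so the check itself works without stanza installed.
        import stanza
        downloader = stanza.download
    downloader(lang)
    return True


# Intended use before building the pipeline (triggers a real download):
# ensure_stanza_resources("he")
```

The downloader argument is only there so the check can be exercised without network access. Note that with the default downloader the files go to stanza's own default directory, so a custom resources_dir would also need to be passed on to stanza.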
btw, there is also a version conflict warning at runtime now:
/home/anaconda3/envs/hebpipe/lib/python3.8/site-packages/scikit_learn-1.0.2-py3.8-linux-x86_64.egg/sklearn/base.py:329: UserWarning: Trying to unpickle estimator LabelEncoder from version 0.23.2 when using version 1.0.2. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/modules/model_persistence.html#security-maintainability-limitations
It might be best if you did pip freeze > requirements.txt in an environment that you know has no conflicts, and use that in the package setup.py and requirements.txt files.
I also received the same error, and when I pip installed transformers==3.5.1 I got this error message:
Any idea how this can be solved? I tried pip install --upgrade sentencepiece==0.1.91 but it didn't work:
This worked for me in setup.py in a clean env:
install_requires = ['requests','numpy==1.19.4','pandas','scipy','joblib','xgboost==0.81','depedit','xmltodict',
'torch==1.6.0','sentencepiece==0.1.91','transformers==3.5.1','flair==0.6.1','diaparser==1.1.2','rftokenizer', 'stanza','conllu'],
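To see which of these pins an existing environment does not satisfy, a small check along these lines may help. It is a sketch only: find_mismatches is a made-up helper, the PINS dict just copies the exact-version pins from the list above, and versions are compared as plain strings.

```python
from importlib import metadata

# Exact-version pins copied from the install_requires list in this thread.
PINS = {
    "numpy": "1.19.4",
    "xgboost": "0.81",
    "torch": "1.6.0",
    "sentencepiece": "0.1.91",
    "transformers": "3.5.1",
    "flair": "0.6.1",
    "diaparser": "1.1.2",
}


def find_mismatches(pins, get_version=metadata.version):
    """Return {package: (wanted, installed or None)} for each unmet pin."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in find_mismatches(PINS).items():
        print(f"{name}: want {wanted}, found {installed or 'not installed'}")
```

Because the comparison is a plain string match, a build with a local suffix (for example a CUDA-specific torch wheel) would be reported as a mismatch even if it is otherwise the right release.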
Thanks! After setting python 3.8 in the environment (I had python 3.9 set up) I followed your solution and it works.
I installed hebpipe using pip install hebpipe in a clean env (python=3.8.13). Then ran:
python -m hebpipe example_in.txt
Models are downloaded, but then I get a ModuleNotFoundError:
pip freeze:
Any idea what might be causing this? transformers is actually installed.