allenai / allennlp

An open-source NLP research library, built on PyTorch.
http://www.allennlp.org
Apache License 2.0

Error occurs when importing checklist stuff with the latest `nltk` #5521

Closed himkt closed 2 years ago

himkt commented 2 years ago

`import allennlp.commands` raises a `LookupError` because NLTK cannot find the `omw-1.4` resource. After running `python -m nltk.downloader omw-1.4`, the problem no longer occurs.
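The workaround above can also be applied from Python before the failing import. The following is only a minimal sketch: `ensure_resource` and `ensure_omw` are hypothetical helper names, not part of AllenNLP or NLTK. The only NLTK calls used are `nltk.data.find` and `nltk.download`, both of which appear in the traceback and error message in this report.

```python
from typing import Callable


def ensure_resource(
    resource_path: str,
    package_id: str,
    find: Callable[[str], object],
    download: Callable[[str], object],
) -> bool:
    """Download `package_id` only if `resource_path` cannot be found.

    Returns True if a download was triggered, False if the resource
    was already available. `find` is expected to raise LookupError
    when the resource is missing, as nltk.data.find does in the
    traceback in this report.
    """
    try:
        find(resource_path)
        return False
    except LookupError:
        download(package_id)
        return True


def ensure_omw() -> bool:
    """Hypothetical convenience wrapper wiring the helper to NLTK."""
    import nltk  # imported lazily so the helper above stays testable without NLTK

    return ensure_resource("corpora/omw-1.4", "omw-1.4", nltk.data.find, nltk.download)
```

Calling `ensure_omw()` once before `import allennlp.commands` should have the same effect as the `python -m nltk.downloader omw-1.4` command, assuming the machine has network access.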

Description

Python traceback:

```
/tmp/test-allennlp/venv/lib/python3.9/site-packages/allennlp/tango/__init__.py:17: UserWarning: AllenNLP Tango is an experimental API and parts of it might change or disappear every time we release a new version.
  warnings.warn(
Traceback (most recent call last):
  File "/tmp/test-allennlp/venv/lib/python3.9/site-packages/nltk/corpus/util.py", line 84, in __load
    root = nltk.data.find(f"{self.subdir}/{zip_name}")
  File "/tmp/test-allennlp/venv/lib/python3.9/site-packages/nltk/data.py", line 583, in find
    raise LookupError(resource_not_found)
LookupError:
**********************************************************************
  Resource omw-1.4 not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('omw-1.4')

  For more information see: https://www.nltk.org/data.html

  Attempted to load corpora/omw-1.4.zip/omw-1.4/

  Searched in:
    - '/home/himkt/nltk_data'
    - '/tmp/test-allennlp/venv/nltk_data'
    - '/tmp/test-allennlp/venv/share/nltk_data'
    - '/tmp/test-allennlp/venv/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
**********************************************************************

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/test-allennlp/venv/lib/python3.9/site-packages/allennlp/commands/__init__.py", line 24, in <module>
    from allennlp.commands.checklist import CheckList
  File "/tmp/test-allennlp/venv/lib/python3.9/site-packages/allennlp/commands/checklist.py", line 18, in <module>
    from allennlp.confidence_checks.task_checklists.task_suite import TaskSuite
  File "/tmp/test-allennlp/venv/lib/python3.9/site-packages/allennlp/confidence_checks/task_checklists/__init__.py", line 1, in <module>
    from allennlp.confidence_checks.task_checklists.task_suite import TaskSuite
  File "/tmp/test-allennlp/venv/lib/python3.9/site-packages/allennlp/confidence_checks/task_checklists/task_suite.py", line 9, in <module>
    from checklist.perturb import Perturb
  File "/tmp/test-allennlp/venv/lib/python3.9/site-packages/checklist/perturb.py", line 7, in <module>
    from pattern.en import tenses
  File "/tmp/test-allennlp/venv/lib/python3.9/site-packages/pattern/text/en/__init__.py", line 61, in <module>
    from pattern.text.en.inflect import (
  File "/tmp/test-allennlp/venv/lib/python3.9/site-packages/pattern/text/en/__init__.py", line 80, in <module>
    from pattern.text.en import wordnet
  File "/tmp/test-allennlp/venv/lib/python3.9/site-packages/pattern/text/en/wordnet/__init__.py", line 74, in <module>
    VERSION = wn.get_version() or "3.0"
  File "/tmp/test-allennlp/venv/lib/python3.9/site-packages/nltk/corpus/util.py", line 121, in __getattr__
    self.__load()
  File "/tmp/test-allennlp/venv/lib/python3.9/site-packages/nltk/corpus/util.py", line 89, in __load
    corpus = self.__reader_cls(root, *self.__args, **self.__kwargs)
  File "/tmp/test-allennlp/venv/lib/python3.9/site-packages/nltk/corpus/reader/wordnet.py", line 1176, in __init__
    self.provenances = self.omw_prov()
  File "/tmp/test-allennlp/venv/lib/python3.9/site-packages/nltk/corpus/reader/wordnet.py", line 1285, in omw_prov
    fileids = self._omw_reader.fileids()
  File "/tmp/test-allennlp/venv/lib/python3.9/site-packages/nltk/corpus/util.py", line 121, in __getattr__
    self.__load()
  File "/tmp/test-allennlp/venv/lib/python3.9/site-packages/nltk/corpus/util.py", line 86, in __load
    raise e
  File "/tmp/test-allennlp/venv/lib/python3.9/site-packages/nltk/corpus/util.py", line 81, in __load
    root = nltk.data.find(f"{self.subdir}/{self.__name}")
  File "/tmp/test-allennlp/venv/lib/python3.9/site-packages/nltk/data.py", line 583, in find
    raise LookupError(resource_not_found)
LookupError:
**********************************************************************
  Resource omw-1.4 not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('omw-1.4')

  For more information see: https://www.nltk.org/data.html

  Attempted to load corpora/omw-1.4

  Searched in:
    - '/home/himkt/nltk_data'
    - '/tmp/test-allennlp/venv/nltk_data'
    - '/tmp/test-allennlp/venv/share/nltk_data'
    - '/tmp/test-allennlp/venv/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
**********************************************************************
```

Environment

OS:

> lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.3 LTS
Release:        20.04
Codename:       focal

Python version: 3.9.9

Output of pip freeze:

```
aiohttp==3.8.1
aiosignal==1.2.0
allennlp==2.8.0
argon2-cffi==21.3.0
argon2-cffi-bindings==21.2.0
async-timeout==4.0.2
attrs==21.2.0
backcall==0.2.0
backports.csv==1.0.7
base58==2.1.1
beautifulsoup4==4.10.0
bleach==4.1.0
blis==0.7.5
boto3==1.20.25
botocore==1.23.25
cached-path==0.3.2
cachetools==4.2.4
catalogue==2.0.6
certifi==2021.10.8
cffi==1.15.0
chardet==4.0.0
charset-normalizer==2.0.9
checklist==0.0.11
cheroot==8.5.2
CherryPy==18.6.1
click==8.0.3
configparser==5.2.0
cryptography==36.0.1
cymem==2.0.6
datasets==1.16.1
debugpy==1.5.1
decorator==5.1.0
defusedxml==0.7.1
dill==0.3.4
docker-pycreds==0.4.0
entrypoints==0.3
fairscale==0.4.0
feedparser==6.0.8
filelock==3.3.2
frozenlist==1.2.0
fsspec==2021.11.1
future==0.18.2
gitdb==4.0.9
GitPython==3.1.24
google-api-core==2.3.2
google-auth==2.3.3
google-cloud-core==2.2.1
google-cloud-storage==1.43.0
google-crc32c==1.3.0
google-resumable-media==2.1.0
googleapis-common-protos==1.54.0
h5py==3.6.0
huggingface-hub==0.1.2
idna==3.3
iniconfig==1.1.1
ipykernel==6.6.0
ipython==7.30.1
ipython-genutils==0.2.0
ipywidgets==7.6.5
iso-639==0.4.5
jaraco.classes==3.2.1
jaraco.collections==3.4.0
jaraco.functools==3.5.0
jaraco.text==3.6.0
jedi==0.18.1
Jinja2==3.0.3
jmespath==0.10.0
joblib==1.1.0
jsonnet==0.17.0
jsonschema==4.3.2
jupyter==1.0.0
jupyter-client==7.1.0
jupyter-console==6.4.0
jupyter-core==4.9.1
jupyterlab-pygments==0.1.2
jupyterlab-widgets==1.0.2
lmdb==1.2.1
lxml==4.7.1
MarkupSafe==2.0.1
matplotlib-inline==0.1.3
mistune==0.8.4
more-itertools==8.12.0
multidict==5.2.0
multiprocess==0.70.12.2
munch==2.5.0
murmurhash==1.0.6
nbclient==0.5.9
nbconvert==6.3.0
nbformat==5.1.3
nest-asyncio==1.5.4
nltk==3.6.6
notebook==6.4.6
numpy==1.21.5
overrides==3.1.0
packaging==21.3
pandas==1.3.5
pandocfilters==1.5.0
parso==0.8.3
pathtools==0.1.2
pathy==0.6.1
patternfork-nosql==3.6
pdfminer.six==20211012
pexpect==4.8.0
pickleshare==0.7.5
Pillow==8.4.0
pluggy==1.0.0
portend==3.1.0
preshed==3.0.6
prometheus-client==0.12.0
promise==2.3
prompt-toolkit==3.0.24
protobuf==3.19.1
psutil==5.8.0
ptyprocess==0.7.0
py==1.11.0
pyarrow==6.0.1
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.21
pydantic==1.8.2
Pygments==2.10.0
pyparsing==3.0.6
pyrsistent==0.18.0
pytest==6.2.5
python-dateutil==2.8.2
python-docx==0.8.11
pytz==2021.3
PyYAML==6.0
pyzmq==22.3.0
qtconsole==5.2.2
QtPy==1.11.3
regex==2021.11.10
requests==2.26.0
rsa==4.8
s3transfer==0.5.0
sacremoses==0.0.46
scikit-learn==1.0.1
scipy==1.7.3
Send2Trash==1.8.0
sentencepiece==0.1.96
sentry-sdk==1.5.1
sgmllib3k==1.0.0
shortuuid==1.0.8
six==1.16.0
smart-open==5.2.1
smmap==5.0.0
soupsieve==2.3.1
spacy==3.1.4
spacy-legacy==3.0.8
sqlitedict==1.7.0
srsly==2.4.2
subprocess32==3.5.4
tempora==4.1.2
tensorboardX==2.4.1
termcolor==1.1.0
terminado==0.12.1
testpath==0.5.0
thinc==8.0.13
threadpoolctl==3.0.0
tokenizers==0.10.3
toml==0.10.2
torch==1.10.1
torchvision==0.11.2
tornado==6.1
tqdm==4.62.3
traitlets==5.1.1
transformers==4.12.5
typer==0.4.0
typing_extensions==4.0.1
urllib3==1.26.7
wandb==0.12.9
wasabi==0.9.0
wcwidth==0.2.5
webencodings==0.5.1
widgetsnbextension==3.5.2
xxhash==2.0.2
yarl==1.7.2
yaspin==2.1.0
zc.lockfile==2.0
```

Steps to reproduce

Example source:

```
> python3 --version
Python 3.9.9

2021-12-21 20:36:49 [/tmp/test-allennlp]
> python3 -m venv venv

2021-12-21 20:36:55 [/tmp/test-allennlp]
> . venv/bin/activate

(venv) 2021-12-21 20:36:56 [/tmp/test-allennlp]
> pip install allennlp --quiet

(venv) 2021-12-21 20:38:55 [/tmp/test-allennlp]
> pip list | grep allennlp
allennlp 2.8.0

(venv) 2021-12-21 20:40:47 [/tmp/test-allennlp]
> pip list | grep nltk
nltk 3.6.6

(venv) 2021-12-21 20:40:50 [/tmp/test-allennlp]
> python -c 'import allennlp.commands'
/tmp/test-allennlp/venv/lib/python3.9/site-packages/allennlp/tango/__init__.py:17: UserWarning: AllenNLP Tango is an experimental API and parts of it might change or disappear every time we release a new version.
  warnings.warn(
Traceback (most recent call last):
  File "/tmp/test-allennlp/venv/lib/python3.9/site-packages/nltk/corpus/util.py", line 84, in __load
    root = nltk.data.find(f"{self.subdir}/{zip_name}")
  File "/tmp/test-allennlp/venv/lib/python3.9/site-packages/nltk/data.py", line 583, in find
    raise LookupError(resource_not_found)
LookupError:
**********************************************************************
  Resource omw-1.4 not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('omw-1.4')

  For more information see: https://www.nltk.org/data.html

  Attempted to load corpora/omw-1.4.zip/omw-1.4/

  Searched in:
    - '/home/himkt/nltk_data'
    - '/tmp/test-allennlp/venv/nltk_data'
    - '/tmp/test-allennlp/venv/share/nltk_data'
    - '/tmp/test-allennlp/venv/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
**********************************************************************

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/test-allennlp/venv/lib/python3.9/site-packages/allennlp/commands/__init__.py", line 24, in <module>
    from allennlp.commands.checklist import CheckList
  File "/tmp/test-allennlp/venv/lib/python3.9/site-packages/allennlp/commands/checklist.py", line 18, in <module>
    from allennlp.confidence_checks.task_checklists.task_suite import TaskSuite
  File "/tmp/test-allennlp/venv/lib/python3.9/site-packages/allennlp/confidence_checks/task_checklists/__init__.py", line 1, in <module>
    from allennlp.confidence_checks.task_checklists.task_suite import TaskSuite
  File "/tmp/test-allennlp/venv/lib/python3.9/site-packages/allennlp/confidence_checks/task_checklists/task_suite.py", line 9, in <module>
    from checklist.perturb import Perturb
  File "/tmp/test-allennlp/venv/lib/python3.9/site-packages/checklist/perturb.py", line 7, in <module>
    from pattern.en import tenses
  File "/tmp/test-allennlp/venv/lib/python3.9/site-packages/pattern/text/en/__init__.py", line 61, in <module>
    from pattern.text.en.inflect import (
  File "/tmp/test-allennlp/venv/lib/python3.9/site-packages/pattern/text/en/__init__.py", line 80, in <module>
    from pattern.text.en import wordnet
  File "/tmp/test-allennlp/venv/lib/python3.9/site-packages/pattern/text/en/wordnet/__init__.py", line 74, in <module>
    VERSION = wn.get_version() or "3.0"
  File "/tmp/test-allennlp/venv/lib/python3.9/site-packages/nltk/corpus/util.py", line 121, in __getattr__
    self.__load()
  File "/tmp/test-allennlp/venv/lib/python3.9/site-packages/nltk/corpus/util.py", line 89, in __load
    corpus = self.__reader_cls(root, *self.__args, **self.__kwargs)
  File "/tmp/test-allennlp/venv/lib/python3.9/site-packages/nltk/corpus/reader/wordnet.py", line 1176, in __init__
    self.provenances = self.omw_prov()
  File "/tmp/test-allennlp/venv/lib/python3.9/site-packages/nltk/corpus/reader/wordnet.py", line 1285, in omw_prov
    fileids = self._omw_reader.fileids()
  File "/tmp/test-allennlp/venv/lib/python3.9/site-packages/nltk/corpus/util.py", line 121, in __getattr__
    self.__load()
  File "/tmp/test-allennlp/venv/lib/python3.9/site-packages/nltk/corpus/util.py", line 86, in __load
    raise e
  File "/tmp/test-allennlp/venv/lib/python3.9/site-packages/nltk/corpus/util.py", line 81, in __load
    root = nltk.data.find(f"{self.subdir}/{self.__name}")
  File "/tmp/test-allennlp/venv/lib/python3.9/site-packages/nltk/data.py", line 583, in find
    raise LookupError(resource_not_found)
LookupError:
**********************************************************************
  Resource omw-1.4 not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('omw-1.4')

  For more information see: https://www.nltk.org/data.html

  Attempted to load corpora/omw-1.4

  Searched in:
    - '/home/himkt/nltk_data'
    - '/tmp/test-allennlp/venv/nltk_data'
    - '/tmp/test-allennlp/venv/share/nltk_data'
    - '/tmp/test-allennlp/venv/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
**********************************************************************
```

himkt commented 2 years ago

I found that the error doesn't occur when installing from source with `git clone` and `pip install .`. Sorry for reporting an already-fixed problem. It should be enough to wait for the next AllenNLP release.

himkt commented 2 years ago

After removing `~/nltk_data`, the error occurs again, so I have re-opened the issue. Please close it if you think this is not the right place to discuss it.
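To see why an existing `~/nltk_data` can hide the problem, it may help to check which of the directories listed under "Searched in:" actually holds the corpus. The sketch below is a simplified model written with the standard library only; it is not NLTK's own lookup code, and `locate_resource` is a hypothetical name. It assumes the resource is either an unpacked directory or a `.zip` next to it, which matches the two paths the error message reports attempting (`corpora/omw-1.4.zip/omw-1.4/` and `corpora/omw-1.4`).

```python
from pathlib import Path
from typing import Iterable, Optional


def locate_resource(
    search_dirs: Iterable[str],
    resource: str = "corpora/omw-1.4",
) -> Optional[Path]:
    """Return the first existing copy of `resource`, or None.

    Directories are checked in the order given, mirroring the order
    of the "Searched in:" list in the error message.
    """
    for directory in search_dirs:
        base = Path(directory).expanduser()
        unpacked = base / resource
        zipped = base / (resource + ".zip")
        if unpacked.is_dir():
            return unpacked
        if zipped.is_file():
            return zipped
    return None


# Search list copied from the error message in this report.
SEARCH_DIRS = [
    "/home/himkt/nltk_data",
    "/tmp/test-allennlp/venv/nltk_data",
    "/tmp/test-allennlp/venv/share/nltk_data",
    "/tmp/test-allennlp/venv/lib/nltk_data",
    "/usr/share/nltk_data",
    "/usr/local/share/nltk_data",
    "/usr/lib/nltk_data",
    "/usr/local/lib/nltk_data",
]
```

If `locate_resource(SEARCH_DIRS)` returns a path under the home directory, a previously downloaded copy is masking the missing dependency; a fresh environment without that directory would return `None` and hit the `LookupError`.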

idiomaticrefactoring commented 2 years ago

> After removing `~/nltk_data`, the error occurs again, so I have re-opened the issue. Please close it if you think this is not the right place to discuss it.

I also hit this problem when running a test suite: `python3.7 -m pytest -v tests/core/policies/test_unexpected_intent_policy.py`

epwalsh commented 2 years ago

See also https://github.com/allenai/allennlp/issues/5523. https://github.com/allenai/allennlp/pull/5529 should fix our CI and Docker image issues. But I'm not sure what else we can do about this other than downgrading the necessary dependencies.
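For anyone weighing the downgrade option, a small version guard can flag environments that are likely to hit this error. This is a rough sketch under a stated assumption: the threshold `3.6.6` is simply the `nltk` version shown in the `pip freeze` output of this report, not a confirmed boundary, and `parse_version` and `needs_omw` are hypothetical helper names.

```python
import re
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse the leading numeric components of a version string.

    "3.6.6" -> (3, 6, 6); parsing stops at the first component that
    does not start with a digit.
    """
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def needs_omw(nltk_version: str, threshold: str = "3.6.6") -> bool:
    """Guess whether this nltk version requires the omw-1.4 resource.

    ASSUMPTION: the threshold is the version observed failing in this
    report; earlier or later versions were not verified here.
    """
    return parse_version(nltk_version) >= parse_version(threshold)
```

With the environment from this report, `needs_omw("3.6.6")` is true, so either the `omw-1.4` download or an older `nltk` would be needed.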

github-actions[bot] commented 2 years ago

This issue is being closed due to lack of activity. If you think it still needs to be addressed, please comment on this thread 👇