BatsResearch / safranchik-aaai20-code

KeyError: 'PUNCTSIDE_FIN' (SpaCy-Related) #8

Closed yongzx closed 3 years ago

yongzx commented 3 years ago

I get the following error when I run `train_generative_models.py` in the `NCBI-Disease/` folder on Google Colab.

Traceback (most recent call last):
  File "train_generative_models.py", line 14, in <module>
    reader = NCBIDiseaseDatasetReader()
  File "/usr/local/lib/python3.6/dist-packages/wiser/data/dataset_readers/ncbi.py", line 27, in __init__
    self.nlp = spacy.load('en_core_web_sm')
  File "/usr/local/lib/python3.6/dist-packages/spacy/__init__.py", line 27, in load
    return util.load_model(name, **overrides)
  File "/usr/local/lib/python3.6/dist-packages/spacy/util.py", line 134, in load_model
    return load_model_from_package(name, **overrides)
  File "/usr/local/lib/python3.6/dist-packages/spacy/util.py", line 155, in load_model_from_package
    return cls.load(**overrides)
  File "/usr/local/lib/python3.6/dist-packages/en_core_web_sm/__init__.py", line 12, in load
    return load_model_from_init_py(__file__, **overrides)
  File "/usr/local/lib/python3.6/dist-packages/spacy/util.py", line 196, in load_model_from_init_py
    return load_model_from_path(data_path, meta, **overrides)
  File "/usr/local/lib/python3.6/dist-packages/spacy/util.py", line 179, in load_model_from_path
    return nlp.from_disk(model_path)
  File "/usr/local/lib/python3.6/dist-packages/spacy/language.py", line 836, in from_disk
    util.from_disk(path, deserializers, exclude)
  File "/usr/local/lib/python3.6/dist-packages/spacy/util.py", line 636, in from_disk
    reader(path / key)
  File "/usr/local/lib/python3.6/dist-packages/spacy/language.py", line 831, in <lambda>
    p, exclude=["vocab"]
  File "pipes.pyx", line 641, in spacy.pipeline.pipes.Tagger.from_disk
  File "/usr/local/lib/python3.6/dist-packages/spacy/util.py", line 636, in from_disk
    reader(path / key)
  File "pipes.pyx", line 629, in spacy.pipeline.pipes.Tagger.from_disk.load_tag_map
  File "morphology.pyx", line 56, in spacy.morphology.Morphology.__init__
  File "attrs.pyx", line 148, in spacy.attrs.intify_attrs
KeyError: 'PUNCTSIDE_FIN'

Packages installed (output of `!python3 -m pip freeze` on Google Colab):

``` absl-py==0.10.0 alabaster==0.7.12 albumentations==0.1.12 allennlp==0.8.4 altair==4.1.0 appdirs==1.4.4 argon2-cffi==20.1.0 asgiref==3.3.1 astor==0.8.1 astropy==4.1 astunparse==1.6.3 async-generator==1.10 atari-py==0.2.6 atomicwrites==1.4.0 attrs==20.3.0 audioread==2.1.9 autograd==1.3 awscli==1.19.5 Babel==2.9.0 backcall==0.2.0 beautifulsoup4==4.6.3 bleach==3.3.0 blis==0.2.4 bokeh==2.1.1 boto3==1.17.5 botocore==1.20.5 Bottleneck==1.3.2 branca==0.4.2 bs4==0.0.1 CacheControl==0.12.6 cachetools==4.2.1 catalogue==1.0.0 certifi==2020.12.5 cffi==1.14.4 chainer==7.4.0 chardet==3.0.4 click==7.1.2 cloudpickle==1.3.0 cmake==3.12.0 cmdstanpy==0.9.5 colorama==0.4.3 colorlover==0.3.0 community==1.0.0b1 conllu==0.11 contextlib2==0.5.5 convertdate==2.3.0 coverage==3.7.1 coveralls==0.5 crcmod==1.7 cufflinks==0.17.3 cvxopt==1.2.5 cvxpy==1.0.31 cycler==0.10.0 cymem==2.0.5 Cython==0.29.21 daft==0.0.4 dask==2.12.0 dataclasses==0.8 datascience==0.10.6 debugpy==1.0.0 decorator==4.4.2 defusedxml==0.6.0 descartes==1.1.0 dill==0.3.3 distributed==1.25.3 Django==3.1.6 dlib==19.18.0 dm-tree==0.1.5 docopt==0.6.2 docutils==0.15.2 dopamine-rl==1.0.5 earthengine-api==0.1.238 easydict==1.9 ecos==2.0.7.post1 editdistance==0.5.3 en-core-web-sm==2.2.5 entrypoints==0.3 ephem==3.7.7.1 et-xmlfile==1.0.1 fa2==0.3.5 fancyimpute==0.4.3 fastai==1.0.61 fastdtw==0.3.4 fastprogress==1.0.0 fastrlock==0.5 fbprophet==0.7.1 feather-format==0.4.1 filelock==3.0.12 firebase-admin==4.4.0 fix-yahoo-finance==0.0.22 flaky==3.7.0 Flask==1.1.2 Flask-Cors==3.0.10 flatbuffers==1.12 folium==0.8.3 ftfy==5.8 future==0.16.0 gast==0.3.3 GDAL==2.2.2 gdown==3.6.4 gensim==3.6.0 geographiclib==1.50 geopy==1.17.0 gevent==21.1.2 gin-config==0.4.0 glob2==0.7 google==2.0.3 google-api-core==1.16.0 google-api-python-client==1.7.12 google-auth==1.25.0 google-auth-httplib2==0.0.4 google-auth-oauthlib==0.4.2 google-cloud-bigquery==1.21.0 google-cloud-bigquery-storage==1.1.0 google-cloud-core==1.0.3 google-cloud-datastore==1.8.0 
google-cloud-firestore==1.7.0 google-cloud-language==1.2.0 google-cloud-storage==1.18.1 google-cloud-translate==1.5.0 google-colab==1.0.0 google-pasta==0.2.0 google-resumable-media==0.4.1 googleapis-common-protos==1.52.0 googledrivedownloader==0.4 graphviz==0.10.1 greenlet==1.0.0 grpcio==1.32.0 gspread==3.0.1 gspread-dataframe==3.0.8 gym==0.17.3 h5py==2.10.0 HeapDict==1.0.1 hijri-converter==2.1.1 holidays==0.10.5.2 holoviews==1.13.5 html5lib==1.0.1 httpimport==0.5.18 httplib2==0.17.4 httplib2shim==0.0.3 humanize==0.5.1 hyperopt==0.1.2 ideep4py==2.0.0.post3 idna==2.10 image==1.5.33 imageio==2.4.1 imagesize==1.2.0 imbalanced-learn==0.4.3 imblearn==0.0 imgaug==0.2.9 importlib-metadata==3.4.0 importlib-resources==5.1.0 imutils==0.5.4 inflect==2.1.0 iniconfig==1.1.1 intel-openmp==2021.1.2 intervaltree==2.1.0 ipykernel==5.1.3 ipython==7.11.1 ipython-genutils==0.2.0 ipython-sql==0.3.9 ipywidgets==7.5.1 itsdangerous==1.1.0 jax==0.2.9 jaxlib==0.1.60+cuda101 jdcal==1.4.1 jedi==0.18.0 jieba==0.42.1 Jinja2==2.11.3 jmespath==0.10.0 joblib==1.0.0 jpeg4py==0.1.4 jsonnet==0.17.0 jsonpickle==2.0.0 jsonschema==2.6.0 jupyter==1.0.0 jupyter-client==5.3.5 jupyter-console==5.2.0 jupyter-core==4.7.1 jupyterlab-pygments==0.1.2 jupyterlab-widgets==1.0.0 kaggle==1.5.10 kapre==0.1.3.1 Keras==2.4.3 Keras-Preprocessing==1.1.2 keras-vis==0.4.1 kiwisolver==1.3.1 knnimpute==0.1.0 korean-lunar-calendar==0.2.1 labelmodels==0.0.1 librosa==0.8.0 lightgbm==2.2.3 llvmlite==0.34.0 lmdb==0.99 lucid==0.3.8 LunarCalendar==0.0.9 lxml==4.2.6 Markdown==3.3.3 MarkupSafe==1.1.1 matplotlib==3.2.2 matplotlib-venn==0.11.6 missingno==0.4.2 mistune==0.8.4 mizani==0.6.0 mkl==2019.0 mlxtend==0.14.0 more-itertools==8.7.0 moviepy==0.2.3.5 mpmath==1.1.0 msgpack==1.0.2 multiprocess==0.70.11.1 multitasking==0.0.9 murmurhash==1.0.5 music21==5.5.0 natsort==5.5.0 nbclient==0.5.1 nbconvert==5.6.1 nbformat==5.1.2 nest-asyncio==1.5.1 networkx==2.5 nibabel==3.0.2 nltk==3.2.5 notebook==5.3.1 np-utils==0.5.12.1 numba==0.51.2 
numexpr==2.7.2 numpy==1.19.5 numpydoc==1.1.0 nvidia-ml-py3==7.352.0 oauth2client==4.1.3 oauthlib==3.1.0 okgrade==0.4.3 opencv-contrib-python==4.1.2.30 opencv-python==4.1.2.30 openpyxl==2.5.9 opt-einsum==3.3.0 osqp==0.6.2.post0 overrides==3.1.0 packaging==20.9 palettable==3.3.0 pandas==0.25.3 pandas-datareader==0.9.0 pandas-gbq==0.13.3 pandas-profiling==1.4.1 pandocfilters==1.4.3 panel==0.9.7 param==1.10.1 parsimonious==0.8.1 parso==0.8.1 pathlib==1.0.1 patsy==0.5.1 pexpect==4.8.0 pickleshare==0.7.5 Pillow==7.0.0 pip-tools==4.5.1 plac==0.9.6 plotly==4.4.1 plotnine==0.6.0 pluggy==0.7.1 pooch==1.3.0 portpicker==1.3.1 prefetch-generator==1.0.1 preshed==2.0.1 prettytable==2.0.0 progressbar2==3.38.0 prometheus-client==0.9.0 promise==2.3 prompt-toolkit==3.0.15 protobuf==3.12.4 psutil==5.4.8 psycopg2==2.7.6.1 ptyprocess==0.7.0 py==1.10.0 pyarrow==0.14.1 pyasn1==0.4.8 pyasn1-modules==0.2.8 pycocotools==2.0.2 pycparser==2.20 pyct==0.4.8 pydata-google-auth==1.1.0 pydot==1.3.0 pydot-ng==2.0.0 pydotplus==2.0.2 PyDrive==1.3.1 pyemd==0.5.1 pyglet==1.5.0 Pygments==2.6.1 pygobject==3.26.1 pymc3==3.7 PyMeeus==0.3.7 pymongo==3.11.3 pymystem3==0.2.0 pynndescent==0.5.1 PyOpenGL==3.1.5 pyparsing==2.4.7 pyrsistent==0.17.3 pysndfile==1.3.8 PySocks==1.7.1 pystan==2.19.1.1 pytest==3.6.4 python-apt==1.6.5+ubuntu0.5 python-chess==0.23.11 python-dateutil==2.8.1 python-louvain==0.15 python-slugify==4.0.1 python-utils==2.5.6 pytorch-pretrained-bert==0.6.2 pytz==2018.9 pyviz-comms==2.0.1 PyWavelets==1.1.1 PyYAML==3.13 pyzmq==22.0.2 qdldl==0.1.5.post0 qtconsole==5.0.2 QtPy==1.9.0 regex==2019.12.20 requests==2.23.0 requests-oauthlib==1.3.0 resampy==0.2.2 responses==0.12.1 retrying==1.3.3 rpy2==3.2.7 rsa==4.5 s3transfer==0.3.4 scikit-image==0.16.2 scikit-learn==0.22.2.post1 scipy==1.4.1 screen-resolution-extra==0.0.0 scs==2.1.2 seaborn==0.11.1 Send2Trash==1.5.0 setuptools-git==1.2 Shapely==1.7.1 simplegeneric==0.8.1 six==1.15.0 sklearn==0.0 sklearn-pandas==1.8.0 smart-open==4.1.2 
snowballstemmer==2.1.0 sortedcontainers==2.3.0 SoundFile==0.10.3.post1 spacy==2.1.9 Sphinx==1.8.5 sphinxcontrib-serializinghtml==1.1.4 sphinxcontrib-websupport==1.2.4 SQLAlchemy==1.3.23 sqlparse==0.4.1 srsly==1.0.5 statsmodels==0.10.2 sympy==1.1.1 tables==3.4.4 tabulate==0.8.7 tblib==1.7.0 tensorboard==2.4.1 tensorboard-plugin-wit==1.8.0 tensorboardX==2.1 tensorflow==2.4.1 tensorflow-datasets==4.0.1 tensorflow-estimator==2.4.0 tensorflow-gcs-config==2.4.0 tensorflow-hub==0.11.0 tensorflow-metadata==0.27.0 tensorflow-probability==0.12.1 termcolor==1.1.0 terminado==0.9.2 testpath==0.4.4 text-unidecode==1.3 textblob==0.15.3 textgenrnn==1.4.1 Theano==1.0.5 thinc==7.0.8 tifffile==2020.9.3 toml==0.10.2 toolz==0.11.1 torch==1.7.0+cu101 torchsummary==1.5.1 torchtext==0.3.1 torchvision==0.8.1+cu101 tornado==5.1.1 tqdm==4.41.1 traitlets==4.3.3 tweepy==3.6.0 typeguard==2.7.1 typing-extensions==3.7.4.3 tzlocal==1.5.1 umap-learn==0.5.0 Unidecode==1.2.0 uritemplate==3.0.1 urllib3==1.25.11 vega-datasets==0.9.0 wasabi==0.8.2 wcwidth==0.2.5 webencodings==0.5.1 Werkzeug==1.0.1 widgetsnbextension==3.5.1 wiser==0.0.1 word2number==1.1 wordcloud==1.5.0 wrapt==1.12.1 xarray==0.15.1 xgboost==0.90 xkit==0.0.0 xlrd==1.1.0 xlwt==1.3.0 yellowbrick==0.9.1 zict==2.0.0 zipp==3.4.0 zope.event==4.5.0 zope.interface==5.2.0 ```

safranchik commented 3 years ago

There seems to be a problem with spaCy's en_core_web_sm model. Can you please try downloading the model again in your Colab environment as follows?

$ python3 -m spacy download en_core_web_sm

import spacy
spacy.load("en_core_web_sm")
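
One thing worth checking alongside this: the `pip freeze` output above lists `spacy==2.1.9` next to `en-core-web-sm==2.2.5`. A model package built for a newer spaCy release series than the installed library is a plausible cause of a `KeyError` while the tagger's tag map is loading, though that diagnosis is not confirmed in this thread. Below is a minimal sketch of such a check. The helper names `installed_version` and `same_release_series` are hypothetical, and comparing only the first two version components is an assumed rule of thumb, not an official spaCy compatibility check.

```python
try:
    from importlib import metadata  # standard library on Python 3.8+
except ImportError:
    import importlib_metadata as metadata  # backport, e.g. for Python 3.6


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def same_release_series(lib_version, model_version):
    """Assumed heuristic: compare only the major.minor components."""
    return lib_version.split(".")[:2] == model_version.split(".")[:2]


# Versions copied from the pip freeze output in this issue.
print(same_release_series("2.1.9", "2.2.5"))  # False

# Check whatever is installed in the current environment.
spacy_v = installed_version("spacy")
model_v = installed_version("en-core-web-sm")
if spacy_v and model_v and not same_release_series(spacy_v, model_v):
    print(f"Possible mismatch: spacy {spacy_v} vs en_core_web_sm {model_v}")
```

If the two series differ, re-running `python3 -m spacy download en_core_web_sm` as suggested above should fetch a model version that spaCy's compatibility table lists for the installed library.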

yongzx commented 3 years ago

Alright! I will do it and report how it goes.