coli-saar / am-parser

Modular implementation of an AM dependency parser in AllenNLP.
Apache License 2.0

More detailed installation instructions needed? Working overrides version and comet_ml package #89

Closed: weissenh closed this issue 3 years ago

weissenh commented 3 years ago

Background:

I was trying to get the following command to run on my local computer (not the coli-servers):

python3 train.py --help

To do so, I followed the installation requirements (cited from the README):

Requirements
- Python 3.7 up to version 3.7.3
- Python 2.7 for EDS and AMR evaluation (EDM metric and Smatch)
- AllenNLP (tested with version 0.8.4 and Pytorch 1.1)
- Cython
- [dependency_decoding](https://github.com/andersjo/dependency_decoding)
- The spacy core web md model: `python -m spacy download en_core_web_md`
- You may need to manually set your version of sklearn (an AllenNLP requirement, usually installed automatically) to 0.22 or lower, e.g. with `pip install scikit-learn==0.22.2`
- a build of [am-tools](https://github.com/coli-saar/am-tools); will be downloaded automatically.

(We recommend setting up a conda environment.)

__Internal note:__ this is already set up on the Saarland servers, see details [here](https://github.com/coli-saar/am-parser/wiki/Setup-and-file-locations-on-the-Saarland-servers).
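You can check the Python version range in the first bullet before installing anything else. Below is a minimal sketch, assuming the range means 3.7.0 through 3.7.3 inclusive. `python_version_ok` is a hypothetical helper and is not part of am-parser.

```python
import sys

# Supported range as stated in the README: Python 3.7 up to and including 3.7.3.
MIN_VERSION = (3, 7, 0)
MAX_VERSION = (3, 7, 3)


def python_version_ok(version=None):
    """Return True if `version` (default: the running interpreter) is in the supported range."""
    if version is None:
        version = sys.version_info
    # Compare only (major, minor, micro); release level and serial are ignored.
    return MIN_VERSION <= tuple(version[:3]) <= MAX_VERSION


if __name__ == "__main__":
    if not python_version_ok():
        major, minor, micro = sys.version_info[:3]
        print(f"Warning: running Python {major}.{minor}.{micro}, "
              "but the README asks for 3.7.0 to 3.7.3")
```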

Here are the installation commands I used:

conda create -n amparser python==3.7.3
conda activate amparser
pip install allennlp==0.8.4
pip install cython
pip install git+https://github.com/andersjo/dependency_decoding
python -m spacy download en_core_web_md
pip install scikit-learn==0.22.2

I also added an am-tools.jar file (not via the command line, but I don't think that matters for the errors I got).

Note: I wasn't working on the main branch for this. However, the file I was trying to run with python3 train.py --help is identical to train.py on the main branch (checked with diff). Also, this isn't actually training anything; I just want to see the help message. It would still be good to replicate this on the main branch, because other files differ (for instance, the pyjnius import seems to be absent from important_imports on the main branch, but happens to exist on my current branch). The branch I was on is cogs_unsupervised, which I just recently started and which in turn is based on unsupervised2020.

I know that there is a conda environment on the coli-servers that I have access to. Still, I think this is relevant for trying things on my local computer, and also for researchers outside UdS who would like to reproduce our experiments.

Problem:

After the installation as described above, I tried to display the usage info for the train.py file:

python3 train.py --help

It turns out I have to install comet_ml too:

$ python3 train.py --help
Traceback (most recent call last):
  File "train.py", line 62, in <module>
    from comet_ml import Experiment
ModuleNotFoundError: No module named 'comet_ml'
$ pip install comet_ml

💡 Insight number 1: maybe add comet_ml to the requirements?
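You can catch this class of error before running train.py by checking that every top-level module it needs can be found. Below is a minimal sketch. The module list is an assumption assembled from the errors in this issue, not an official list (`jnius_config` comes from pyjnius and is only needed on some branches). `missing_modules` is a hypothetical helper.

```python
import importlib.util

# Assumed list, collected from the import errors reported in this issue.
# jnius_config (from pyjnius) only matters on branches such as
# unsupervised2020 / cogs_unsupervised.
REQUIRED_MODULES = [
    "allennlp",
    "comet_ml",
    "overrides",
    "dependency_decoding",
    "spacy",
    "sklearn",
    "jnius_config",
]


def missing_modules(names):
    """Return the names from `names` that cannot be located on the current sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules found.")
```

`importlib.util.find_spec` only checks that a module can be located; it does not execute it. An import-time failure like the overrides error below would therefore still only show up when the module is actually imported.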

Then I tried again, but it failed again, this time with a different error:

$ python3 train.py --help
Traceback (most recent call last):
  File "train.py", line 66, in <module>
    from allennlp.commands.subcommand import Subcommand
  File "/home/wurzel/anaconda3/envs/amparser/lib/python3.7/site-packages/allennlp/commands/__init__.py", line 8, in <module>
    from allennlp.commands.configure import Configure
  File "/home/wurzel/anaconda3/envs/amparser/lib/python3.7/site-packages/allennlp/commands/configure.py", line 27, in <module>
    from allennlp.service.config_explorer import make_app
  File "/home/wurzel/anaconda3/envs/amparser/lib/python3.7/site-packages/allennlp/service/config_explorer.py", line 24, in <module>
    from allennlp.common.configuration import configure, choices
  File "/home/wurzel/anaconda3/envs/amparser/lib/python3.7/site-packages/allennlp/common/configuration.py", line 17, in <module>
    from allennlp.data.dataset_readers import DatasetReader
  File "/home/wurzel/anaconda3/envs/amparser/lib/python3.7/site-packages/allennlp/data/__init__.py", line 1, in <module>
    from allennlp.data.dataset_readers.dataset_reader import DatasetReader
  File "/home/wurzel/anaconda3/envs/amparser/lib/python3.7/site-packages/allennlp/data/dataset_readers/__init__.py", line 10, in <module>
    from allennlp.data.dataset_readers.ccgbank import CcgBankDatasetReader
  File "/home/wurzel/anaconda3/envs/amparser/lib/python3.7/site-packages/allennlp/data/dataset_readers/ccgbank.py", line 9, in <module>
    from allennlp.data.dataset_readers.dataset_reader import DatasetReader
  File "/home/wurzel/anaconda3/envs/amparser/lib/python3.7/site-packages/allennlp/data/dataset_readers/dataset_reader.py", line 8, in <module>
    from allennlp.data.instance import Instance
  File "/home/wurzel/anaconda3/envs/amparser/lib/python3.7/site-packages/allennlp/data/instance.py", line 3, in <module>
    from allennlp.data.fields.field import DataArray, Field
  File "/home/wurzel/anaconda3/envs/amparser/lib/python3.7/site-packages/allennlp/data/fields/__init__.py", line 7, in <module>
    from allennlp.data.fields.array_field import ArrayField
  File "/home/wurzel/anaconda3/envs/amparser/lib/python3.7/site-packages/allennlp/data/fields/array_field.py", line 10, in <module>
    class ArrayField(Field[numpy.ndarray]):
  File "/home/wurzel/anaconda3/envs/amparser/lib/python3.7/site-packages/allennlp/data/fields/array_field.py", line 50, in ArrayField
    @overrides
  File "/home/wurzel/anaconda3/envs/amparser/lib/python3.7/site-packages/overrides/overrides.py", line 88, in overrides
    return _overrides(method, check_signature, check_at_runtime)
  File "/home/wurzel/anaconda3/envs/amparser/lib/python3.7/site-packages/overrides/overrides.py", line 114, in _overrides
    _validate_method(method, super_class, check_signature)
  File "/home/wurzel/anaconda3/envs/amparser/lib/python3.7/site-packages/overrides/overrides.py", line 135, in _validate_method
    ensure_signature_is_compatible(super_method, method, is_static)
  File "/home/wurzel/anaconda3/envs/amparser/lib/python3.7/site-packages/overrides/signature.py", line 93, in ensure_signature_is_compatible
    ensure_return_type_compatibility(super_type_hints, sub_type_hints, method_name)
  File "/home/wurzel/anaconda3/envs/amparser/lib/python3.7/site-packages/overrides/signature.py", line 288, in ensure_return_type_compatibility
    f"{method_name}: return type `{sub_return}` is not a `{super_return}`."
TypeError: ArrayField.empty_field: return type `None` is not a `<class 'allennlp.data.fields.field.Field'>`.

Following @namednil's suggestion (thanks a lot!), I installed a different version of overrides (he said it's the one we use on the coli-servers):

pip install overrides==1.9

💡 Insight number 2: maybe specify which versions of overrides work?
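A version requirement like this can also be checked programmatically. Below is a sketch that assumes these pins: allennlp 0.8.4, scikit-learn 0.22.2 and overrides 1.9 come from this issue, and pyjnius 1.2.1 comes from a later comment in this thread. `installed_versions` and `pin_mismatches` are hypothetical helpers. The code uses `importlib.metadata` (Python 3.8+) and falls back to the `importlib-metadata` backport on Python 3.7, which appears in the pip freeze below.

```python
try:
    from importlib.metadata import version, PackageNotFoundError  # Python 3.8+
except ImportError:  # Python 3.7: use the importlib-metadata backport
    from importlib_metadata import version, PackageNotFoundError

# Assumed known-good versions, taken from the discussion in this issue.
KNOWN_GOOD = {
    "allennlp": "0.8.4",
    "overrides": "1.9",
    "scikit-learn": "0.22.2",
    "pyjnius": "1.2.1",
}


def installed_versions(names):
    """Map each distribution name to its installed version, or None if it is not installed."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


def pin_mismatches(installed, pins):
    """Return (name, installed, expected) for every package whose version differs from its pin."""
    return [
        (name, installed.get(name), expected)
        for name, expected in pins.items()
        if installed.get(name) != expected
    ]


if __name__ == "__main__":
    for name, found, expected in pin_mismatches(installed_versions(KNOWN_GOOD), KNOWN_GOOD):
        print(f"{name}: found {found}, expected {expected}")
```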

Afterwards I tried again to display the help message of train.py. Yet another error message, different from the previous ones, pops up (note: probably not relevant for work on the main branch):

$ python3 train.py --help
2021-05-21 14:40:08,991 - INFO - pytorch_pretrained_bert.modeling - Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex .
/home/wurzel/anaconda3/envs/amparser/lib/python3.7/site-packages/sklearn/utils/linear_assignment_.py:22: FutureWarning: The linear_assignment_ module is deprecated in 0.21 and will be removed from 0.23. Use scipy.optimize.linear_sum_assignment instead.
  FutureWarning)
Either spacy pytorch transformers or cupy not available, so you cannot use spacy-tok2vec! This is only an issue, if you intend to use roberta or xlnet.
Traceback (most recent call last):
  File "train.py", line 80, in <module>
    import graph_dependency_parser.important_imports
  File "/home/wurzel/HiwiAK/am-parser/graph_dependency_parser/important_imports.py", line 20, in <module>
    import jnius_config
ModuleNotFoundError: No module named 'jnius_config'

So the solution to that error is to install pyjnius.

pip install pyjnius

💡 Insight number 3: maybe add pyjnius to the requirements for the branches where it is relevant (unsupervised2020, cogs_unsupervised, ...?)

Afterwards I called train.py with the help flag again and finally got the output I expected 🥳 (OK, minus the first 3-4 lines of warnings/info messages maybe):

$ python3 train.py --help
2021-05-21 14:41:01,475 - INFO - pytorch_pretrained_bert.modeling - Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex .
/home/wurzel/anaconda3/envs/amparser/lib/python3.7/site-packages/sklearn/utils/linear_assignment_.py:22: FutureWarning: The linear_assignment_ module is deprecated in 0.21 and will be removed from 0.23. Use scipy.optimize.linear_sum_assignment instead.
  FutureWarning)
Either spacy pytorch transformers or cupy not available, so you cannot use spacy-tok2vec! This is only an issue, if you intend to use roberta or xlnet.
usage: train.py [-h] -s SERIALIZATION_DIR [-r] [--comet COMET]
                [--workspace WORKSPACE] [--project PROJECT]
                [--tags TAGS [TAGS ...]] [-f] [-o OVERRIDES]
                [--file-friendly-logging]
                param_path

Run the training of an am-parser.

positional arguments:
  param_path            path to parameter file describing the model to be
                        trained

optional arguments:
  -h, --help            show this help message and exit
  -s SERIALIZATION_DIR, --serialization-dir SERIALIZATION_DIR
                        directory in which to save the model and its logs
  -r, --recover         recover training from the state in serialization_dir
  --comet COMET         comet.ml api key, if you want to log with comet.ml
  --workspace WORKSPACE
                        name of comet.ml workspace
  --project PROJECT     name of comet.ml project
  --tags TAGS [TAGS ...]
                        Tags used for comet.ml. Usage: "--tags foo bar" will
                        add two tags
  -f, --force           overwrite the output directory if it exists
  -o OVERRIDES, --overrides OVERRIDES
                        a JSON structure used to override the experiment
                        configuration
  --file-friendly-logging
                        outputs tqdm status on separate lines and slows tqdm
                        refresh rate

Related issues

Other issues have also touched on the question of what needs to be installed:

Slightly related: #85 (part of the documentation of each experiment could be detailed installation instructions, or maybe just a requirements.txt file).

Proposed solution

I would advocate being more precise about the installation instructions/requirements (the coli-servers are not always the solution). Maybe we could offer a requirements.txt or environment.yml file, a sequence of commands, or something else? This could even be marked as just a suggestion, like "we can't always check which versions of all the packages work together nicely, but we found this set of package versions to work".
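If a requirements.txt is the way to go, the pins could be pulled from a pip freeze listing like the one at the end of this comment. Below is a minimal sketch. `select_pins`, `normalize` and the chosen package list are hypothetical, and the excerpt is copied from that listing.

```python
import re

# Hypothetical helpers for turning a pip freeze listing into a short requirements.txt.


def normalize(name):
    """PEP 503 name normalization: lowercase, runs of '-', '_' and '.' become '-'."""
    return re.sub(r"[-_.]+", "-", name.strip()).lower()


def select_pins(freeze_text, wanted):
    """Return the lines of `freeze_text` whose normalized package name is in `wanted`."""
    wanted_normalized = {normalize(name) for name in wanted}
    selected = []
    for line in freeze_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Entries look like "name==version" or "name @ url"; the name comes first.
        name = line.split("==")[0].split(" @ ")[0]
        if normalize(name) in wanted_normalized:
            selected.append(line)
    return selected


# Excerpt of the pip freeze output posted in this issue.
FREEZE_EXCERPT = """\
allennlp==0.8.4
comet-ml==3.10.0
Cython==0.29.23
dependency-decoding @ git+https://github.com/andersjo/dependency_decoding@79510908223b93bd4c1fb0409a2a66dd75577c2c
overrides==1.9
scikit-learn==0.22.2
torch==1.8.1
"""

if __name__ == "__main__":
    wanted = ["allennlp", "comet_ml", "cython", "dependency_decoding", "overrides", "scikit-learn"]
    # Print the selected pins; redirect stdout to requirements.txt to save them.
    print("\n".join(select_pins(FREEZE_EXCERPT, wanted)))
```

A raw freeze only records whatever pip happened to resolve. For example, the freeze in this comment lists pyjnius==1.3.0, which a later comment reports as broken. The selected pins should therefore still be reviewed by hand.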

Keep in mind that I have only tried to run python3 train.py --help, which doesn't involve any serious computation, so I don't know whether more problems will occur with other commands (like actually training the parser).

Note again that I wasn't working on the main branch, so I suggest first trying to replicate this there. As I said, train.py on my branch was line-by-line identical to the one on the main branch, but other files may differ. (I'm pretty confident that it's reproducible on the main branch, minus the pyjnius issue, but who knows?)

I'm assigning @AinaIanemahy to this to ask whether you want to look into it and maybe update the README or the wiki.

For reference, here is the output of pip freeze after I finally got python3 train.py --help to work without error messages:

$ pip freeze
alabaster==0.7.12
allennlp==0.8.4
attrs==21.2.0
awscli==1.19.77
Babel==2.9.1
blis==0.2.4
boto3==1.17.77
botocore==1.20.77
cached-property==1.5.2
certifi==2020.12.5
chardet==4.0.0
click==8.0.1
colorama==0.4.3
comet-ml==3.10.0
configobj==5.0.6
conllu==0.11
cycler==0.10.0
cymem==2.0.5
Cython==0.29.23
dependency-decoding @ git+https://github.com/andersjo/dependency_decoding@79510908223b93bd4c1fb0409a2a66dd75577c2c
docutils==0.15.2
dulwich==0.20.21
editdistance==0.5.3
en-core-web-md @ https://github.com/explosion/spacy-models/releases/download/en_core_web_md-2.1.0/en_core_web_md-2.1.0.tar.gz
everett==1.0.3
flaky==3.7.0
Flask==2.0.0
Flask-Cors==3.0.10
ftfy==6.0.1
gevent==21.1.2
greenlet==1.1.0
h5py==3.2.1
idna==2.10
imagesize==1.2.0
importlib-metadata==4.0.1
iniconfig==1.1.1
itsdangerous==2.0.1
Jinja2==3.0.1
jmespath==0.10.0
joblib==1.0.1
jsonnet==0.17.0
jsonpickle==2.0.0
jsonschema==3.2.0
kiwisolver==1.3.1
MarkupSafe==2.0.1
matplotlib==3.4.2
murmurhash==1.0.5
nltk==3.6.2
numpy==1.20.3
numpydoc==1.1.0
nvidia-ml-py3==7.352.0
overrides==1.9
packaging==20.9
parsimonious==0.8.1
Pillow==8.2.0
plac==0.9.6
pluggy==0.13.1
preshed==2.0.1
protobuf==3.17.0
py==1.10.0
pyasn1==0.4.8
Pygments==2.9.0
pyjnius==1.3.0
pyparsing==2.4.7
pyrsistent==0.17.3
pytest==6.2.4
python-dateutil==2.8.1
pytorch-pretrained-bert==0.6.2
pytz==2021.1
PyYAML==5.4.1
regex==2021.4.4
requests==2.25.1
requests-toolbelt==0.9.1
responses==0.13.3
rsa==4.7.2
s3transfer==0.4.2
scikit-learn==0.22.2
scipy==1.6.3
six==1.16.0
snowballstemmer==2.1.0
spacy==2.1.9
Sphinx==4.0.2
sphinxcontrib-applehelp==1.0.2
sphinxcontrib-devhelp==1.0.2
sphinxcontrib-htmlhelp==1.0.3
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.3
sphinxcontrib-serializinghtml==1.1.4
sqlparse==0.4.1
srsly==1.0.5
tensorboardX==2.2
thinc==7.0.8
threadpoolctl==2.1.0
toml==0.10.2
torch==1.8.1
tqdm==4.60.0
typing-extensions==3.10.0.0
typing-utils==0.0.3
Unidecode==1.2.0
urllib3==1.26.4
wasabi==0.8.2
wcwidth==0.2.5
websocket-client==1.0.0
Werkzeug==2.0.1
word2number==1.1
wrapt==1.12.1
wurlitzer==2.1.0
zipp==3.4.1
zope.event==4.5.0
zope.interface==5.4.0

weissenh commented 3 years ago

Note: I got a TypeError: unhashable type: 'de.up.ling.irtg.automata.Rule' with pyjnius==1.3.0 (the version installed by pip install pyjnius). After replacing it with an older version (pip install pyjnius==1.2.1, thanks to @jgroschwitz for the suggestion), the error disappeared. Just like with the overrides package, we might want to mention which version we used. Pyjnius is probably not relevant for the current main/master branch, but it is for the unsupervised and cogs_unsupervised branches.

AinaIanemahy commented 3 years ago

@weissenh Didn't you run into problems with Python 2, and if so, how did you solve them? Or did you simply not use the part of the code where it is needed?

weissenh commented 3 years ago

@AinaIanemahy I didn't use Python 2 here. It's not needed to get train.py --help to work, right? And I don't think I'm going to need Python 2 in the near future (I'm not doing anything with AMR or EDS right now). I know that Python 2 is needed in predict.sh, but I guess you only actually need it there if you're dealing with EDS (and maybe AMR, I'm not sure).

AinaIanemahy commented 3 years ago

I added the information on the packages to the wiki and the readme.