allenai / allennlp

An open-source NLP research library, built on PyTorch.
http://www.allennlp.org
Apache License 2.0

Cannot train NER model that uses bidirectional_lm_token_embedder: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu #4646

Closed tpanza closed 4 years ago

tpanza commented 4 years ago

Checklist

Description

Python traceback:

```
2020-09-16 17:06:02,589 - INFO - allennlp.training.trainer - Beginning training.
2020-09-16 17:06:02,589 - INFO - allennlp.training.trainer - Epoch 0/21
2020-09-16 17:06:02,589 - INFO - allennlp.training.trainer - Worker 0 memory usage MB: 3118.312
2020-09-16 17:06:02,730 - INFO - allennlp.training.trainer - GPU 0 memory usage MB: 1314
2020-09-16 17:06:02,730 - INFO - allennlp.training.trainer - GPU 1 memory usage MB: 11
2020-09-16 17:06:02,730 - INFO - allennlp.training.trainer - GPU 2 memory usage MB: 11
2020-09-16 17:06:02,730 - INFO - allennlp.training.trainer - GPU 3 memory usage MB: 11
2020-09-16 17:06:02,730 - INFO - allennlp.training.trainer - GPU 4 memory usage MB: 11
2020-09-16 17:06:02,730 - INFO - allennlp.training.trainer - GPU 5 memory usage MB: 11
2020-09-16 17:06:02,730 - INFO - allennlp.training.trainer - GPU 6 memory usage MB: 11
2020-09-16 17:06:02,730 - INFO - allennlp.training.trainer - GPU 7 memory usage MB: 11
2020-09-16 17:06:02,732 - INFO - allennlp.training.trainer - Training
0%| | 0/169 [00:00
    sys.exit(run())
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/__main__.py", line 34, in run
    main(prog="allennlp")
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/commands/__init__.py", line 92, in main
    args.func(args)
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/commands/train.py", line 118, in train_model_from_args
    file_friendly_logging=args.file_friendly_logging,
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/commands/train.py", line 177, in train_model_from_file
    file_friendly_logging=file_friendly_logging,
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/commands/train.py", line 238, in train_model
    file_friendly_logging=file_friendly_logging,
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/commands/train.py", line 443, in _train_worker
    metrics = train_loop.run()
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/commands/train.py", line 505, in run
    return self.trainer.train()
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/training/trainer.py", line 867, in train
    train_metrics = self._train_epoch(epoch)
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/training/trainer.py", line 589, in _train_epoch
    batch_outputs = self.batch_outputs(batch, for_training=True)
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/training/trainer.py", line 479, in batch_outputs
    output_dict = self._pytorch_model(**batch)
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/localdata/tony/data-science/ner-refactor-pytorch/src/models/lstm.py", line 46, in forward
    embedded = self._embedder(tokens)
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/modules/text_field_embedders/basic_text_field_embedder.py", line 84, in forward
    token_vectors = embedder(list(tensors.values())[0], **forward_params_values)
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp_models/lm/modules/token_embedders/language_model.py", line 187, in forward
    tokens, mask, self._bos_indices, self._eos_indices
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/nn/util.py", line 1565, in add_sentence_boundary_token_ids
    tensor_with_boundary_tokens[i, j + 1, :] = sentence_end_token
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
2020-09-16 17:06:02,789 - INFO - allennlp.models.archival - removing temporary unarchived model dir at /tmp/tmpobwn4xyg
```

Related issues or possible duplicates

Environment

OS: Red Hat Enterprise Linux Server release 7.6 (Maipo), Linux 3.10.0-957.12.2.el7.x86_64 #1 SMP Fri Apr 19 21:09:07 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Python version: Python 3.7.8 | packaged by conda-forge | (default, Jul 31 2020, 02:25:08) [GCC 7.5.0] on linux

Issue observed with 1.1.0 release of allennlp and allennlp-models, plus the following patches manually applied:

Output of pip freeze:

```
alabaster==0.7.12
alembic==1.4.2
allennlp==1.1.0
allennlp-models==1.1.0
appdirs==1.4.3
argon2-cffi @ file:///home/conda/feedstock_root/build_artifacts/argon2-cffi_1596629848788/work
astroid @ file:///home/conda/feedstock_root/build_artifacts/astroid_1591645081755/work
attrs @ file:///home/conda/feedstock_root/build_artifacts/attrs_1597959372343/work
Babel==2.8.0
backcall @ file:///home/conda/feedstock_root/build_artifacts/backcall_1592338393461/work
backports.functools-lru-cache==1.6.1
beautifulsoup4 @ file:///home/conda/feedstock_root/build_artifacts/beautifulsoup4_1597679909012/work
black @ file:///home/conda/feedstock_root/build_artifacts/black-recipe_1596181673569/work
bleach @ file:///home/conda/feedstock_root/build_artifacts/bleach_1588608214987/work
blis==0.4.1
boto==2.49.0
boto3 @ file:///home/conda/feedstock_root/build_artifacts/boto3_1599083817494/work
botocore @ file:///home/conda/feedstock_root/build_artifacts/botocore_1599079795233/work
brotlipy==0.7.0
bz2file==0.98
cachetools @ file:///home/conda/feedstock_root/build_artifacts/cachetools_1593420445823/work
catalogue==1.0.0
certifi==2020.6.20
cffi @ file:///home/conda/feedstock_root/build_artifacts/cffi_1595805535531/work
chardet==3.0.4
click==7.1.2
cliff @ file:///home/conda/feedstock_root/build_artifacts/cliff_1596473693634/work
cmaes @ file:///home/conda/feedstock_root/build_artifacts/cmaes_1596014311252/work
cmd2 @ file:///home/conda/feedstock_root/build_artifacts/cmd2_1591809284986/work
colorama==0.4.3
colorlog==4.2.1
conllu==4.1
cryptography @ file:///home/conda/feedstock_root/build_artifacts/cryptography_1598621005849/work
css-html-js-minify @ file:///home/conda/feedstock_root/build_artifacts/css-html-js-minify_1589201803800/work
cycler==0.10.0
cymem==2.0.3
Cython @ file:///home/conda/feedstock_root/build_artifacts/cython_1594253470382/work
decorator==4.4.2
defusedxml==0.6.0
docutils==0.15.2
en-core-web-sm @ file:///home/conda/feedstock_root/build_artifacts/spacy-model-en_core_web_1595437534769/work/sm
entrypoints==0.3
filelock==3.0.12
flake8 @ file:///home/conda/feedstock_root/build_artifacts/flake8_1594584774608/work
ftfy==5.8
future==0.18.2
gensim @ file:///home/conda/feedstock_root/build_artifacts/gensim_1589441213562/work
google-api-core @ file:///home/conda/feedstock_root/build_artifacts/google-api-core-split_1597353416133/work
google-auth @ file:///home/conda/feedstock_root/build_artifacts/google-auth_1598972371532/work
google-cloud-core @ file:///home/conda/feedstock_root/build_artifacts/google-cloud-core_1596721852962/work
google-cloud-storage @ file:///home/conda/feedstock_root/build_artifacts/google-cloud-storage_1598557481535/work
google-crc32c @ file:///home/conda/feedstock_root/build_artifacts/google-crc32c_1597069930917/work
google-resumable-media @ file:///home/conda/feedstock_root/build_artifacts/google-resumable-media_1598452053521/work
googleapis-common-protos==1.51.0
grpcio @ file:///home/conda/feedstock_root/build_artifacts/grpcio_1596715635580/work
h5py==2.10.0
idna @ file:///home/conda/feedstock_root/build_artifacts/idna_1593328102638/work
imagesize==1.2.0
importlib-metadata @ file:///home/conda/feedstock_root/build_artifacts/importlib-metadata_1593211369179/work
iniconfig==1.0.1
ipykernel @ file:///home/conda/feedstock_root/build_artifacts/ipykernel_1595446871027/work/dist/ipykernel-5.3.4-py3-none-any.whl
ipython @ file:///home/conda/feedstock_root/build_artifacts/ipython_1598749946943/work
ipython-genutils==0.2.0
isort @ file:///home/conda/feedstock_root/build_artifacts/isort_1599134253980/work
jedi @ file:///home/conda/feedstock_root/build_artifacts/jedi_1595018882455/work
Jinja2==2.11.2
jmespath @ file:///home/conda/feedstock_root/build_artifacts/jmespath_1589369830981/work
joblib @ file:///home/conda/feedstock_root/build_artifacts/joblib_1593624380152/work
json5 @ file:///home/conda/feedstock_root/build_artifacts/json5_1591810480056/work
jsonnet==0.16.0
jsonpickle==1.4.1
jsonschema==3.2.0
jupyter-client @ file:///home/conda/feedstock_root/build_artifacts/jupyter_client_1598486169312/work
jupyter-core==4.6.3
jupyterlab==2.2.6
jupyterlab-server @ file:///home/conda/feedstock_root/build_artifacts/jupyterlab_server_1593951277307/work
kiwisolver==1.2.0
lazy-object-proxy==1.4.3
livereload @ file:///home/conda/feedstock_root/build_artifacts/livereload_1598114753789/work
lunr==0.5.8
lxml @ file:///home/conda/feedstock_root/build_artifacts/lxml_1594322698782/work
Mako @ file:///home/conda/feedstock_root/build_artifacts/mako_1595925083607/work
Markdown @ file:///home/conda/feedstock_root/build_artifacts/markdown_1589366472132/work
MarkupSafe==1.1.1
matplotlib @ file:///home/conda/feedstock_root/build_artifacts/matplotlib-base_1597952254444/work
mccabe==0.6.1
mistune==0.8.4
mkdocs @ file:///home/conda/feedstock_root/build_artifacts/mkdocs_1591017186129/work
mkdocs-material @ file:///home/conda/feedstock_root/build_artifacts/mkdocs-material_1598971568401/work
more-itertools==8.5.0
murmurhash==1.0.2
mypy @ file:///home/conda/feedstock_root/build_artifacts/mypy_1592923020520/work
mypy-extensions==0.4.3
nbconvert==5.6.1
nbformat @ file:///home/conda/feedstock_root/build_artifacts/nbformat_1594060262917/work
networkx==2.5
nltk==3.4.4
notebook @ file:///home/conda/feedstock_root/build_artifacts/notebook_1597285190767/work
numpy @ file:///home/conda/feedstock_root/build_artifacts/numpy_1597938346492/work
olefile==0.46
optuna @ file:///home/conda/feedstock_root/build_artifacts/optuna_1598323699360/work
overrides==3.1.0
packaging @ file:///home/conda/feedstock_root/build_artifacts/packaging_1589925210001/work
pandas @ file:///home/conda/feedstock_root/build_artifacts/pandas_1598294454723/work
pandocfilters==1.4.2
parso @ file:///home/conda/feedstock_root/build_artifacts/parso_1595548966091/work
pathspec==0.8.0
patsy==0.5.1
pbr @ file:///home/conda/feedstock_root/build_artifacts/pbr_1598996708473/work
pexpect==4.8.0
pickleshare==0.7.5
Pillow @ file:///home/conda/feedstock_root/build_artifacts/pillow_1594213010297/work
plac==1.1.3
pluggy==0.13.1
preshed==3.0.2
prettytable==0.7.2
prometheus-client @ file:///home/conda/feedstock_root/build_artifacts/prometheus_client_1590412252446/work
prompt-toolkit @ file:///home/conda/feedstock_root/build_artifacts/prompt-toolkit_1598885455507/work
protobuf==3.13.0
psutil @ file:///home/conda/feedstock_root/build_artifacts/psutil_1594826921622/work
ptyprocess==0.6.0
py==1.9.0
py-rouge==1.1
pyasn1==0.4.8
pyasn1-modules==0.2.7
pycodestyle @ file:///home/conda/feedstock_root/build_artifacts/pycodestyle_1589305246696/work
pycorenlp==0.3.0
pycparser @ file:///home/conda/feedstock_root/build_artifacts/pycparser_1593275161868/work
pyflakes==2.2.0
Pygments==2.6.1
pylint @ file:///home/conda/feedstock_root/build_artifacts/pylint_1598117058668/work
pymdown-extensions @ file:///home/conda/feedstock_root/build_artifacts/pymdown-extensions_1597166028984/work
Pyment==0.3.3
pyneuroner==1.0.8
pyOpenSSL==19.1.0
pyparsing==2.4.7
pyperclip @ file:///home/conda/feedstock_root/build_artifacts/pyperclip_1591810382257/work
pyrsistent==0.16.0
PySocks==1.7.1
pytest==6.0.1
python-dateutil==2.8.1
python-editor==1.0.4
python-magic==0.4.18
python-slugify @ file:///home/conda/feedstock_root/build_artifacts/python-slugify_1593573453419/work
pytz==2020.1
PyYAML==5.3.1
pyzmq==19.0.2
regex @ file:///home/conda/feedstock_root/build_artifacts/regex_1594799371287/work
requests @ file:///home/conda/feedstock_root/build_artifacts/requests_1592425495151/work
rsa @ file:///home/conda/feedstock_root/build_artifacts/rsa_1591996208734/work
s3cmd==2.1.0
s3transfer==0.3.3
sacremoses==0.0.43
scikit-learn @ file:///home/conda/feedstock_root/build_artifacts/scikit-learn_1596546074663/work
scipy @ file:///home/conda/feedstock_root/build_artifacts/scipy_1595583586868/work
seaborn @ file:///home/conda/feedstock_root/build_artifacts/seaborn-base_1591878760859/work
Send2Trash==1.5.0
sentencepiece==0.1.91
six @ file:///home/conda/feedstock_root/build_artifacts/six_1590081179328/work
slugify==0.0.1
smart-open @ file:///home/conda/feedstock_root/build_artifacts/smart_open_1598572939086/work
snowballstemmer==2.0.0
soupsieve @ file:///home/conda/feedstock_root/build_artifacts/soupsieve_1597680516047/work
spacy==2.3.2
Sphinx @ file:///home/conda/feedstock_root/build_artifacts/sphinx_1597405755328/work
sphinx-material @ file:///home/conda/feedstock_root/build_artifacts/sphinx-material_1597663680729/work
sphinxcontrib-applehelp==1.0.2
sphinxcontrib-devhelp==1.0.2
sphinxcontrib-htmlhelp==1.0.3
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.3
sphinxcontrib-serializinghtml==1.1.4
SQLAlchemy @ file:///home/conda/feedstock_root/build_artifacts/sqlalchemy_1597701920245/work
srsly==1.0.2
statsmodels @ file:///home/conda/feedstock_root/build_artifacts/statsmodels_1598551025620/work
stevedore @ file:///home/conda/feedstock_root/build_artifacts/stevedore_1598982656343/work
tensorboardX==2.1
teradata==15.10.0.21
terminado==0.8.3
testpath==0.4.4
text-unidecode==1.3
thinc==7.4.1
threadpoolctl @ file:///tmp/tmp79xdzxkt/threadpoolctl-2.1.0-py3-none-any.whl
tokenizers==0.8.1rc1
toml @ file:///home/conda/feedstock_root/build_artifacts/toml_1589469402899/work
torch==1.6.0
torchvision==0.7.0
tornado==6.0.4
tqdm @ file:///home/conda/feedstock_root/build_artifacts/tqdm_1596476591553/work
traitlets @ file:///home/conda/feedstock_root/build_artifacts/traitlets_1598976315411/work
transformers==3.0.2
typed-ast==1.4.1
typing-extensions @ file:///home/conda/feedstock_root/build_artifacts/typing_extensions_1588470653596/work
Unidecode==1.1.1
urllib3 @ file:///home/conda/feedstock_root/build_artifacts/urllib3_1595434816409/work
wasabi==0.8.0
wcwidth @ file:///home/conda/feedstock_root/build_artifacts/wcwidth_1595859607677/work
webencodings==0.5.1
word2number==1.1
wrapt==1.11.2
XlsxWriter @ file:///home/conda/feedstock_root/build_artifacts/xlsxwriter_1597362230404/work
zipp==3.1.0
```

Output of conda list:

```
# packages in environment at /app/local/anaconda3/envs/torch1.6_allennlp1.1:
#
# Name  Version  Build  Channel
_libgcc_mutex  0.1  conda_forge
_openmp_mutex  4.5  1_llvm
alabaster  0.7.12  py_0
alembic  1.4.2  pyh9f0ad1d_0
allennlp  1.1.0  pypi_0  pypi
allennlp-models  1.1.0  pypi_0  pypi
appdirs  1.4.3  py_1
argon2-cffi  20.1.0  py37h8f50634_1
astroid  2.4.2  py37hc8dfbb8_0
attrs  20.1.0  pyh9f0ad1d_0
babel  2.8.0  py_0
backcall  0.2.0  pyh9f0ad1d_0
backports  1.0  py_2
backports.functools_lru_cache  1.6.1  py_0
beautifulsoup4  4.9.1  py_1
black  19.10b0  py_4
blas  2.16  mkl
bleach  3.1.5  pyh9f0ad1d_0
blis  0.4.1  pypi_0  pypi
boto  2.49.0  py_0
boto3  1.14.54  pyh9f0ad1d_0
botocore  1.17.54  pyh9f0ad1d_0
brotlipy  0.7.0  py37h8f50634_1000
bz2file  0.98  py_0
c-ares  1.16.1  h516909a_3
ca-certificates  2020.6.20  hecda079_0
cachetools  4.1.1  py_0
catalogue  1.0.0  pypi_0  pypi
certifi  2020.6.20  py37hc8dfbb8_0
cffi  1.14.1  py37h2b28604_0
chardet  3.0.4  py37hc8dfbb8_1006
click  7.1.2  pyh9f0ad1d_0
cliff  3.4.0  py_0
cmaes  0.6.0  pyhbc3b93e_0
cmd2  0.9.22  py37_0
colorama  0.4.3  py_0
colorlog  4.2.1  py37hc8dfbb8_0
conllu  4.1  pypi_0  pypi
cryptography  3.1  py37hb09aad4_0
css-html-js-minify  2.5.5  py37hc8dfbb8_1
cudatoolkit  9.2  0
cycler  0.10.0  py_2
cymem  2.0.3  pypi_0  pypi
cython  0.29.21  py37h3340039_0
decorator  4.4.2  py_0
defusedxml  0.6.0  py_0
docutils  0.15.2  py37_0
entrypoints  0.3  py37hc8dfbb8_1001
filelock  3.0.12  pypi_0  pypi
flake8  3.8.3  py_1
freetype  2.10.2  he06d7ca_0
ftfy  5.8  pypi_0  pypi
future  0.18.2  py37hc8dfbb8_1
gensim  3.8.3  py37h3340039_2
google-api-core  1.22.1  py37hc8dfbb8_0
google-auth  1.21.0  py_0
google-cloud-core  1.4.1  pyh9f0ad1d_0
google-cloud-storage  1.31.0  pyh9f0ad1d_0
google-crc32c  1.0.0  py37h193935f_0
google-resumable-media  1.0.0  pyh9f0ad1d_0
googleapis-common-protos  1.51.0  py37hc8dfbb8_2
grpcio  1.31.0  py37hb0870dc_0
h5py  2.10.0  pypi_0  pypi
icu  67.1  he1b5a44_0
idna  2.10  pyh9f0ad1d_0
imagesize  1.2.0  py_0
importlib-metadata  1.7.0  py37hc8dfbb8_0
importlib_metadata  1.7.0  0
iniconfig  1.0.1  pypi_0  pypi
ipykernel  5.3.4  py37h43977f1_0
ipython  7.18.1  py37hc6149b9_0
ipython_genutils  0.2.0  py_1
isort  5.5.0  py37hc8dfbb8_0
jedi  0.17.2  py37hc8dfbb8_0
jinja2  2.11.2  pyh9f0ad1d_0
jmespath  0.10.0  pyh9f0ad1d_0
joblib  0.16.0  py_0
jpeg  9d  h516909a_0
json5  0.9.4  pyh9f0ad1d_0
jsonnet  0.16.0  pypi_0  pypi
jsonpickle  1.4.1  pypi_0  pypi
jsonschema  3.2.0  py37hc8dfbb8_1
jupyter_client  6.1.7  py_0
jupyter_core  4.6.3  py37hc8dfbb8_1
jupyterlab  2.2.6  py_0
jupyterlab_server  1.2.0  py_0
kiwisolver  1.2.0  py37h99015e2_0
lazy-object-proxy  1.4.3  py37h8f50634_2
lcms2  2.11  hbd6801e_0
ld_impl_linux-64  2.34  hc38a660_9
libblas  3.8.0  16_mkl
libcblas  3.8.0  16_mkl
libcrc32c  1.1.1  he1b5a44_2
libffi  3.2.1  he1b5a44_1007
libgcc-ng  9.3.0  h24d8f2e_16
libgfortran-ng  7.5.0  hdf63c60_16
libiconv  1.16  h516909a_0
liblapack  3.8.0  16_mkl
liblapacke  3.8.0  16_mkl
libpng  1.6.37  hed695b0_2
libprotobuf  3.13.0  h8b12597_0
libsodium  1.0.18  h516909a_0
libstdcxx-ng  9.3.0  hdf63c60_16
libtiff  4.1.0  hc7e4089_6
libuv  1.39.0  h516909a_0
libwebp-base  1.1.0  h516909a_3
libxml2  2.9.10  h68273f3_2
libxslt  1.1.33  h572872d_1
livereload  2.6.3  pyh9f0ad1d_0
llvm-openmp  10.0.1  hc9558a2_0
lunr  0.5.8  py37hc8dfbb8_0
lxml  4.5.2  py37he3881c9_0
lz4-c  1.9.2  he1b5a44_3
mako  1.1.3  pyh9f0ad1d_0
markdown  3.2.2  py_0
markupsafe  1.1.1  py37h8f50634_1
matplotlib-base  3.3.1  py37hd478181_1
mccabe  0.6.1  py_1
mistune  0.8.4  py37h8f50634_1001
mkdocs  1.1.2  py_0
mkdocs-material  5.5.12  py_0
mkl  2020.2  256
more-itertools  8.5.0  pypi_0  pypi
murmurhash  1.0.2  pypi_0  pypi
mypy  0.782  py_0
mypy_extensions  0.4.3  py37hc8dfbb8_1
nbconvert  5.6.1  py37hc8dfbb8_1
nbformat  5.0.7  py_0
ncurses  6.2  he1b5a44_1
networkx  2.5  pypi_0  pypi
ninja  1.10.1  hc9558a2_1
nltk  3.4.4  py_0
nodejs  14.9.0  h568c755_0
notebook  6.1.3  py37hc8dfbb8_0
numpy  1.19.1  py37h7ea13bd_2
olefile  0.46  py_0
openssl  1.1.1g  h516909a_1
optuna  2.0.0  py_1
overrides  3.1.0  pypi_0  pypi
packaging  20.4  pyh9f0ad1d_0
pandas  1.1.1  py37h3340039_0
pandoc  2.10.1  h516909a_0
pandocfilters  1.4.2  py_1
parso  0.7.1  pyh9f0ad1d_0
pathspec  0.8.0  pyh9f0ad1d_0
patsy  0.5.1  py_0
pbr  5.5.0  pyh9f0ad1d_0
pexpect  4.8.0  py37hc8dfbb8_1
pickleshare  0.7.5  py37hc8dfbb8_1001
pillow  7.2.0  py37h718be6c_1
pip  20.2.2  py_0
plac  1.1.3  pypi_0  pypi
pluggy  0.13.1  pypi_0  pypi
preshed  3.0.2  pypi_0  pypi
prettytable  0.7.2  py_3
prometheus_client  0.8.0  pyh9f0ad1d_0
prompt-toolkit  3.0.7  py_0
protobuf  3.13.0  pypi_0  pypi
psutil  5.7.2  py37h8f50634_0
ptyprocess  0.6.0  py_1001
py  1.9.0  pypi_0  pypi
py-rouge  1.1  pypi_0  pypi
pyasn1  0.4.8  py_0
pyasn1-modules  0.2.7  py_0
pycodestyle  2.6.0  pyh9f0ad1d_0
pycorenlp  0.3.0  pypi_0  pypi
pycparser  2.20  pyh9f0ad1d_2
pyflakes  2.2.0  pyh9f0ad1d_0
pygments  2.6.1  py_0
pylint  2.6.0  py37hc8dfbb8_0
pymdown-extensions  8.0  pyh9f0ad1d_0
pyment  0.3.3  pypi_0  pypi
pyneuroner  1.0.8  pypi_0  pypi
pyopenssl  19.1.0  py_1
pyparsing  2.4.7  pyh9f0ad1d_0
pyperclip  1.8.0  pyh9f0ad1d_0
pyrsistent  0.16.0  py37h8f50634_0
pysocks  1.7.1  py37hc8dfbb8_1
pytest  6.0.1  pypi_0  pypi
python  3.7.8  h6f2ec95_1_cpython
python-dateutil  2.8.1  py_0
python-editor  1.0.4  py_0
python-magic  0.4.18  pypi_0  pypi
python-slugify  4.0.1  pyh9f0ad1d_0
python_abi  3.7  1_cp37m
pytorch  1.6.0  py3.7_cuda9.2.148_cudnn7.6.3_0
pytz  2020.1  pyh9f0ad1d_0
pyyaml  5.3.1  py37h8f50634_0
pyzmq  19.0.2  py37hac76be4_0
readline  8.0  he28a2e2_2
regex  2020.7.14  py37h8f50634_0
requests  2.24.0  pyh9f0ad1d_0
rsa  4.6  pyh9f0ad1d_0
s3cmd  2.1.0  pypi_0  pypi
s3transfer  0.3.3  py37hc8dfbb8_1
sacremoses  0.0.43  pypi_0  pypi
scikit-learn  0.23.2  py37h6785257_0
scipy  1.5.2  py37hb14ef9d_0
seaborn  0.10.1  1
seaborn-base  0.10.1  py_1
send2trash  1.5.0  py_0
sentencepiece  0.1.91  pypi_0  pypi
setuptools  49.6.0  py37hc8dfbb8_0
six  1.15.0  pyh9f0ad1d_0
slugify  0.0.1  py_2
smart_open  2.1.1  pyh9f0ad1d_0
snowballstemmer  2.0.0  py_0
soupsieve  2.0.1  py_1
spacy  2.3.2  pypi_0  pypi
spacy-model-en_core_web_sm  2.3.1  pyh9f0ad1d_0
sphinx  3.2.1  py_0
sphinx-material  0.0.30  py_1
sphinxcontrib-applehelp  1.0.2  py_0
sphinxcontrib-devhelp  1.0.2  py_0
sphinxcontrib-htmlhelp  1.0.3  py_0
sphinxcontrib-jsmath  1.0.1  py_0
sphinxcontrib-qthelp  1.0.3  py_0
sphinxcontrib-serializinghtml  1.1.4  py_0
sqlalchemy  1.3.19  py37h8f50634_0
sqlite  3.33.0  h4cf870e_0
srsly  1.0.2  pypi_0  pypi
statsmodels  0.12.0  py37h8f50634_0
stevedore  3.2.1  py37hc8dfbb8_0
tensorboardx  2.1  pypi_0  pypi
teradata  15.10.0.21  py37_0
terminado  0.8.3  py37hc8dfbb8_1
testpath  0.4.4  py_0
text-unidecode  1.3  py_0
thinc  7.4.1  pypi_0  pypi
threadpoolctl  2.1.0  pyh5ca1d4c_0
tini  0.18.0  h14c3975_1001
tk  8.6.10  hed695b0_0
tokenizers  0.8.1rc1  pypi_0  pypi
toml  0.10.1  pyh9f0ad1d_0
torchvision  0.7.0  py37_cu92
tornado  6.0.4  py37h8f50634_1
tqdm  4.48.2  pyh9f0ad1d_0
traitlets  5.0.0  py37hc8dfbb8_0
transformers  3.0.2  pypi_0  pypi
typed-ast  1.4.1  py37h516909a_0
typing_extensions  3.7.4.2  py_0
unidecode  1.1.1  py_0
urllib3  1.25.10  py_0
wasabi  0.8.0  pypi_0  pypi
wcwidth  0.2.5  pyh9f0ad1d_1
webencodings  0.5.1  py_1
wheel  0.35.1  pyh9f0ad1d_0
word2number  1.1  pypi_0  pypi
wrapt  1.11.2  py37h8f50634_0
xlsxwriter  1.3.3  pyh9f0ad1d_0
xz  5.2.5  h516909a_1
yaml  0.2.5  h516909a_0
zeromq  4.3.2  he1b5a44_3
zipp  3.1.0  py_0
zlib  1.2.11  h516909a_1009
zstd  1.4.5  h6597ccf_2
```

Steps to reproduce

Example source:

```
# First, train the LM (this works):
allennlp train --force --include-package src --serialization-dir tmp/elmo_lm_tran1 configs/bidirectional_language_model_part_notes.jsonnet

# Second, train the NER model (this is what fails and produces the error):
allennlp train --force --include-package src --serialization-dir tmp/ner_elmo_lm_tran1 configs/train_lstm_nobi_char_elmo1_v2.jsonnet
```

Contents of Language Model config file bidirectional_language_model_part_notes.jsonnet:

```
{
  dataset_reader: {
    type: 'language_model_conll_reader',
    //"tokenizer": {
    //  "type": "just_spaces"
    //},
    token_indexers: {
      tokens: {
        type: 'single_id',
      },
      token_characters: {
        type: 'elmo_characters',
      },
    },
    //"max_sequence_length": 400,
    //"start_tokens": [""],
    //"end_tokens": [""]
  },
  train_data_path: 'data/processed/no_bi_tags/new_model1_train.txt',
  model: {
    type: 'language_model',
    bidirectional: true,
    //"num_samples": 8192,
    sparse_embeddings: true,
    text_field_embedder: {
      //"allow_unmatched_keys": true,
      token_embedders: {
        //tokens: {
        //  type: 'embedding',
        //  pretrained_file: 'models/word2vec_newtoken1.txt',
        //  embedding_dim: 50,
        //  trainable: false
        // },
        tokens: {
          type: 'empty',
        },
        token_characters: {
          type: 'character_encoding',
          embedding: {
            num_embeddings: 262,
            embedding_dim: 16,
          },
          encoder: {
            type: 'cnn-highway',
            activation: 'relu',
            embedding_dim: 16,
            filters: [
              [1, 32],
              [2, 32],
              [3, 64],
              [4, 128],
              [5, 256],
              [6, 512],
              [7, 1024],
            ],
            num_highway: 2,
            projection_dim: 512,
            projection_location: 'after_highway',
            do_layer_norm: true,
          },
        },
      },
    },
    dropout: 0.1,
    contextualizer: {
      type: 'bidirectional_language_model_transformer',
      input_dim: 512,
      hidden_dim: 2048,
      num_layers: 6,
      dropout: 0.1,
      input_dropout: 0.1,
    },
  },
  data_loader: {
    batch_size: 64,
  },
  distributed: {
    cuda_devices: std.range(0, 8 - 1),
  },
  trainer: {
    num_epochs: 10,
    optimizer: {
      type: 'dense_sparse_adam',
    },
    // TODO(brendanr): Needed with transformer too?
    // "grad_norm": 10.0,
    learning_rate_scheduler: {
      type: 'noam',
      // See https://github.com/allenai/calypso/blob/master/calypso/train.py#L401
      model_size: 512,
      // See https://github.com/allenai/calypso/blob/master/bin/train_transformer_lm1b.py#L51.
      // Adjusted based on our sample size relative to Calypso's.
      warmup_steps: 6000,
    },
    //"should_log_learning_rate": true
  },
}
```

Contents of NER model config file train_lstm_nobi_char_elmo1_v2.jsonnet:

```
{
  dataset_reader: {
    type: 'conll_03_reader_elmo',
    token_indexers: {
      tokens: {
        type: 'single_id',
        namespace: 'tokens',
      },
      elmo: {
        type: 'elmo_characters',
        //namespace: 'token_characters'
      },
    },
    lazy: false,
  },
  train_data_path: 'data/processed/no_bi_tags/new_model1_train.txt',
  validation_data_path: 'data/processed/no_bi_tags/new_model1_val.txt',
  model: {
    //type: 'ner_lstm_crf',
    type: 'NER_LSTM',  //without crf
    embedder: {
      token_embedders: {
        tokens: {
          type: 'embedding',
          pretrained_file: 'models/word2vec_newtoken1.txt',
          embedding_dim: 50,
          trainable: false,
        },
        //elmo: {
        //  type: 'elmo_token_embedder',
        //  options_file: "models/elmo_2x4096_512_2048cnn_2xhighway_options.json",
        //  weight_file: "models/elmo_2x4096_512_2048cnn_2xhighway_weights.hdf5",
        //  do_layer_norm: false,
        //  dropout: 0.3
        // }
        elmo: {
          type: 'bidirectional_lm_token_embedder',
          archive_file: 'tmp/elmo_lm_tran1/model.tar.gz',
          dropout: 0.2,
          bos_eos_tokens: ['', ''],
          remove_bos_eos: true,
          requires_grad: false,
        },
      },
    },
    encoder: {
      type: 'lstm',
      input_size: 1024 + 50,
      hidden_size: 25,
      dropout: 0.3,
      bidirectional: true,
    },
  },
  data_loader: {
    batch_size: 128,
  },
  trainer: {
    num_epochs: 22,
    patience: 10,
    cuda_device: 0,
    grad_clipping: 5.0,
    validation_metric: '-loss',
    optimizer: {
      type: 'adam',
      lr: 0.01,
    },
  },
}
```

tpanza commented 4 years ago

@dirkgr what can I do to help with this?

dirkgr commented 4 years ago

Sorry for the radio silence. This is on my radar, but the machine I wanted to use has developed problems and now we're debugging GPUs instead of tensors.

These wrong-device errors are usually easy. The problem is often that one of the tensors involved was created fresh without a device parameter, and so it defaults to CPU. I am a big fan of debuggers, so that's what I would use to find out where it came from, but there are other techniques, including "staring at the code".
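
For readers without a debugger at hand, one of those other techniques can be sketched as below. This is an illustration only, not code from this thread or from AllenNLP: the helper name `group_by_device` is hypothetical, and it assumes every object passed in exposes a `.device` attribute the way `torch.Tensor` does. It groups the named tensors by device so the stray CPU one stands out.

```python
from collections import defaultdict


def group_by_device(**named_tensors):
    """Return {device string: [names of the tensors on that device]}.

    Hypothetical debugging helper. It relies only on each argument
    having a `.device` attribute, as torch.Tensor does.
    """
    groups = defaultdict(list)
    for name, tensor in named_tensors.items():
        groups[str(tensor.device)].append(name)
    return dict(groups)
```

Called just before the failing assignment, for example as `print(group_by_device(target=tensor_with_boundary_tokens, eos=sentence_end_token))`, it would show which operand is still on `cpu`. The usual remedy is then to create that tensor with `device=...` set to the other tensor's device, or to move it with `.to(...)`.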

github-actions[bot] commented 4 years ago

@dirkgr this is just a friendly ping to make sure you haven't forgotten about this issue 😜

dirkgr commented 4 years ago

Thank you, @github-actions. I have not.

tpanza commented 4 years ago

Hi @dirkgr, please check out #4727