allenai / allennlp

An open-source NLP research library, built on PyTorch.
http://www.allennlp.org
Apache License 2.0

Multiprocess Data Loader with num_workers > 0 throws error about token_indexers already being applied #5132

Closed (vikigenius closed 3 years ago)

vikigenius commented 3 years ago

Description

I created a custom dataset_reader and used sharded_dataset_reader to work with multiprocess_data_loader. However, setting num_workers > 0 throws the error shown in the traceback below.

This does not happen with num_workers = 0

Python traceback:

```
Traceback (most recent call last):
  File "/home/void/miniconda3/envs/lexsiamese/bin/allennlp", line 8, in <module>
    sys.exit(run())
  File "/home/void/miniconda3/envs/lexsiamese/lib/python3.7/site-packages/allennlp/__main__.py", line 34, in run
    main(prog="allennlp")
  File "/home/void/miniconda3/envs/lexsiamese/lib/python3.7/site-packages/allennlp/commands/__init__.py", line 119, in main
    args.func(args)
  File "/home/void/miniconda3/envs/lexsiamese/lib/python3.7/site-packages/allennlp/commands/train.py", line 119, in train_model_from_args
    file_friendly_logging=args.file_friendly_logging,
  File "/home/void/miniconda3/envs/lexsiamese/lib/python3.7/site-packages/allennlp/commands/train.py", line 178, in train_model_from_file
    file_friendly_logging=file_friendly_logging,
  File "/home/void/miniconda3/envs/lexsiamese/lib/python3.7/site-packages/allennlp/commands/train.py", line 292, in train_model
    params.duplicate(), serialization_dir, print_statistics=dry_run
  File "/home/void/miniconda3/envs/lexsiamese/lib/python3.7/site-packages/allennlp/training/util.py", line 466, in make_vocab_from_params
    data_loaders = data_loaders_from_params(params, serialization_dir=serialization_dir)
  File "/home/void/miniconda3/envs/lexsiamese/lib/python3.7/site-packages/allennlp/training/util.py", line 116, in data_loaders_from_params
    data_loader_params.duplicate(), reader=dataset_reader, data_path=train_data_path
  File "/home/void/miniconda3/envs/lexsiamese/lib/python3.7/site-packages/allennlp/common/from_params.py", line 593, in from_params
    **extras,
  File "/home/void/miniconda3/envs/lexsiamese/lib/python3.7/site-packages/allennlp/common/from_params.py", line 623, in from_params
    return constructor_to_call(**kwargs)  # type: ignore
  File "/home/void/miniconda3/envs/lexsiamese/lib/python3.7/site-packages/allennlp/data/data_loaders/multiprocess_data_loader.py", line 281, in __init__
    deque(self.iter_instances(), maxlen=0)
  File "/home/void/miniconda3/envs/lexsiamese/lib/python3.7/site-packages/allennlp/data/data_loaders/multiprocess_data_loader.py", line 369, in iter_instances
    self._gather_instances(queue), desc="loading instances"
  File "/home/void/miniconda3/envs/lexsiamese/lib/python3.7/site-packages/tqdm/std.py", line 1178, in __iter__
    for obj in iterable:
  File "/home/void/miniconda3/envs/lexsiamese/lib/python3.7/site-packages/allennlp/data/data_loaders/multiprocess_data_loader.py", line 509, in _gather_instances
    raise WorkerError(e, tb)
allennlp.data.data_loaders.multiprocess_data_loader.WorkerError: worker raised ValueError("Found a TextField (query_tokens) with token_indexers already applied, but you're using num_workers > 0 in your data loader. Make sure your dataset reader's text_to_instance() method doesn't add any token_indexers to the TextFields it creates. Instead, the token_indexers should be added to the instances in the apply_token_indexers() method of your dataset reader (which you'll have to implement if you haven't done so already).")

Traceback from worker:
  File "/home/void/miniconda3/envs/lexsiamese/lib/python3.7/site-packages/allennlp/data/data_loaders/multiprocess_data_loader.py", line 467, in _instance_worker
    f"Found a TextField ({field_name}) with token_indexers already "
ValueError: Found a TextField (query_tokens) with token_indexers already applied, but you're using num_workers > 0 in your data loader. Make sure your dataset reader's text_to_instance() method doesn't add any token_indexers to the TextFields it creates. Instead, the token_indexers should be added to the instances in the apply_token_indexers() method of your dataset reader (which you'll have to implement if you haven't done so already).
```

Setting num_workers to 0 does not throw this error
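
For anyone hitting the same message, here is a rough sketch of the pattern the error text describes: text_to_instance() creates the TextField without indexers, and apply_token_indexers() attaches them afterwards. To keep it runnable without AllenNLP installed, `TextField`, `Instance`, `QueryReader` and `check_worker_instance` below are simplified stand-ins that I made up for illustration. They are not the real AllenNLP classes, and the real worker-side check may differ in detail.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TextField:
    """Stand-in for a text field: tokens plus optional token indexers."""
    tokens: List[str]
    token_indexers: Optional[Dict[str, str]] = None


@dataclass
class Instance:
    """Stand-in for an instance: a mapping from field name to field."""
    fields: Dict[str, TextField]


class QueryReader:
    """Hypothetical reader that follows the pattern from the error message."""

    def __init__(self, token_indexers: Dict[str, str]) -> None:
        self._token_indexers = token_indexers

    def text_to_instance(self, query: str) -> Instance:
        # Create the field WITHOUT token indexers.
        return Instance({"query_tokens": TextField(query.split())})

    def apply_token_indexers(self, instance: Instance) -> None:
        # Attach the indexers in a separate step, after the instance exists.
        instance.fields["query_tokens"].token_indexers = self._token_indexers


def check_worker_instance(instance: Instance) -> None:
    """Imitates the kind of check that raised the ValueError above."""
    for name, text_field in instance.fields.items():
        if text_field.token_indexers is not None:
            raise ValueError(
                f"Found a TextField ({name}) with token_indexers already applied"
            )


reader = QueryReader({"tokens": "single_id"})
instance = reader.text_to_instance("what is allennlp")
check_worker_instance(instance)        # passes: no indexers attached yet
reader.apply_token_indexers(instance)  # indexers attached afterwards
```

In a real reader the same split would presumably apply to every TextField the instance carries, not only query_tokens.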

Environment

OS: Linux x86_64

Python version: 3.7.10

Output of pip freeze:

``` alabaster @ file:///home/void/.cache/pypoetry/artifacts/cd/37/e0/89f7da30c12075ae566ff3d8107abdbc74fd3d19ae5644765d79dd5d47/alabaster-0.7.12-py2.py3-none-any.whl alembic @ file:///home/void/.cache/pypoetry/artifacts/cb/39/9a/86b56862682b530f53ebb1e69e78f816f9f8d08b29ffc82509f2ef76ac/alembic-1.5.8-py2.py3-none-any.whl allennlp @ file:///home/void/.cache/pypoetry/artifacts/2b/5d/f1/e0ad9a1e311da16af778085f277fa0c6a40d19d052d93ffd410785deb2/allennlp-2.3.0-py3-none-any.whl allennlp-optuna @ file:///home/void/.cache/pypoetry/artifacts/1b/6c/5b/1cea8d351d1eb5e32c282cfc0e3acb07f4e71ab6bc3d8ac96025cadf64/allennlp_optuna-0.1.5-py3-none-any.whl argon2-cffi @ file:///home/void/.cache/pypoetry/artifacts/40/58/da/99ea4f10c652469eb6c623b269fb96784ed9bbdab439c3a5bd9d9afa6e/argon2_cffi-20.1.0-cp35-abi3-manylinux1_x86_64.whl astor @ file:///home/void/.cache/pypoetry/artifacts/5f/e9/ea/ff2986fae56c8b24978d6ca48a057f02e9e4845b5b9b499a2506121369/astor-0.8.1-py2.py3-none-any.whl async-generator @ file:///home/void/.cache/pypoetry/artifacts/5a/c9/85/708dc64d76e0faea9f132181d1f9589bfab62218ac9bbef7d6cfc821d2/async_generator-1.10-py3-none-any.whl attrs @ file:///home/void/.cache/pypoetry/artifacts/ae/b3/61/38a043abdba5ba4c0c510dde549dd1c8278bf56262dc5df55c19133a02/attrs-20.3.0-py2.py3-none-any.whl Babel @ file:///home/void/.cache/pypoetry/artifacts/0d/6c/d0/f735d4d0af68640ee69adddc80d3ecf336156e84c4ccf2078c8fe9e38b/Babel-2.9.0-py2.py3-none-any.whl backcall @ file:///home/void/.cache/pypoetry/artifacts/d9/1c/14/88957e7a43c92c6678d8ca482196186836144475c67d11ff02a4ee2194/backcall-0.2.0-py2.py3-none-any.whl bandit @ file:///home/void/.cache/pypoetry/artifacts/b1/65/70/cb20da954def2f182c80e2555654f456552633122e9f8f23425855b46b/bandit-1.7.0-py3-none-any.whl beautifulsoup4 @ file:///home/void/.cache/pypoetry/artifacts/d7/4b/c2/1a65d7699a83c83aaec6fa35b97cb59760569be3e24f903ae4b55ac3a5/beautifulsoup4-4.9.3-py3-none-any.whl bleach @ 
file:///home/void/.cache/pypoetry/artifacts/0c/19/da/291df1a8b71e9bd208e3e6f801afd6567b9ef47036b167c11e25f5c96c/bleach-3.3.0-py2.py3-none-any.whl blis @ file:///home/void/.cache/pypoetry/artifacts/22/59/92/fbc277252d447c1eea4adf771a209657eb1d536c3f9b2b5bbdf10b5b45/blis-0.7.4-cp37-cp37m-manylinux2014_x86_64.whl boto3 @ file:///home/void/.cache/pypoetry/artifacts/f0/1c/4b/54fd3599e1872242dddd68dea523ba5440560ba56bcace6a5271a5b7fb/boto3-1.17.53-py2.py3-none-any.whl botocore @ file:///home/void/.cache/pypoetry/artifacts/39/30/63/e4763d6e3afc7c0cececf6d2af2171c305849dd83c6725134b2db0ad14/botocore-1.20.53-py2.py3-none-any.whl cached-property @ file:///home/void/.cache/pypoetry/artifacts/38/d8/aa/b8baaf6448a0029023e15cbfc6e1a278d60cc2e2b022c94bc850561996/cached_property-1.5.2-py2.py3-none-any.whl catalogue @ file:///home/void/.cache/pypoetry/artifacts/6b/d2/12/e162a59d9b422d9802b9dced62cc39ab69afbf475f9c64779bf2400ec5/catalogue-2.0.3-py3-none-any.whl certifi==2020.12.5 cffi @ file:///home/void/.cache/pypoetry/artifacts/25/db/72/24c31ee860d752b550f7744febafb4d2b3bfe3ada972f163ccdf8ae711/cffi-1.14.5-cp37-cp37m-manylinux1_x86_64.whl chardet @ file:///home/void/.cache/pypoetry/artifacts/f0/e1/66/f8ced421461f1dda06ea89af6ac51a22cf72ff0595f329808634559b2b/chardet-4.0.0-py2.py3-none-any.whl click @ file:///home/void/.cache/pypoetry/artifacts/21/fd/0f/f7ff619e0ab099fc284ee2b24a86129d9dc3ad2a475dc304bbbbe20ecb/click-7.1.2-py2.py3-none-any.whl cliff @ file:///home/void/.cache/pypoetry/artifacts/4c/37/89/981a4be88fc94f7ff2523f5646f2994032111d85da21ad553e32f343fe/cliff-3.7.0-py3-none-any.whl cmaes @ file:///home/void/.cache/pypoetry/artifacts/08/1c/a2/065ada5c6a31f5b9a476c139ce2ca736614b5d9ffb65172565b46d5ec7/cmaes-0.8.2-py3-none-any.whl cmd2 @ file:///home/void/.cache/pypoetry/artifacts/c9/15/0e/8e818c612a709f0e313c624f3dfed5ee30c95d42e57f49fe80cb2cc4f6/cmd2-1.5.0-py3-none-any.whl colorama @ 
file:///home/void/.cache/pypoetry/artifacts/33/a2/a4/09f68d0a2176d987da70e9dee0eaea3cc48f68b56a7fa8ac56c2d22dc7/colorama-0.4.4-py2.py3-none-any.whl colorlog @ file:///home/void/.cache/pypoetry/artifacts/1c/08/cf/e706def39db46dc00865c88496d0fa0b044c30a01831eb84d639d9f2f2/colorlog-5.0.1-py2.py3-none-any.whl configparser @ file:///home/void/.cache/pypoetry/artifacts/62/d3/97/a2949c74cc1115909f8c1f42f474c3ab22e1da2667d2ddff71b3e6efff/configparser-5.0.2-py3-none-any.whl coverage @ file:///home/void/.cache/pypoetry/artifacts/9b/3b/0e/7304b9b5727e6d8c7dcb7790e5a7ada9f9b0eb3d447c55bafbd1e10a6e/coverage-5.5-cp37-cp37m-manylinux2010_x86_64.whl cymem @ file:///home/void/.cache/pypoetry/artifacts/89/dc/1c/cd70280e9193082ebee5dba1b3b3302ff4c64d1524e041c0c03516261c/cymem-2.0.5-cp37-cp37m-manylinux2014_x86_64.whl darglint @ file:///home/void/.cache/pypoetry/artifacts/d0/22/a6/0ae1baa693bb2faf4f905bd3da5e7c7acf12bca962fc4d927838ee8b41/darglint-1.8.0-py3-none-any.whl decorator @ file:///home/void/.cache/pypoetry/artifacts/4c/44/a3/98a9811f5ccd978d1c278eb97127c919a58ca805b3e54fba6c0b212265/decorator-5.0.7-py3-none-any.whl defusedxml @ file:///home/void/.cache/pypoetry/artifacts/d3/69/a8/eb355ff24ffb8df62ec3dd9524bec0ad9d9dc719bd996734d6d7aa1d56/defusedxml-0.7.1-py2.py3-none-any.whl dictdiffer @ file:///home/void/.cache/pypoetry/artifacts/e4/93/97/60397fd0d7cca2bb6e18b78a6a7686f75a859f1b8f237167d7d4737b1d/dictdiffer-0.8.1-py2.py3-none-any.whl doc8 @ file:///home/void/.cache/pypoetry/artifacts/79/3e/c8/ae33df607c50685be0d497522a1af2a835f1be1c77709951aaa6b195db/doc8-0.8.1-py2.py3-none-any.whl docker-pycreds @ file:///home/void/.cache/pypoetry/artifacts/ae/d2/e4/d45ddf9b807389820c106b6d5cc636f5a794fb93631d9b8119fb110ec3/docker_pycreds-0.4.0-py2.py3-none-any.whl docutils @ file:///home/void/.cache/pypoetry/artifacts/24/17/76/ad5143b189440a07a8cd43100d25b414c020d591981c6141f1881f7fe6/docutils-0.17-py2.py3-none-any.whl dparse @ 
file:///home/void/.cache/pypoetry/artifacts/61/38/88/d729d74e312bdef39a41d9388f962e838a49f53d9964f881552bd6b6db/dparse-0.5.1-py3-none-any.whl entrypoints @ file:///home/void/.cache/pypoetry/artifacts/5e/99/ed/7ceb3b7ba71bc66f2526e7ffc16315bfdb5bf955fe1051ec05516f7730/entrypoints-0.3-py2.py3-none-any.whl eradicate @ file:///home/void/.cache/pypoetry/artifacts/4c/f8/6e/535a5eaa918010239f4badceb49f6e6ff22c1c5bad8db9ff54cee17163/eradicate-1.0.tar.gz filelock @ file:///home/void/.cache/pypoetry/artifacts/08/55/8e/3a41c1abc99a96a15b063c1f6c0bb06c4ae6cbb78de462a1999579e087/filelock-3.0.12-py3-none-any.whl flake8 @ file:///home/void/.cache/pypoetry/artifacts/b8/54/22/c86908a17e023a917c963c0afe98570cf4b6f07c4407b85c6e3beb7128/flake8-3.9.1-py2.py3-none-any.whl flake8-bandit @ file:///home/void/.cache/pypoetry/artifacts/73/09/84/42a6b41975a42f2c631d7c0ba5cb35c38c0d0560af60ec888e95cbc82e/flake8_bandit-2.1.2.tar.gz flake8-broken-line @ file:///home/void/.cache/pypoetry/artifacts/fb/e0/e6/e7d66797cfa78abf59a681622e4a64ad26a3f37f755dff1aa2310e89ed/flake8_broken_line-0.2.1-py3-none-any.whl flake8-bugbear @ file:///home/void/.cache/pypoetry/artifacts/62/76/39/527cdbe01977956d193c1246dd1093d1c04368bf5020682fdd6b74408e/flake8_bugbear-19.8.0-py35.py36.py37-none-any.whl flake8-commas @ file:///home/void/.cache/pypoetry/artifacts/f5/ba/10/b4bde8612d74e39a007b65228f1167e91320847a48fe99d2514ccaa78d/flake8_commas-2.0.0-py2.py3-none-any.whl flake8-comprehensions @ file:///home/void/.cache/pypoetry/artifacts/e0/11/b2/d171d6145b51bc4e54263b0ffeba234d0b693e055e47eee4d2551e7e11/flake8_comprehensions-3.4.0-py3-none-any.whl flake8-debugger @ file:///home/void/.cache/pypoetry/artifacts/ea/b0/a2/b0f38254a64bb29ae6356646e0a6cd6607d383a7a3809358603dc0bb4c/flake8-debugger-3.2.1.tar.gz flake8-docstrings @ file:///home/void/.cache/pypoetry/artifacts/dd/61/0f/8c31b8a10df8152ee5f1400f965df4a1d9f14d1c2e73bdeda389921f06/flake8_docstrings-1.6.0-py2.py3-none-any.whl flake8-eradicate @ 
file:///home/void/.cache/pypoetry/artifacts/a0/87/7e/94d2c66c1eab6acc65ded58a867ad8c684c7bdd6eeceb91a0b336aa63b/flake8_eradicate-0.3.0-py3-none-any.whl flake8-isort @ file:///home/void/.cache/pypoetry/artifacts/aa/be/9e/5b4f728aa7c298adc33f71a13676c41d843bb89af6c0a71493b27aff01/flake8_isort-3.0.1-py2.py3-none-any.whl flake8-plugin-utils @ file:///home/void/.cache/pypoetry/artifacts/35/9a/10/f8f41d43896f3eac1b19353fd56b1de85622630254660e02d2504f5d6d/flake8_plugin_utils-1.3.1-py3-none-any.whl flake8-polyfill @ file:///home/void/.cache/pypoetry/artifacts/50/2e/1c/0ab55451b665fa42f02e5e0b60b4f1e3d0c5b97e523e6942fcb469b060/flake8_polyfill-1.0.2-py2.py3-none-any.whl flake8-pytest-style @ file:///home/void/.cache/pypoetry/artifacts/fa/25/c3/fc9f818d7a076d63faa217ea2adf691c92b0000cee5b6420bbe03ab19d/flake8_pytest_style-1.4.1-py3-none-any.whl flake8-quotes @ file:///home/void/.cache/pypoetry/artifacts/23/a7/ee/bc35529d5fb4ce0aef80dab51eca97b55a70c2183efcc68498a668f41a/flake8-quotes-2.1.2.tar.gz flake8-rst-docstrings @ file:///home/void/.cache/pypoetry/artifacts/13/52/41/eb73820e56d7ffb96a09509569d9d0e6b1068dc5d082da30c3ec40a390/flake8-rst-docstrings-0.0.12.tar.gz flake8-string-format @ file:///home/void/.cache/pypoetry/artifacts/fb/c5/8f/34f45df55140c42298862824e72ce2a67620ddad02e0173ac9654f927c/flake8_string_format-0.2.3-py2.py3-none-any.whl gitdb @ file:///home/void/.cache/pypoetry/artifacts/96/6b/0d/8c98bd5a440942e37e198088da688917df817926cdd1828c67629d73d1/gitdb-4.0.7-py3-none-any.whl GitPython @ file:///home/void/.cache/pypoetry/artifacts/d1/23/d7/b24886986eaf6c660285de59cee29e42c3a030b12d15b544fcacfc889b/GitPython-3.1.14-py3-none-any.whl greenlet @ file:///home/void/.cache/pypoetry/artifacts/ec/cd/50/631d3cee3e8163d49884ac822731ba5854db56ef007f7b3d1687c12e99/greenlet-1.0.0-cp37-cp37m-manylinux2010_x86_64.whl h5py @ file:///home/void/.cache/pypoetry/artifacts/5f/5a/4c/0ae2db5f88cf83abf33db3ffa364e0a90e13d7a75cde8560e9c3981af7/h5py-3.2.1-cp37-cp37m-manylinux1_x86_64.whl 
identify @ file:///home/void/.cache/pypoetry/artifacts/3a/22/ae/4b0b0071dcc3a4ac84f7e5968b9064201a9e64d5be0ffc1864095cfd4d/identify-2.2.3-py2.py3-none-any.whl idna @ file:///home/void/.cache/pypoetry/artifacts/71/d9/bc/a8481f6ac8b5d0ecc0fbd34aca906ee68d1757f24fefbd0f4294c0c9d2/idna-2.10-py2.py3-none-any.whl imagesize @ file:///home/void/.cache/pypoetry/artifacts/b0/0c/4c/2da9b5d688f3d57232a399cd66d8f682f2246dff0008a0253eed36d086/imagesize-1.2.0-py2.py3-none-any.whl importlib-metadata @ file:///home/void/.cache/pypoetry/artifacts/31/fc/05/71417d693371ef10b30a9289eb98e554ae850569e049e4c7ba0dbe2e44/importlib_metadata-3.10.1-py3-none-any.whl ipykernel @ file:///home/void/.cache/pypoetry/artifacts/50/d7/4f/776813e1bb58bc2288bef5ae69c62cc0710c62396910d1e6799f537c7c/ipykernel-5.5.3-py3-none-any.whl ipython @ file:///home/void/.cache/pypoetry/artifacts/1b/f6/06/a4687a1a1ea57b27c1a2879652854425a44bd41192c02f799f1aaef247/ipython-7.22.0-py3-none-any.whl ipython-genutils @ file:///home/void/.cache/pypoetry/artifacts/2c/69/c6/e1f2fd156ee87f59d0c32acc921a0da31121b0c7c192b1b9ac0908111d/ipython_genutils-0.2.0-py2.py3-none-any.whl ipywidgets @ file:///home/void/.cache/pypoetry/artifacts/83/db/53/e89cbc09943d34d82339a205aa58c225bb26c9cb2cc104716f9c5bfeba/ipywidgets-7.6.3-py2.py3-none-any.whl isort @ file:///home/void/.cache/pypoetry/artifacts/4d/3d/1d/2bf08f2c5646377d4283866820ce20c18eb1615dec01e7288eefbf8695/isort-4.3.21-py2.py3-none-any.whl jedi @ file:///home/void/.cache/pypoetry/artifacts/1b/c9/89/e6d1f3a2cb2069fa5cacdaf2b474f924b727f0c38edd54d886e07a75bc/jedi-0.18.0-py2.py3-none-any.whl Jinja2 @ file:///home/void/.cache/pypoetry/artifacts/47/18/a4/1905063a877fa68496ecb2e347fc04c431948d96cf825a0960da5791e0/Jinja2-2.11.3-py2.py3-none-any.whl jmespath @ file:///home/void/.cache/pypoetry/artifacts/a9/2b/32/eb9ed41e3f5118971d1741c1299d1e7a70ca4345e5898d5a5d663bfd5f/jmespath-0.10.0-py2.py3-none-any.whl joblib @ 
file:///home/void/.cache/pypoetry/artifacts/31/29/01/db4dcbbea55316357053572689a218c92920416d025fcdf575ed68d0c9/joblib-1.0.1-py3-none-any.whl jsonnet @ file:///home/void/.cache/pypoetry/artifacts/fd/e3/b2/346cba762f726f74df60a0c229bb69087d9f3a06df6d4cf7ae8d0ba9d2/jsonnet-0.17.0.tar.gz jsonschema @ file:///home/void/.cache/pypoetry/artifacts/a2/e2/79/5896dfe12b442a5a7583226fb0d61aec227575025e5ac51deecf719547/jsonschema-3.2.0-py2.py3-none-any.whl jupyter @ file:///home/void/.cache/pypoetry/artifacts/12/56/8a/0c3f4ff4bf0613de3a1020ba2cb4f35919d4d55f3f364cc7b217f63a4c/jupyter-1.0.0-py2.py3-none-any.whl jupyter-client @ file:///home/void/.cache/pypoetry/artifacts/75/86/a0/f28d8ede7e50d46c1394e240dcd1c2706cfdef4313f6fb502474491a75/jupyter_client-6.2.0-py3-none-any.whl jupyter-console @ file:///home/void/.cache/pypoetry/artifacts/99/c8/ec/2de2d0ccfb90fe23e9752664cae6e94c963c37391f2ff44cfbbf5bfa31/jupyter_console-6.4.0-py3-none-any.whl jupyter-core @ file:///home/void/.cache/pypoetry/artifacts/6c/b4/0a/8912b11fb38e65e26cdf07a09561b340b0a6f4f5cc4b03dc17f8f678db/jupyter_core-4.7.1-py3-none-any.whl jupyterlab-pygments @ file:///home/void/.cache/pypoetry/artifacts/92/13/97/3ba1dbd6e97ac9bf843bceb4d66902ecd4dab945d10002cc16d0daa1d8/jupyterlab_pygments-0.1.2-py2.py3-none-any.whl jupyterlab-widgets @ file:///home/void/.cache/pypoetry/artifacts/d9/bd/77/52835917d713d609702eab7cb97fdae0bb6c8c66ff068eb0dc62fd6bc8/jupyterlab_widgets-1.0.0-py3-none-any.whl lmdb @ file:///home/void/.cache/pypoetry/artifacts/66/40/81/8353114c19e9fedaca18f9af102abcf29fb8c18fec3b305ffba050d4bb/lmdb-1.2.0-cp37-cp37m-manylinux2010_x86_64.whl m2r @ file:///home/void/.cache/pypoetry/artifacts/76/16/4d/be80e6cd238bb41bb57dd8f2b2ec41716bd5c298140206f363d434c080/m2r-0.2.1.tar.gz Mako @ file:///home/void/.cache/pypoetry/artifacts/95/cd/6c/b720114a151a63afb11ee90775d2d6543c40b925c7e1d30290d7496594/Mako-1.1.4-py2.py3-none-any.whl MarkupSafe @ 
file:///home/void/.cache/pypoetry/artifacts/cd/da/a5/4cfa20f311002e7588045a07491e292e7fb819fcc777144055b7c7ba89/MarkupSafe-1.1.1-cp37-cp37m-manylinux2010_x86_64.whl marshmallow @ file:///home/void/.cache/pypoetry/artifacts/bf/03/40/3eadfe49de5c4d68198fb3901ac2691f77ad8fa480f0786256aee0714c/marshmallow-3.11.1-py2.py3-none-any.whl marshmallow-polyfield @ file:///home/void/.cache/pypoetry/artifacts/85/27/e8/aec375b960a3f69ffde88981dd1211e65b7b7a692a8b44a3778acebf38/marshmallow_polyfield-5.10-py3-none-any.whl mccabe @ file:///home/void/.cache/pypoetry/artifacts/c2/cd/ed/3c4495a1422fb12eefbca8b3c6ccc83ab4ec92fd39df029199cc4f4ee4/mccabe-0.6.1-py2.py3-none-any.whl mistune @ file:///home/void/.cache/pypoetry/artifacts/80/fd/6c/c86cb01dda756e2e899197f574484928622e9ad453d90761abac9e1948/mistune-0.8.4-py2.py3-none-any.whl more-itertools @ file:///home/void/.cache/pypoetry/artifacts/36/5e/6e/086e365056443ea7340684bae7e448349db3e9ed8be4d1f089f351b2e2/more_itertools-8.7.0-py3-none-any.whl murmurhash @ file:///home/void/.cache/pypoetry/artifacts/6f/20/56/fb7c026c670b5c29fee53124be83411e1a7de6d85ed8daaca7589e9871/murmurhash-1.0.5-cp37-cp37m-manylinux2014_x86_64.whl mypy @ file:///home/void/.cache/pypoetry/artifacts/ba/8d/14/63dc6e4251a288ca5cf70b45458c5e898780ab477364ebe918a3119997/mypy-0.790-cp37-cp37m-manylinux1_x86_64.whl mypy-extensions @ file:///home/void/.cache/pypoetry/artifacts/41/fb/ef/133cd18a3e22a06b8d77dfe2ba71c50c509e4d2484ee619c6631c0b5b2/mypy_extensions-0.4.3-py2.py3-none-any.whl nbclient @ file:///home/void/.cache/pypoetry/artifacts/92/7a/49/4c2666fa49a3f47e490b88dca3aaa53a92db0c1c3bf101db29b6c43164/nbclient-0.5.3-py3-none-any.whl nbconvert @ file:///home/void/.cache/pypoetry/artifacts/9f/b0/34/add1712e9bfa620ab77d2d5d5603591edc85cd6c4ba1a40b4385befda2/nbconvert-6.0.7-py3-none-any.whl nbformat @ file:///home/void/.cache/pypoetry/artifacts/0b/d3/e5/88a4198def0d0bfda1c57a758546b5d2f7bd8edd3dc9be0b71555110ba/nbformat-5.1.3-py3-none-any.whl nest-asyncio @ 
file:///home/void/.cache/pypoetry/artifacts/a6/03/5a/2c77454326bb7a0a235ae4a78437c007a6ef2631cf40b34da26b5729c4/nest_asyncio-1.5.1-py3-none-any.whl nitpick @ file:///home/void/.cache/pypoetry/artifacts/a5/20/b8/993266cc8ff195f92efb6ab96bd376eece4f04a7eaa5f7e7aeb9f52dfd/nitpick-0.23.1-py3-none-any.whl nltk @ file:///home/void/.cache/pypoetry/artifacts/59/92/71/a8c5b581863e7e25355f9e4468c27343f31b423976941e6325acd0554f/nltk-3.6.1-py3-none-any.whl notebook @ file:///home/void/.cache/pypoetry/artifacts/8c/7f/b5/1734baadda7ddfda9a357cbc94c4eb7756a74fe40add4a4ab3a87bc828/notebook-6.3.0-py3-none-any.whl numpy @ file:///home/void/.cache/pypoetry/artifacts/9b/29/b8/7e795a24270a4c1f149286b044e1aafd59ce92bdaad676b7d44ae8187f/numpy-1.20.2-cp37-cp37m-manylinux2010_x86_64.whl optuna @ file:///home/void/.cache/pypoetry/artifacts/fa/68/90/ceaf47da1b66a35a3fdc4b6e24b06bf9f88e233b5f739e430bd13d9d4c/optuna-2.7.0-py3-none-any.whl overrides @ file:///home/void/.cache/pypoetry/artifacts/84/72/62/00a8159d8d9cee75aefc43667435b680f1683000bf235588b78014f01f/overrides-3.1.0.tar.gz packaging @ file:///home/void/.cache/pypoetry/artifacts/bf/c9/4b/d4c56a8494978126d690da73a04dfe71b97fa991e52f1634afc46f263e/packaging-20.9-py2.py3-none-any.whl pandas @ file:///home/void/.cache/pypoetry/artifacts/20/e6/0f/feac64ed8cd0e30be8e53f5127de980b8319f906615c21c8b7b5400296/pandas-1.1.5-cp37-cp37m-manylinux1_x86_64.whl pandocfilters @ file:///home/void/.cache/pypoetry/artifacts/95/80/e1/8047532d4a0988efe4c86231e4aede0125e9bb1dcd96dc7208e29099e6/pandocfilters-1.4.3.tar.gz parso @ file:///home/void/.cache/pypoetry/artifacts/dc/ff/07/1556e66e77c039a21cd51bc0de4c5777b35569c3903674997b2cdfb9f5/parso-0.8.2-py2.py3-none-any.whl pathtools @ file:///home/void/.cache/pypoetry/artifacts/32/fd/91/3eef9683d97849cbd83965bdeee1d1c174066836230a13c775c457fa99/pathtools-0.1.2.tar.gz pathy @ 
file:///home/void/.cache/pypoetry/artifacts/96/c6/de/613b8d4b2f063d538d827205a59b4b0d5370682ea65102e29a5f658c6d/pathy-0.4.0-py3-none-any.whl pbr @ file:///home/void/.cache/pypoetry/artifacts/30/01/4f/39bfa10a7db631fdb8d04545f995c48353f33404db8ad29f7b3ab7847b/pbr-5.5.1-py2.py3-none-any.whl pep8-naming @ file:///home/void/.cache/pypoetry/artifacts/0d/f2/0c/77ac3ab0d1cfa522f402e523b7fb4e93fe039c5f6ad9ecec1d783871ed/pep8_naming-0.9.1-py2.py3-none-any.whl pexpect @ file:///home/void/.cache/pypoetry/artifacts/ac/ff/fd/e4fa201b733fa24e77b6e0f8e1a2e0e9d4bd7cb1936861c9b12e4653a0/pexpect-4.8.0-py2.py3-none-any.whl pickleshare @ file:///home/void/.cache/pypoetry/artifacts/e3/49/c6/dda859db430eaa2b27acc6a8bab879e41d2bf09e99f792343ccc9d1fef/pickleshare-0.7.5-py2.py3-none-any.whl Pillow @ file:///home/void/.cache/pypoetry/artifacts/33/2c/8b/596c551987d35a45fb1ec5bca0f603038a9d768054d449544befaede0e/Pillow-8.2.0-cp37-cp37m-manylinux1_x86_64.whl pluggy @ file:///home/void/.cache/pypoetry/artifacts/bc/2e/c9/c04063460a7a68d2e59c9ea0a673de9d7930d54f788ed0510cdcf8aa78/pluggy-0.13.1-py2.py3-none-any.whl preshed @ file:///home/void/.cache/pypoetry/artifacts/2a/eb/44/6c826ae0ffba371a4f27e852dfee9ef9ffbfb3f6652cbbd380b42f6f4e/preshed-3.0.5-cp37-cp37m-manylinux2014_x86_64.whl prettytable @ file:///home/void/.cache/pypoetry/artifacts/a9/0c/6c/03413065886102725c74f371e697bf00d608e02fcc1fadfc86e38b239a/prettytable-2.1.0-py3-none-any.whl prometheus-client @ file:///home/void/.cache/pypoetry/artifacts/0b/d6/f7/38818ac7b9cdc0284e61fcdf0611a28821f2590cc115109e0dbaca44a6/prometheus_client-0.10.1-py2.py3-none-any.whl promise @ file:///home/void/.cache/pypoetry/artifacts/2b/c7/61/34271997f7584c0fed7a921acc0b1fdf77a383cbed4844419c4e8a3d83/promise-2.3.tar.gz prompt-toolkit @ file:///home/void/.cache/pypoetry/artifacts/42/43/a5/0a3723dadc2a4c0c2daa3c0f1616bec63165729f3b7611a888dea550a4/prompt_toolkit-3.0.18-py3-none-any.whl protobuf @ 
file:///home/void/.cache/pypoetry/artifacts/d7/0d/8c/079c0f0d7b3e2630df6660e993b5ee1a584fdd0dd5fe77b36d0b0444e5/protobuf-3.15.8-cp37-cp37m-manylinux1_x86_64.whl psutil @ file:///home/void/.cache/pypoetry/artifacts/e7/ee/ed/9817a6e3fa8217c13cf17b1bb44507668f1cfd9f057aaf816c8762f172/psutil-5.8.0-cp37-cp37m-manylinux2010_x86_64.whl ptyprocess @ file:///home/void/.cache/pypoetry/artifacts/af/cd/8c/c1510ca357886f8af9948e5555f25db9e360b1dd798566e6e9540c3442/ptyprocess-0.7.0-py2.py3-none-any.whl py @ file:///home/void/.cache/pypoetry/artifacts/56/1d/e3/7dad75e1bf797fbd8937b37dd43d1656357c67a199e9f54d48535697d0/py-1.10.0-py2.py3-none-any.whl pycodestyle @ file:///home/void/.cache/pypoetry/artifacts/a5/4d/b8/38f79509a4c7ac2a12983ce715595e1f983965f0e4c03d632074c05eec/pycodestyle-2.7.0-py2.py3-none-any.whl pycparser @ file:///home/void/.cache/pypoetry/artifacts/44/e9/07/88a70ff44631b83a33a8011053104dffbca00761b983eff85051639df2/pycparser-2.20-py2.py3-none-any.whl pydantic @ file:///home/void/.cache/pypoetry/artifacts/a2/97/d0/ea9e192cb9618b7c2e414860077a8353f5153e377eab0aba2e3844f50e/pydantic-1.7.3-cp37-cp37m-manylinux2014_x86_64.whl pydocstyle @ file:///home/void/.cache/pypoetry/artifacts/72/0e/e5/0e72e0766b925b9443bad8038924832aaa21007eac3e41da5c3cd22bf4/pydocstyle-6.0.0-py3-none-any.whl pyflakes @ file:///home/void/.cache/pypoetry/artifacts/23/7e/52/ea1293b6028d8abc80bab40d1d20c22ae4fb0290b35f06541da7cab403/pyflakes-2.3.1-py2.py3-none-any.whl Pygments @ file:///home/void/.cache/pypoetry/artifacts/14/df/54/07ac62d5eed39cfb52f6439b1afc41e12a205a48755bd7586dda35e565/Pygments-2.8.1-py3-none-any.whl pyparsing @ file:///home/void/.cache/pypoetry/artifacts/78/8b/03/23dc60df50f099a658dd13c86d7d94564b0b86bfa2ff61bc9595fb2fcb/pyparsing-2.4.7-py2.py3-none-any.whl pyperclip @ file:///home/void/.cache/pypoetry/artifacts/32/88/cb/8cae34dd62a8a0152e9b330698cfdb022deb2f066b7944acf5511dfd6f/pyperclip-1.8.2.tar.gz pyrsistent @ 
file:///home/void/.cache/pypoetry/artifacts/82/fd/98/fab6ad55bd376f1da134a3376cf61717a18104b408383b860716048249/pyrsistent-0.17.3.tar.gz pytest @ file:///home/void/.cache/pypoetry/artifacts/a8/5a/78/7536a0da5c14b85637968b588ecf3bde096ce084658eab4180c94412fa/pytest-5.4.3-py3-none-any.whl pytest-cov @ file:///home/void/.cache/pypoetry/artifacts/68/74/14/7ce422aeb24fffce22002e5981b110ae5719e0673fd0f8b053011944bf/pytest_cov-2.11.1-py2.py3-none-any.whl pytest-randomly @ file:///home/void/.cache/pypoetry/artifacts/83/d8/d1/7651b8757550ced3b9754eee50cce8cc9d184de1ef7e756db8d960f0e1/pytest_randomly-3.7.0-py3-none-any.whl python-dateutil @ file:///home/void/.cache/pypoetry/artifacts/75/fa/68/ee8cf8ee229ebfb7947af0398184c39bbf243b7dc67ee46cca45938d09/python_dateutil-2.8.1-py2.py3-none-any.whl python-editor @ file:///home/void/.cache/pypoetry/artifacts/51/f9/12/c230460443322196110063793391d5d4ca9aabe0c697c4f402c32c5453/python_editor-1.0.4-py3-none-any.whl python-slugify @ file:///home/void/.cache/pypoetry/artifacts/55/f6/8f/1a0d0963c09d17ff4da7c7cec7183ab25e957e42a8e27d635c613d64c8/python-slugify-4.0.1.tar.gz pytz @ file:///home/void/.cache/pypoetry/artifacts/0c/d0/94/bbd2fee71be292862261e85cefb2231b1df628c7a0e5cb0170d8304963/pytz-2021.1-py2.py3-none-any.whl PyYAML @ file:///home/void/.cache/pypoetry/artifacts/17/3c/f6/dd4498c1b6b7cdef0517d1e5c0a56e52886d36350589195788eed24d29/PyYAML-5.4.1-cp37-cp37m-manylinux1_x86_64.whl pyzmq @ file:///home/void/.cache/pypoetry/artifacts/dd/a2/24/c8ec691e3ac51d6df3c9b91bc06a2d49ba9f99af4146527b6efb3c3585/pyzmq-22.0.3-cp37-cp37m-manylinux1_x86_64.whl qtconsole @ file:///home/void/.cache/pypoetry/artifacts/b6/7c/3e/f5d07f4ef3164595943d3e3dbd7a718b9cdb77a8e888af66af246b0d11/qtconsole-5.0.3-py3-none-any.whl QtPy @ file:///home/void/.cache/pypoetry/artifacts/8d/c8/7b/f07109a6dc6b92fe95473bd29a9abd2432c5117aba360c912a681384f4/QtPy-1.9.0-py2.py3-none-any.whl regex @ 
file:///home/void/.cache/pypoetry/artifacts/5b/86/c7/caf2fd4e0eace88a30e3b2109f1587ce13a19f81611406f179c9dd3754/regex-2021.4.4-cp37-cp37m-manylinux2014_x86_64.whl requests @ file:///home/void/.cache/pypoetry/artifacts/f2/13/b8/cc7ac8d0aa2630507c04c2c0e72307bed4ee7e2c92d7c3d97a5d61e74e/requests-2.25.1-py2.py3-none-any.whl restructuredtext-lint @ file:///home/void/.cache/pypoetry/artifacts/d8/9d/f9/d7f05191e8128f58b61dbdb962c85f40a4c14da47d397adcfe27d98193/restructuredtext_lint-1.3.2.tar.gz ruamel.yaml @ file:///home/void/.cache/pypoetry/artifacts/82/7e/93/51925fb555452a6bea3fbc3f42bc1d342af303339634400c4f154a3fa5/ruamel.yaml-0.17.4-py3-none-any.whl ruamel.yaml.clib @ file:///home/void/.cache/pypoetry/artifacts/a9/eb/ab/743349c1b48fce4dedaaab59b2ab0ced108f6c46161b470a5dd01a9f50/ruamel.yaml.clib-0.2.2-cp37-cp37m-manylinux1_x86_64.whl s3transfer @ file:///home/void/.cache/pypoetry/artifacts/0f/3e/bc/9588da108fc381717df2b2c71aa60bb824766e14f18fd46e4abf99dd67/s3transfer-0.3.7-py2.py3-none-any.whl sacremoses @ file:///home/void/.cache/pypoetry/artifacts/3b/67/ce/fc1e875ccddd89c0fc964d30eb160ce43474fa74836f2d9166c44fa5d0/sacremoses-0.0.44.tar.gz safety @ file:///home/void/.cache/pypoetry/artifacts/d6/12/3a/0fada211c21fd9e66dfb15b64d9f8d21351c22702f61cd7a73594ee1de/safety-1.10.3-py2.py3-none-any.whl scikit-learn @ file:///home/void/.cache/pypoetry/artifacts/88/94/54/5c47ee3b72e9562608bad938570faf87e8381dcf99415a79dcfc865ad0/scikit_learn-0.24.1-cp37-cp37m-manylinux2010_x86_64.whl scipy @ file:///home/void/.cache/pypoetry/artifacts/e5/3e/a7/b69534d16cae11353f6db73f0fd62d7fc874f1640bd9d39fcc878d355e/scipy-1.6.1-cp37-cp37m-manylinux1_x86_64.whl Send2Trash @ file:///home/void/.cache/pypoetry/artifacts/65/5c/cf/74efc7119c07b06a3e5f3f1e4ffc62bd28315db7decee27e42ff7f5ee0/Send2Trash-1.5.0-py3-none-any.whl sentencepiece @ 
file:///home/void/.cache/pypoetry/artifacts/92/8f/92/0b4cb42c5fec658fb16785c669dd7eb4dbe925d8da8d6b0c12e2a151d9/sentencepiece-0.1.95-cp37-cp37m-manylinux2014_x86_64.whl sentry-sdk @ file:///home/void/.cache/pypoetry/artifacts/ec/58/9c/e73bd625efad4210888b269864ba3641df4516a9582f6722cf05ca0ac4/sentry_sdk-1.0.0-py2.py3-none-any.whl shortuuid @ file:///home/void/.cache/pypoetry/artifacts/83/75/5a/5955701463bbd5516b72ab60b68f3c5d7a4b513b2a7dddb58a96d6d071/shortuuid-1.0.1-py3-none-any.whl siamenc==0.1.0 six @ file:///home/void/.cache/pypoetry/artifacts/e3/96/48/99c14ba5c6276fbf4dc2e216553590ec97b52e685863f73e39550418c5/six-1.15.0-py2.py3-none-any.whl smart-open @ file:///home/void/.cache/pypoetry/artifacts/0f/33/80/1b361e7af0ed7288b20ed6c741e374857e5e832f84e73781e4226ba661/smart_open-3.0.0.tar.gz smmap @ file:///home/void/.cache/pypoetry/artifacts/65/5b/7b/613313a5462b9286d173319b5c72030d91a09050eb726ee603f33ef524/smmap-4.0.0-py2.py3-none-any.whl snowballstemmer @ file:///home/void/.cache/pypoetry/artifacts/1b/e7/37/c1ccd5c7451e2f738886b093508276afa6237726b3b8c391d26c98ddee/snowballstemmer-2.1.0-py2.py3-none-any.whl sortedcontainers @ file:///home/void/.cache/pypoetry/artifacts/f5/2e/d6/fb0cc783ac71136be1df7a9851a4f60077d0e2b8e662ce731e3a3c346c/sortedcontainers-2.3.0-py2.py3-none-any.whl soupsieve @ file:///home/void/.cache/pypoetry/artifacts/6c/8d/4c/47458f64b200cf946383fd0d3e5498170178ef711558d4ef50c5f1e951/soupsieve-2.2.1-py3-none-any.whl spacy @ file:///home/void/.cache/pypoetry/artifacts/a7/4c/c1/278f97aabcf79dd734eac17a3588b6bc6c8621e050c173e69067786902/spacy-3.0.5-cp37-cp37m-manylinux2014_x86_64.whl spacy-legacy @ file:///home/void/.cache/pypoetry/artifacts/f0/d3/d6/eca1889d307c0c12efde065b6fb3bcfb25a5c74d153c26626e11518bc4/spacy_legacy-3.0.2-py2.py3-none-any.whl Sphinx @ file:///home/void/.cache/pypoetry/artifacts/48/f8/fa/28d5d3671759e44d1cde56947b9eee6b9007ec92bf008e9652aaa70220/Sphinx-2.4.4-py3-none-any.whl sphinx-autodoc-typehints @ 
file:///home/void/.cache/pypoetry/artifacts/c2/1b/5c/4c527daa01c9c515303fd7dae1c2fc68dd2a925a9256502d302ae32e6f/sphinx_autodoc_typehints-1.10.3-py3-none-any.whl sphinxcontrib-applehelp @ file:///home/void/.cache/pypoetry/artifacts/cc/cf/dd/8ba7c4afe2ff84e7204886e2fd837495627b3827a21925726fcc3125d1/sphinxcontrib_applehelp-1.0.2-py2.py3-none-any.whl sphinxcontrib-devhelp @ file:///home/void/.cache/pypoetry/artifacts/59/da/dc/900e02cc5452883e989929e3784c1a7094cd7326af3cf4a6d8bb225055/sphinxcontrib_devhelp-1.0.2-py2.py3-none-any.whl sphinxcontrib-htmlhelp @ file:///home/void/.cache/pypoetry/artifacts/11/b9/f6/329734fa1be1c805096ac23381f177cd4cc617499095ab385fce82bd95/sphinxcontrib_htmlhelp-1.0.3-py2.py3-none-any.whl sphinxcontrib-jsmath @ file:///home/void/.cache/pypoetry/artifacts/7f/d8/ef/31320102fc49e5beeb72d480f0dab2dc5429fbd8cedf45817e2320d58e/sphinxcontrib_jsmath-1.0.1-py2.py3-none-any.whl sphinxcontrib-qthelp @ file:///home/void/.cache/pypoetry/artifacts/81/5b/46/baf9bd9c58b789d2ff445165e5859021ea31139602fffcc773c39da86b/sphinxcontrib_qthelp-1.0.3-py2.py3-none-any.whl sphinxcontrib-serializinghtml @ file:///home/void/.cache/pypoetry/artifacts/6a/e9/75/0afa870282075cec7b4aea1f133464212c82722b3cb72319d359cae77f/sphinxcontrib_serializinghtml-1.1.4-py2.py3-none-any.whl SQLAlchemy @ file:///home/void/.cache/pypoetry/artifacts/7f/42/44/655528faa76d83aec61bdced2f43e6a00416851a6fb609d9b2f42b77e4/SQLAlchemy-1.4.8-cp37-cp37m-manylinux2014_x86_64.whl srsly @ file:///home/void/.cache/pypoetry/artifacts/7f/0e/05/c232e06f54f2c934727088bb73fbdabb194b37a53cc6fdfa3e67150180/srsly-2.4.1-cp37-cp37m-manylinux2014_x86_64.whl stevedore @ file:///home/void/.cache/pypoetry/artifacts/64/3f/af/b25f40ac10bc9dafd39a798374d310acd1a62b0e9c3a432f7862a1de9e/stevedore-3.3.0-py3-none-any.whl subprocess32 @ file:///home/void/.cache/pypoetry/artifacts/b6/a9/6e/74893ca81a4fe350238b027043f45690982bcbe3eb5edad507333c2de9/subprocess32-3.5.4.tar.gz tensorboardX @ 
file:///home/void/.cache/pypoetry/artifacts/59/da/a0/4ecae9c6e8f53733e1441997534f8008834bf3aaf5d7c6bfa9271f953c/tensorboardX-2.2-py2.py3-none-any.whl terminado @ file:///home/void/.cache/pypoetry/artifacts/d7/3f/73/57ecce9aba58022bc076bf9c923dc98b8fa13c2ed9fa7aad4c9e8231a6/terminado-0.9.4-py3-none-any.whl testfixtures @ file:///home/void/.cache/pypoetry/artifacts/b2/d7/f8/e32b667c1a1326308364cc8415842331f55cee16d4bf84aeb7b00da260/testfixtures-6.17.1-py2.py3-none-any.whl testpath @ file:///home/void/.cache/pypoetry/artifacts/20/41/61/40209212e3cc4ead6910ce1a83532987e232f01564d9e48fd1c1e2e12e/testpath-0.4.4-py2.py3-none-any.whl text-unidecode @ file:///home/void/.cache/pypoetry/artifacts/71/f3/8c/83d57454c286b52f0f4545f78a5b77a274ee5bed5a42bc456e99c86023/text_unidecode-1.3-py2.py3-none-any.whl thinc @ file:///home/void/.cache/pypoetry/artifacts/9f/dc/7b/e47192857a9ad48df1d6570eced12ec83753ae1a6f4302d7dc23c0b96d/thinc-8.0.2-cp37-cp37m-manylinux2014_x86_64.whl threadpoolctl @ file:///home/void/.cache/pypoetry/artifacts/10/d2/ec/b7f7827e6b16466e651b5a2ea18c64362ff41a15e221a88cc3064fb4dd/threadpoolctl-2.1.0-py3-none-any.whl tokenizers @ file:///home/void/.cache/pypoetry/artifacts/42/ff/97/038f38b3b1ad8266412d2e22edbba67fbc90e5ef71635c466a9117069a/tokenizers-0.10.2-cp37-cp37m-manylinux2010_x86_64.whl toml @ file:///home/void/.cache/pypoetry/artifacts/ee/b4/26/b53b77a5db04c373edfec4046f6e15ab8a6dfbbaee30dcacd447b71c50/toml-0.10.0-py2.py3-none-any.whl tomlkit @ file:///home/void/.cache/pypoetry/artifacts/08/92/ad/3abffc10fb9db6842e047f90088ce25422c9eec5bf89f072620174120f/tomlkit-0.7.0-py2.py3-none-any.whl torch @ file:///home/void/.cache/pypoetry/artifacts/21/c1/64/b18b0b42910be9b56e39d47eca39e249559304dcb7f7f0611b494df2d1/torch-1.8.1-cp37-cp37m-manylinux1_x86_64.whl torchvision @ file:///home/void/.cache/pypoetry/artifacts/09/3f/7e/c68656cce106803dad58bbe37993ad8c8549aa2362536e8ef70825dc00/torchvision-0.9.1-cp37-cp37m-manylinux1_x86_64.whl tornado @ 
file:///home/void/.cache/pypoetry/artifacts/73/ac/41/70b315914d448001a26418a2337c0a55911086b3a31a87013f55e807eb/tornado-6.1-cp37-cp37m-manylinux2010_x86_64.whl tqdm @ file:///home/void/.cache/pypoetry/artifacts/06/df/97/93c62ddbda15a68b2cb12a43b5f6a574d87f007c0605bccab122f5912e/tqdm-4.60.0-py2.py3-none-any.whl traitlets @ file:///home/void/.cache/pypoetry/artifacts/25/0f/c0/2c71bbe86bec170e00a2ee9ef2af2df7b4504fbaf1e8e717622fd1d5b4/traitlets-5.0.5-py3-none-any.whl transformers @ file:///home/void/.cache/pypoetry/artifacts/c0/1e/f0/c6485ae2555a0983158477ce44f6c9d410e65c47eebb2bda0380870055/transformers-4.5.1-py3-none-any.whl typed-ast @ file:///home/void/.cache/pypoetry/artifacts/fe/7c/53/86cd82215775e7707b6f7cafaeed6802f93f0856a8371b0e15bc85d3c6/typed_ast-1.4.3-cp37-cp37m-manylinux1_x86_64.whl typer @ file:///home/void/.cache/pypoetry/artifacts/a1/4c/87/b93e3198d1bd31e5566da8c5edc193d36137b6801bb5cf651c4814fc13/typer-0.3.2-py3-none-any.whl typing-extensions @ file:///home/void/.cache/pypoetry/artifacts/5a/dd/8f/5dc09cb3732cb0be9ecae5854eaa6aa0d4cd95752163c65283ecf9bd34/typing_extensions-3.7.4.3-py3-none-any.whl urllib3 @ file:///home/void/.cache/pypoetry/artifacts/54/b1/a1/ccbf6b869ccdaff965957abdd8e3e5aa4bee1533ed104a0d11bdc07a61/urllib3-1.26.4-py2.py3-none-any.whl wandb @ file:///home/void/.cache/pypoetry/artifacts/58/4a/7f/dec1795f5dd94d975be86acc8a335009da474696dee5d7ffd13bc93a5c/wandb-0.10.26-py2.py3-none-any.whl wasabi @ file:///home/void/.cache/pypoetry/artifacts/e9/07/cd/2f2259f00529ab2503644597edaa1bb14539b96fa7db82c95d03fce7e1/wasabi-0.8.2-py3-none-any.whl wcwidth @ file:///home/void/.cache/pypoetry/artifacts/92/12/86/71fde978823bd982c22bd549b0ba688e372403269396c892ac8160f4fe/wcwidth-0.2.5-py2.py3-none-any.whl webencodings @ file:///home/void/.cache/pypoetry/artifacts/8e/39/d4/1735c959b3d85bebf80692957fe8ad83a2cb27de46bb08a6ababe12c44/webencodings-0.5.1-py2.py3-none-any.whl wemake-python-styleguide @ 
file:///home/void/.cache/pypoetry/artifacts/6e/95/41/d3af6c762397b478511953f9cea56887cffef04cd12516ff51bcfe3ad3/wemake_python_styleguide-0.14.1-py3-none-any.whl widgetsnbextension @ file:///home/void/.cache/pypoetry/artifacts/df/8c/62/03b8d5e9a4adf6311653006eb285482908f2bf68cc88ba5c00ddc0df1c/widgetsnbextension-3.5.1-py2.py3-none-any.whl zipp @ file:///home/void/.cache/pypoetry/artifacts/16/0e/ae/e94ff238fe5d8b11b4eeb35f0ee94fd9f1b0d2182a3d47e3253ce47360/zipp-3.4.1-py3-none-any.whl ```

Steps to reproduce

  1. Use a configuration with a multiprocess data_loader, num_workers > 0, and multiple GPUs.
  2. Use a sharded dataset_reader whose instances contain text fields.

Here is the configuration I am using.

Configuration:

```
local model_name = "models/distilroberta-base-msmarco-v2/";
local num_gpus = 8;
local data_base_url = "data/SWPt512/";
local model = "siamese_retrieval";

local base_dataset_reader = {
  "type": "retrieval",
  "query_tokenizer": {
    "type": "pretrained_transformer",
    "model_name": model_name,
    "max_length": 500,
  },
  "query_token_indexers": {
    "tokens": {
      "type": "pretrained_transformer",
      "model_name": model_name,
      "namespace": "tokens"
    }
  },
};

{
  "train_data_path": data_base_url + "valid/*.tsv",
  "validation_data_path": data_base_url + "valid/2445n7jblu53lkvipo2gmm5ooq.tsv",
  "dataset_reader": {
    "type": "sharded",
    "base_reader": base_dataset_reader,
  },
  "validation_dataset_reader": base_dataset_reader,
  'model': {
    'type': model,
    'transformer_model': model_name,
  },
  "data_loader": {
    "type": "multiprocess",
    "batch_size": 96,
    "shuffle": true,
    "num_workers": 4,
  },
  "validation_data_loader": {
    "type": "multiprocess",
    "batch_size": 96,
    "shuffle": false,
    "num_workers": 0,
  },
  "distributed": {
    "cuda_devices": if num_gpus > 1 then std.range(0, num_gpus - 1) else 0,
  },
  "trainer": {
    "num_epochs": 10,
    "optimizer": {
      "type": "huggingface_adamw",
      "lr": 3e-5,
      "betas": [0.9, 0.999],
      "eps": 1e-8,
      "correct_bias": true
    },
    "learning_rate_scheduler": {
      "type": "polynomial_decay",
    },
    "use_amp": true,
    "grad_norm": 1.0,
    "validation_metric": "+rec5",
    "patience": 3,
  }
}
```

Also, here is the dataset_reader I am using.

Dataset Reader:

```
# -*- coding: utf-8 -*-
import csv
import logging
from typing import Dict, Optional

from allennlp.common.checks import ConfigurationError
from allennlp.common.file_utils import cached_path
from allennlp.data.dataset_readers.dataset_reader import DatasetReader
from allennlp.data.fields import TextField
from allennlp.data.instance import Instance
from allennlp.data.token_indexers import TokenIndexer
from allennlp.data.tokenizers import Tokenizer
from overrides import overrides

logger = logging.getLogger(__name__)


@DatasetReader.register('retrieval')
class RetrievalDatasetReader(DatasetReader):
    r"""Retrieval Dataset Reader.

    Read a tsv file containing paired sequences, and create a dataset suitable for a
    `Retrieval` model, or any model with a matching API.

    Expected format for each input line:

    The output of `read` is a list of `Instance` s with the fields:
        query_tokens : `TextField` and
        document_tokens : `TextField`

    `START_SYMBOL` and `END_SYMBOL` tokens are added to the query and document sequences.

    Args:
        query_tokenizer : `Tokenizer`,
            Tokenizer to use to split the input sequences into words or other kinds of tokens.
            Defaults to `SpacyTokenizer()`.
        query_token_indexers : `Dict[str, TokenIndexer]`,
            Indexers used to define input (query side) token representations.
            Defaults to `{"tokens": SingleIdTokenIndexer()}`.
        document_tokenizer : `Tokenizer`, optional
            Tokenizer to use to split the output sequences (during training) into words or
            other kinds of tokens. Defaults to `query_tokenizer`.
        document_token_indexers : `Dict[str, TokenIndexer]`, optional
            Indexers used to define output (document side) token representations.
            Defaults to `query_token_indexers`.
        delimiter : `str`, (optional, default="\t")
            Set delimiter for tsv/csv file.
        quoting : `int`, (optional, default=`csv.QUOTE_MINIMAL`)
            Quoting to use for csv reader.
    """

    def __init__(  # noqa: WPS211
        self,
        query_tokenizer: Tokenizer,
        query_token_indexers: Dict[str, TokenIndexer],
        document_tokenizer: Optional[Tokenizer] = None,
        document_token_indexers: Optional[Dict[str, TokenIndexer]] = None,
        delimiter: str = '\t',
        query_max_tokens: Optional[int] = None,
        document_max_tokens: Optional[int] = None,
        quoting: int = csv.QUOTE_MINIMAL,
        **kwargs,
    ) -> None:
        """Initialize the dataset reader."""
        super().__init__(**kwargs)
        self._query_tokenizer = query_tokenizer
        self._query_token_indexers = query_token_indexers
        self._document_tokenizer = document_tokenizer or self._query_tokenizer
        self._document_token_indexers = document_token_indexers or self._query_token_indexers
        self._delimiter = delimiter
        self._query_max_tokens = query_max_tokens
        self._document_max_tokens = document_max_tokens
        self._query_max_exceeded = 0
        self._document_max_exceeded = 0
        self.quoting = quoting

    @overrides
    def text_to_instance(
        self,
        query_string: str,
        document_string: str,
    ) -> Instance:
        """Convert query and document string to instances."""
        tokenized_query = self._query_tokenizer.tokenize(query_string)
        if self._query_max_tokens and len(tokenized_query) > self._query_max_tokens:
            self._query_max_exceeded += 1
            tokenized_query = tokenized_query[: self._query_max_tokens]
        query_field = TextField(tokenized_query)

        tokenized_document = self._document_tokenizer.tokenize(document_string)
        if self._document_max_tokens and len(tokenized_document) > self._document_max_tokens:
            self._document_max_exceeded += 1
            tokenized_document = tokenized_document[: self._document_max_tokens]
        document_field = TextField(tokenized_document)

        return Instance({'query_tokens': query_field, 'document_tokens': document_field})

    @overrides
    def apply_token_indexers(self, instance: Instance):
        """Apply the token indexers."""
        query_field: TextField = instance.fields['query_tokens']  # type: ignore
        document_field: TextField = instance.fields['document_tokens']  # type: ignore
        query_field._token_indexers = self._query_token_indexers
        document_field._token_indexers = self._document_token_indexers

    @overrides
    def _read(self, file_path: str):  # noqa: WPS231
        # Reset exceeded counts
        self._query_max_exceeded = 0
        self._document_max_exceeded = 0
        with open(cached_path(file_path), 'r') as data_file:
            logger.info('Reading instances from lines in file at: {0}'.format(file_path))
            reader = csv.reader(data_file, delimiter=self._delimiter, quoting=self.quoting)
            for line_num, row in enumerate(reader):
                if len(row) != 2:
                    raise ConfigurationError(
                        'Invalid line format: {0} (line number {1})'.format(row, line_num + 1),
                    )
                query_sequence, document_sequence = row
                if not (query_sequence and document_sequence):
                    continue
                yield self.text_to_instance(query_sequence, document_sequence)

        truncation_msg = 'In {0} instances, the {1} exceeded the max limit {2} and were truncated.'
        if self._query_max_tokens and self._query_max_exceeded:
            logger.info(
                truncation_msg.format(
                    self._query_max_exceeded,
                    'query tokens',
                    self._query_max_tokens,
                ),
            )
        if self._document_max_tokens and self._document_max_exceeded:
            logger.info(
                truncation_msg.format(
                    self._document_max_exceeded,
                    'document tokens',
                    self._document_max_tokens,
                ),
            )
```
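
The reader above builds its `TextField`s without indexers in `text_to_instance` and attaches them later in `apply_token_indexers`. A minimal, self-contained sketch of that pattern follows. It uses stand-in classes rather than AllenNLP's own: `ToyReader`, `worker_check`, and the error message are hypothetical, and the guard only mimics the kind of check the issue title describes, not the library's actual code.

```python
from typing import Dict, List, Optional


class TextField:
    """Stand-in for a text field: tokens plus optional indexers."""

    def __init__(
        self,
        tokens: List[str],
        token_indexers: Optional[Dict[str, str]] = None,
    ) -> None:
        self.tokens = tokens
        self._token_indexers = token_indexers


class ToyReader:
    """Stand-in reader that defers indexers, like the reader above."""

    def __init__(self, token_indexers: Dict[str, str]) -> None:
        self._token_indexers = token_indexers

    def text_to_instance(self, text: str) -> Dict[str, TextField]:
        # No indexers are passed here; they are attached in a later step.
        return {"query_tokens": TextField(text.split())}

    def apply_token_indexers(self, instance: Dict[str, TextField]) -> None:
        instance["query_tokens"]._token_indexers = self._token_indexers


def worker_check(instance: Dict[str, TextField]) -> None:
    """Hypothetical guard: reject fields that already carry indexers."""
    for name, field in instance.items():
        if field._token_indexers is not None:
            raise ValueError(f"token_indexers already applied to field '{name}'")


reader = ToyReader({"tokens": "pretrained_transformer"})
instance = reader.text_to_instance("what is allennlp")

worker_check(instance)  # passes: nothing has been attached yet
reader.apply_token_indexers(instance)

try:
    worker_check(instance)  # now fails: indexers were attached before the check
    early_application_detected = False
except ValueError:
    early_application_detected = True
```

In this sketch the guard passes only for instances that have not yet gone through `apply_token_indexers`, so anything that applies indexers earlier in the pipeline would trip it.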

vikigenius commented 3 years ago

Additionally, it seems the sharded dataset reader has no apply_token_indexers method. I implemented it manually to see whether that was the issue, but that didn't fix the error either.

```
def apply_token_indexers(self, instance: Instance) -> None:
    self.reader.apply_token_indexers(instance)
```

epwalsh commented 3 years ago

Hi @vikigenius, https://github.com/allenai/allennlp/pull/5134 should fix this. Can you confirm?

vikigenius commented 3 years ago

Thanks @epwalsh, I will try to run this tomorrow and get back to you. Just a quick question about the multiprocess data loader: how does it interact with distributed training? The docs are not clear on this.

If I have 8 GPUs, do distributed training, and set num_workers=8, does that mean I have 64 workers in total, 8 per GPU?
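
The arithmetic behind this question, as a small sketch. It assumes that each distributed training process builds its own data loader with its own `num_workers` workers; that assumption is not confirmed anywhere in this thread.

```python
num_gpus = 8     # distributed training processes, one per GPU
num_workers = 8  # data loader workers requested in the config

# Assumption: every training process creates its own loader,
# each with `num_workers` worker processes.
total_workers = num_gpus * num_workers
print(total_workers)
```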

vikigenius commented 3 years ago

OK, I just checked and the fix works. Thanks.

epwalsh commented 3 years ago

Great!