allenai / allennlp

An open-source NLP research library, built on PyTorch.
http://www.allennlp.org
Apache License 2.0

Check for bad start or end symbol in Seq2SeqDatasetReader considers only source tokenizer #5451

Closed: JohnGiorgi closed this issue 2 years ago.

JohnGiorgi commented 3 years ago


Description

In `Seq2SeqDatasetReader.__init__`, there is a check on the validity of `start_symbol` and `end_symbol` (copied below):

```python
if (
    source_add_start_token
    or source_add_end_token
    or target_add_start_token
    or target_add_end_token
):
    # Check that the tokenizer correctly appends the start and end tokens to
    # the sequence without splitting them.
    tokens = self._source_tokenizer.tokenize(start_symbol + " " + end_symbol)
    err_msg = (
        f"Bad start or end symbol ('{start_symbol}', '{end_symbol}') "
        f"for tokenizer {self._source_tokenizer}"
    )
    try:
        start_token, end_token = tokens[0], tokens[-1]
    except IndexError:
        raise ValueError(err_msg)
    if start_token.text != start_symbol or end_token.text != end_symbol:
        raise ValueError(err_msg)

    self._start_token = start_token
    self._end_token = end_token
```

I think it is a bug that this check only considers `source_tokenizer`. It raises an error for any `source_tokenizer` that (1) adds its own special tokens to the beginning or end of a sequence, and/or (2) uses subword tokenization and splits `start_symbol` or `end_symbol` into pieces. Because the block runs whenever any of the four `*_add_*_token` flags is set, the error is raised even if `target_tokenizer` is set up correctly and `source_add_start_token=False` and `source_add_end_token=False` are passed (the target flags are left at their defaults here):

```python
from allennlp.data.tokenizers import PretrainedTransformerTokenizer
from allennlp_models.generation import Seq2SeqDatasetReader
from allennlp.common.util import START_SYMBOL, END_SYMBOL

source_tokenizer = PretrainedTransformerTokenizer("bert-base-uncased", add_special_tokens=True)

# Set up a target tokenizer so it is compatible with `start_symbol` and `end_symbol`
tokenizer_kwargs = {"additional_special_tokens": [START_SYMBOL, END_SYMBOL]}
target_tokenizer = PretrainedTransformerTokenizer(
    "bert-base-uncased", add_special_tokens=False, tokenizer_kwargs=tokenizer_kwargs
)

# Raises ValueError
reader = Seq2SeqDatasetReader(
    source_tokenizer=source_tokenizer,
    target_tokenizer=target_tokenizer,
    source_add_start_token=False,
    source_add_end_token=False,
    start_symbol=START_SYMBOL,
    end_symbol=END_SYMBOL,
)
```
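To make the two failure modes concrete, here is a toy simulation of the reader's condition on plain strings. `passes_symbol_check`, `punct_split`, `add_bert_specials`, and `protected_split` are hypothetical helpers, not AllenNLP or Hugging Face APIs. `punct_split` only approximates BERT's basic tokenizer, which splits punctuation such as `@` off into separate pieces; real wordpiece output may differ in detail.

```python
import re
from typing import Collection, List

# AllenNLP's default symbols (allennlp.common.util.START_SYMBOL / END_SYMBOL).
START_SYMBOL, END_SYMBOL = "@start@", "@end@"


def passes_symbol_check(tokens: List[str], start_symbol: str = START_SYMBOL, end_symbol: str = END_SYMBOL) -> bool:
    """Mirror the reader's condition: the first and last tokens must equal the symbols."""
    if not tokens:  # the reader turns the resulting IndexError into a ValueError
        return False
    return tokens[0] == start_symbol and tokens[-1] == end_symbol


def punct_split(text: str) -> List[str]:
    """Toy approximation of BERT's basic tokenizer: words and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def add_bert_specials(tokens: List[str]) -> List[str]:
    """Toy approximation of add_special_tokens=True for a BERT model."""
    return ["[CLS]"] + tokens + ["[SEP]"]


def protected_split(text: str, protected: Collection[str]) -> List[str]:
    """Toy approximation of additional_special_tokens: protected words are never split."""
    tokens: List[str] = []
    for word in text.split():
        tokens.extend([word] if word in protected else punct_split(word))
    return tokens


probe = START_SYMBOL + " " + END_SYMBOL  # the string the reader tokenizes

# Source-style tokenizer: special tokens added AND symbols split.
source_like = add_bert_specials(punct_split(probe))
print(source_like)  # ['[CLS]', '@', 'start', '@', '@', 'end', '@', '[SEP]']
print(passes_symbol_check(source_like))  # False

# Target-style tokenizer: symbols protected, no special tokens added.
target_like = protected_split(probe, {START_SYMBOL, END_SYMBOL})
print(target_like)  # ['@start@', '@end@']
print(passes_symbol_check(target_like))  # True
```

The second case mirrors the `target_tokenizer` in the report, which would pass, but the reader never runs the probe through it.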

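I think the fix is to split the `if` statement into two checks: one that validates `source_tokenizer` when `source_add_start_token` or `source_add_end_token` is set, and one that validates `target_tokenizer` when `target_add_start_token` or `target_add_end_token` is set. I'd be happy to open a PR for this if the maintainers agree.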
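A minimal sketch of the proposed split, written as standalone functions so it runs without AllenNLP. `Token`, `check_start_end_symbols`, and `check_reader_symbols` are hypothetical names used only for illustration; in the real reader the two checks would live in `Seq2SeqDatasetReader.__init__`. Because the excerpt above stores a single `self._start_token`/`self._end_token` taken from `source_tokenizer`, a real fix would probably also need separate source and target tokens; that part is an assumption about how a PR might be structured.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Token:
    """Hypothetical stand-in for allennlp.data.tokenizers.Token (only .text is used)."""
    text: str


Tokenize = Callable[[str], List[Token]]
SymbolPair = Tuple[Token, Token]


def check_start_end_symbols(tokenize: Tokenize, start_symbol: str, end_symbol: str, which: str) -> SymbolPair:
    """Same logic as the existing check, applied to a single tokenizer."""
    tokens = tokenize(start_symbol + " " + end_symbol)
    err_msg = f"Bad start or end symbol ('{start_symbol}', '{end_symbol}') for {which} tokenizer"
    if not tokens:
        raise ValueError(err_msg)
    start_token, end_token = tokens[0], tokens[-1]
    if start_token.text != start_symbol or end_token.text != end_symbol:
        raise ValueError(err_msg)
    return start_token, end_token


def check_reader_symbols(
    source_tokenize: Tokenize,
    target_tokenize: Tokenize,
    source_add_start_token: bool,
    source_add_end_token: bool,
    target_add_start_token: bool,
    target_add_end_token: bool,
    start_symbol: str = "@start@",
    end_symbol: str = "@end@",
) -> Tuple[Optional[SymbolPair], Optional[SymbolPair]]:
    """Validate each tokenizer only if that side actually adds start/end tokens."""
    source_symbols: Optional[SymbolPair] = None
    if source_add_start_token or source_add_end_token:
        source_symbols = check_start_end_symbols(source_tokenize, start_symbol, end_symbol, "source")
    target_symbols: Optional[SymbolPair] = None
    if target_add_start_token or target_add_end_token:
        target_symbols = check_start_end_symbols(target_tokenize, start_symbol, end_symbol, "target")
    return source_symbols, target_symbols


# Toy tokenizers mirroring the report's setup (hypothetical, for illustration only).
def whitespace(text: str) -> List[Token]:
    return [Token(t) for t in text.split()]


def bert_like(text: str) -> List[Token]:
    return [Token("[CLS]")] + whitespace(text) + [Token("[SEP]")]


source, target = check_reader_symbols(
    bert_like, whitespace,
    source_add_start_token=False, source_add_end_token=False,
    target_add_start_token=True, target_add_end_token=True,
)
print(source, target)  # None (Token(text='@start@'), Token(text='@end@'))
```

With the split, the BERT-style source tokenizer is never probed when no source-side symbols are requested, while enabling them still raises the same error as before.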

Python traceback:

```
~/Library/Caches/pypoetry/virtualenvs/seq2rel-KdBYT5RF-py3.8/lib/python3.8/site-packages/allennlp_models/generation/dataset_readers/seq2seq.py in __init__(self, source_tokenizer, target_tokenizer, source_token_indexers, target_token_indexers, source_add_start_token, source_add_end_token, target_add_start_token, target_add_end_token, start_symbol, end_symbol, delimiter, source_max_tokens, target_max_tokens, quoting, **kwargs)
    116                 raise ValueError(err_msg)
    117             if start_token.text != start_symbol or end_token.text != end_symbol:
--> 118                 raise ValueError(err_msg)
    119 
    120             self._start_token = start_token

ValueError: Bad start or end symbol ('@start@', '@end@') for tokenizer
```


Environment

OS: macOS, Linux

Python version: 3.8.5

Output of pip freeze:

```
aiohttp @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/1e/2e/7a/374792df594d4558e7849f4a367e0ddb5f5213e6d4776f883323432364/aiohttp-3.7.4.post0-cp38-cp38-macosx_10_14_x86_64.whl
allennlp @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/d1/0e/28/6187f1a8cf1e88c909d1894c3f99fbe1a396202e4e83c12c2c17158e50/allennlp-2.7.0-py3-none-any.whl
allennlp-models @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/d3/de/4b/afa5ee582d4a14ab6a4b6fd7c245717963fbe5d7c02f39508b42d592e1/allennlp_models-2.7.0-py3-none-any.whl
anyio==3.3.1
appnope @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/db/10/31/24b2c5bc1fc158d51095265fd31ede89b3ac384a8a2307ffc7ec8d87bb/appnope-0.1.2-py2.py3-none-any.whl
argon2-cffi @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/cc/73/82/88105415573a251b1cfc3374c63f06527c2f0b56215fdf697b8bb7e92d/argon2_cffi-21.1.0-cp35-abi3-macosx_10_14_x86_64.whl
async-timeout @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/0d/5d/3e/630122e534c1b25e36c3142597c4b0b2e9d3f2e0a9cea9f10ac219f9a7/async_timeout-3.0.1-py3-none-any.whl
attrs @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/1a/aa/39/10d6d07084f186f8cf6963cb033440402ad5088bb94d712239170f2ef6/attrs-21.2.0-py2.py3-none-any.whl
Babel==2.9.1
backcall @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/62/2c/f2/bf9c43ca0bcfca41150901227b0d023dc854851b710f82a72f5beaa09b/backcall-0.2.0-py2.py3-none-any.whl
backports.csv @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/62/f6/b5/1fc08ffc7d061d06727f3d82267fa2e6a2ba386b0d45b781011a9e8e76/backports.csv-1.0.7-py2.py3-none-any.whl
backports.entry-points-selectable==1.1.0
base58 @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/3b/99/5b/f0cba442306d941d06e57b14367dd39889e6045b046248b683ef55f449/base58-2.1.0-py3-none-any.whl
beautifulsoup4 @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/6b/d1/6b/d0b3c51ce4f18fb1bd802534ae47547c8cac66e8dd8e35ec5dfa25ec74/beautifulsoup4-4.10.0-py3-none-any.whl
black @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/98/73/04/994acfc78d741658bc3b00f289e2d6ddf3d1afa901301102bbea18b1a7/black-21.9b0-py3-none-any.whl
bleach @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/c2/d4/65/1df2e21979f2ef4b2f78653a465a302f5138c7f42c9b2f816d0a5167f4/bleach-4.1.0-py2.py3-none-any.whl
blis @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/f1/f7/65/c23408461bd53cfeb5b8e761ce18ff00479dd028bbb6a4d8369ec565de/blis-0.7.4-cp38-cp38-macosx_10_9_x86_64.whl
boto3 @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/a8/bb/a3/f0b0ff64f04dd919065a8484ad6b61d9aba2e49a0d0af6787dacecf32d/boto3-1.18.63-py3-none-any.whl
botocore @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/23/f5/07/0891b039af91379e2e15d8501de8bd8056423ee805ad859ae578162e1d/botocore-1.21.63-py3-none-any.whl
cachetools @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/ca/68/fb/a94522dd8c8d081803f16d6ec74ce9a678a827247431e2a8339f25506f/cachetools-4.2.4-py3-none-any.whl
catalogue @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/0b/b7/5d/989fb530709f23308e5e3078336a2b25616e48483321f5d40d43227a3e/catalogue-2.0.6-py3-none-any.whl
certifi @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/77/74/3e/daf57610ff13227e3b3f50a78487055f6ac94cc9363abd2463be34c5e9/certifi-2021.10.8-py2.py3-none-any.whl
cffi @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/e6/3a/ed/211634ffd1f9ed0a6957a6981ef59f3af9934e62dc2ee15819d9623196/cffi-1.15.0-cp38-cp38-macosx_10_9_x86_64.whl
chardet @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/47/b7/82/19c2b887f87f3adbaf4e34c55189388e5132c78f6929d7001a78b0209b/chardet-4.0.0-py2.py3-none-any.whl
charset-normalizer @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/23/43/96/9c5bffd9765f723eb96925cec608e36125667e8f31b6b3152f0b51716f/charset_normalizer-2.0.7-py3-none-any.whl
checklist @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/7f/33/3a/65d35388a6f64f3588edd653207d54bc2a6df87d12dbbd091a60691030/checklist-0.0.11.tar.gz
cheroot @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/c7/ed/3f/4d422858bbc847c271a76ff4f9f74fa52cef8e3b343137c8dd3ee20c74/cheroot-8.5.2-py2.py3-none-any.whl
CherryPy @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/73/28/88/5a7eb306865e78d39e014c1a5817d8982dbb3bdd86c3d2d24d6e4a3c59/CherryPy-18.6.1-py2.py3-none-any.whl
click @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/3c/f6/a3/2262df8f5a6f3de5bbc78cc13803c60524c8384dce76661fdfe3df975f/click-8.0.3-py3-none-any.whl
codecov @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/29/78/32/d6897550a321cf7a72f75d517bdd629791fb6d77a845d22f38ae663b98/codecov-2.1.12-py2.py3-none-any.whl
colorama @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/b0/f3/a3/cf94f06cbe1d286a25116cfe54d5a75cb1c4b54d15b2b1b4fc03a7f657/colorama-0.4.4-py2.py3-none-any.whl
configparser @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/f7/52/b3/2a0e73f491afdef69d3a59b416ca8c10b4e43a9eee55e95065f5f145e8/configparser-5.0.2-py3-none-any.whl
conllu @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/8e/76/29/59f80fa3e3e5231b9674ebc6d9823c24ca79e647c82407a704f7c6522f/conllu-4.4.1-py2.py3-none-any.whl
coverage @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/7d/97/ed/264891013579a2b2e008cef6ad91712d255bfcdf01af6d130383d8270b/coverage-6.0.2-cp38-cp38-macosx_10_9_x86_64.whl
cryptography @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/16/18/fd/53f7bd4088758821f0edf9ef9f345668e11a1f1894f1b4dc3f7252a801/cryptography-35.0.0-cp36-abi3-macosx_10_10_x86_64.whl
cycler==0.10.0
cymem @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/f8/37/7b/9022833f73b73d3b67107a0aa0567e2d8bf526e43b6c211ca8e2d714e5/cymem-2.0.5-cp38-cp38-macosx_10_9_x86_64.whl
datasets @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/1a/f3/4e/f4d95860411c32ad68b41e571d677d898efd8f8862c1904bdf54678aa2/datasets-1.13.3-py3-none-any.whl
debugpy @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/57/26/42/875e0a918207d1e49c1619f784d21d1d4bfa1650b39282eea154cd8398/debugpy-1.5.0-cp38-cp38-macosx_10_15_x86_64.whl
decorator @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/34/17/0e/2b47ba09c969bda1d9eaefda2d9ccee318289f9e57d361761be4f3ab90/decorator-5.1.0-py3-none-any.whl
defusedxml @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/15/b0/94/888992ee7b1de9fbc975b6afd17d34f441df6e96172df8d4eba95c9432/defusedxml-0.7.1-py2.py3-none-any.whl
dill @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/28/60/d3/32828bed02c4d7c32349419de5acbc87826c39d627804affa9a357ca4b/dill-0.3.4-py2.py3-none-any.whl
docker-pycreds @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/d8/39/1e/e7e74a2508ba56c50439f59cedea0b0b83433255ecfa7626aae74aef70/docker_pycreds-0.4.0-py2.py3-none-any.whl
entrypoints @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/27/67/42/5ca7438658f76c8700ff6c44ea1cf9dc128cf0862adb7de53d3a35266c/entrypoints-0.3-py2.py3-none-any.whl
fairscale @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/fa/92/f4/a8ff4216b2ef616172ef3ae34a26b8ac2b138172f68bd8512ca98b880a/fairscale-0.4.0.tar.gz
feedparser @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/aa/52/0a/5826fdb32a49126bc4790a960987feaadceb6d923f41500a4de5ef2c07/feedparser-6.0.8-py3-none-any.whl
filelock @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/f4/78/c1/69555c3867649a2a5dac43f12a078830700480a49be273fb2de82be2ab/filelock-3.0.12-py3-none-any.whl
flake8 @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/93/78/ae/ee75d1b46827f8892defc2a710979cc71803d2da75a049bdabd3adad70/flake8-3.9.2-py2.py3-none-any.whl
fsspec @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/34/60/72/3b024eeebe678be8f6a267950b5b0fb1e649e1d087162576f3dd6c2b6c/fsspec-2021.10.1-py3-none-any.whl
ftfy @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/f6/d1/2c/db46cdd19a4799ff0bc2fb1cfcd8912d7be819f91ae38cccbcd7885cca/ftfy-6.0.3.tar.gz
future @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/d9/62/dc/809bb3ddfe360ddc60ebb287ad6b0eaf71aef98937f0ea0c466d44aa19/future-0.18.2.tar.gz
gitdb @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/04/0e/cf/315c241493e86260bce0499c9a41b201c16d4c408d14259d1fb78f2dba/gitdb-4.0.7-py3-none-any.whl
GitPython @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/65/50/fc/efadf2a96d18ae3443681e0d98d082d735e62ebb53f1f46312a4e6ca57/GitPython-3.1.24-py3-none-any.whl
google-api-core @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/df/19/49/576fc9447a455a79e3a9a57cba6435b47f9a146b9429c73b57dbf80faf/google_api_core-2.1.1-py2.py3-none-any.whl
google-auth @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/13/4d/3e/c4dc56fe3580ddc0be70ad157f7b63f7bd4184270d3dcc741a3e7032c6/google_auth-2.3.0-py2.py3-none-any.whl
google-cloud-core @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/35/2e/bd/b38f62629276e20104e18e7d27e93d6673f68e9c3d27d48c6b470c9321/google_cloud_core-2.1.0-py2.py3-none-any.whl
google-cloud-storage @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/55/a3/4b/c0107363aa1a1b566b15d06baf5ccd9eb820229d58aa247d6f5e2b79b3/google_cloud_storage-1.42.3-py2.py3-none-any.whl
google-crc32c @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/37/48/05/a1631635edda209ce80b9ecb6e0678262bf3a0f30fde8674fc1bd3245e/google_crc32c-1.3.0-cp38-cp38-macosx_10_9_x86_64.whl
google-resumable-media @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/2d/c8/79/2f89c7a2afafab389d62c8e123c9c083e982e078a5a05d04e7967b5e6e/google_resumable_media-2.0.3-py2.py3-none-any.whl
googleapis-common-protos @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/8c/2e/ad/efdce99c62dbe56fda156dd62f807ea79a43f83dea45bd9f9ff4bc6240/googleapis_common_protos-1.53.0-py2.py3-none-any.whl
h5py @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/be/59/fa/8583e2f8bb9d66efdfe56f10669c984c3531e4ebce40eb474d2a52754a/h5py-3.4.0-cp38-cp38-macosx_10_9_x86_64.whl
huggingface-hub @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/53/9e/4d/9ef09205045d9886788cd70f1f7cf516e41da3691ab8000f381da616dc/huggingface_hub-0.0.19-py3-none-any.whl
hypothesis @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/17/1d/e1/a12fe47a0ba064f8e62294e4b29cf5e7d32164b0186cf6092f8a7eabb9/hypothesis-6.23.3-py3-none-any.whl
idna @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/f7/09/c9/8e80952436ff87855ddff4891a35ecf913b2b5dc97911f02a070ff2b1f/idna-3.3-py3-none-any.whl
iniconfig @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/ac/f9/da/e990ffcd9ec361a68676a5916e391286e1ea5d1b8907ae887e141a71f5/iniconfig-1.1.1-py2.py3-none-any.whl
ipykernel @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/f4/a2/97/0b967c5d149ef3cbde87b844ad05b0391783a9af93b1eb947f872ae38d/ipykernel-6.4.1-py3-none-any.whl
ipython @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/f7/92/5e/e887b7585d67c295cb815d828bc8892c281feb8826447b699fac96d520/ipython-7.28.0-py3-none-any.whl
ipython-genutils @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/e6/8e/a3/8f37e14310c0072b3fcc4240490bcb42630aa695d069aee89953ebd9f8/ipython_genutils-0.2.0-py2.py3-none-any.whl
ipywidgets @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/13/b5/43/b938f79239314b91a8160cdd8ff1644168cad0aef195945834a185c620/ipywidgets-7.6.5-py2.py3-none-any.whl
iso-639 @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/df/a4/b2/6573e801cd5b1443ea8f62ef82096a6a7cb706ff16c480e7811f9f95a9/iso-639-0.4.5.tar.gz
jaraco.classes @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/06/2a/9b/c3f7f1dc3409472aac3fa75571ec4f9c5fdf9724ba95aa0d3857e993fd/jaraco.classes-3.2.1-py3-none-any.whl
jaraco.collections @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/10/78/fb/1453da786fa6ee3c9d9c620a49cd022239dded61be2c6b5a727260dfad/jaraco.collections-3.4.0-py3-none-any.whl
jaraco.functools @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/5e/70/c8/fd61c1b3830aae625b240e7420442742de5b64790eba20ce95be9b2b36/jaraco.functools-3.3.0-py3-none-any.whl
jaraco.text @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/51/e4/2a/f0fbc9f217e7c8574db5bca1abf0398d6cfbb62a0ed6c4377a2366d435/jaraco.text-3.5.1-py3-none-any.whl
jedi @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/42/26/07/496d2180e241dbf58cf832e0c7a617d8fcbdd6f3f93937056d106545fc/jedi-0.18.0-py2.py3-none-any.whl
Jinja2 @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/bc/7e/94/a305d7942db6684c4e372c6d8c89d1df206e8c88517a7b5b0dffbbf29e/Jinja2-3.0.2-py3-none-any.whl
jmespath @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/43/d5/a2/f83573231324de7f5b61f5c607fbbe82ca535359a452de4852d2e25e8d/jmespath-0.10.0-py2.py3-none-any.whl
joblib @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/9b/a3/77/11a58dff170042277b83b2abf4a83664b0ef3c048f0cccee2b204ed842/joblib-1.1.0-py2.py3-none-any.whl
json5==0.9.6
jsonnet @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/23/ba/91/41be2f00e06a18e8c220311460dc643bb9975fe5bbf08a3e0017e887fb/jsonnet-0.17.0.tar.gz
jsonpickle @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/a3/43/36/9eb7c32fbeffd93e855d2090f208d1bb0660f56aa2e6d0a7234c3faae2/jsonpickle-2.0.0-py2.py3-none-any.whl
jsonschema @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/33/bd/57/7f6d7f5f991bc085a9fddc45d80bad9314919b6ae7f0da38dea4f739e3/jsonschema-4.1.0-py3-none-any.whl
jupyter @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/2e/44/a0/764f7d3907f220eb94db0e2bce1f8f3e50dcb48aca15a625dd210cb114/jupyter-1.0.0-py2.py3-none-any.whl
jupyter-client @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/cd/1f/33/dafe8bc6a70871d969c9f5c4539659af33f71e0067aef55315b491caa8/jupyter_client-7.0.6-py3-none-any.whl
jupyter-console @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/a4/52/39/18f8e5a28b25875a9251e900d0d749e9c28310b3ce786ba4ecd67c2e45/jupyter_console-6.4.0-py3-none-any.whl
jupyter-core @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/21/d9/04/1f31168a9142ce3d567d44960411c87f47a2fdcebc083307f80d6e151b/jupyter_core-4.8.1-py3-none-any.whl
jupyter-server==1.11.0
jupyterlab==3.1.13
jupyterlab-pygments @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/c8/b0/10/ad75ecf240424057a12f5b4320da2c9f380541cdf830a39e7e8437c2c0/jupyterlab_pygments-0.1.2-py2.py3-none-any.whl
jupyterlab-server==2.8.1
jupyterlab-widgets @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/4b/3f/13/7f7e8014e6ea67fab4a942882d870d27a15971f08b0b2660b4cdbcd03f/jupyterlab_widgets-1.0.2-py3-none-any.whl
kiwisolver==1.3.2
lmdb @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/ab/59/27/94188b63ed5f8e062d4671d2bcde446c8cf73ac892dbdcbe4db746ba79/lmdb-1.2.1-cp38-cp38-macosx_10_14_x86_64.whl
lxml @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/b0/c3/1f/bde91e88c462e522b16a3b38fab74d4d32aea11d19262a02ea3cbdadd5/lxml-4.6.3-cp38-cp38-macosx_10_9_x86_64.whl
MarkupSafe @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/cd/11/54/fd481eec49cddc06876e526483e7ee8675d0910fed2aa7668a9fc88e62/MarkupSafe-2.0.1-cp38-cp38-macosx_10_9_x86_64.whl
matplotlib==3.4.3
matplotlib-inline @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/99/9e/a7/697624e74ebe7195097dbec4fe8bff67cf73c04390b69235ab5d6433b3/matplotlib_inline-0.1.3-py3-none-any.whl
mccabe @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/37/6e/69/4a33a4d6c80c775b1ee205face2c6e07b762c8602bb0f0d236ebe790c5/mccabe-0.6.1-py2.py3-none-any.whl
mistune @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/d9/d8/5c/8eac14aaa95c3aa81409d56b7423ff6dd88eb398f551c1bf0b8c05b916/mistune-0.8.4-py2.py3-none-any.whl
more-itertools @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/e5/d6/53/7e4b7c402487c7aa47865c0155ca95057fd385fcf3d331fb0535f7bf38/more_itertools-8.10.0-py3-none-any.whl
multidict @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/ce/07/97/96c8c0778bcefb790a0e0509f8f3114fc02f75a074477cac59c5174797/multidict-5.2.0-cp38-cp38-macosx_10_9_x86_64.whl
multiprocess @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/ca/0c/50/4fa4087501423c3e6a0e07cf9e4293cb262c98689b0a5da0bcc0ee4148/multiprocess-0.70.12.2-py38-none-any.whl
munch @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/8a/51/90/b6d8e8acd78aee2f686f94d1d64b52cc649a194f0b71c3e083bc2c4abf/munch-2.5.0-py2.py3-none-any.whl
murmurhash @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/e4/76/e6/55d7b3ff6a76e1d5a66a445a5a76d831adecb7840c89ffa3c847a6f17d/murmurhash-1.0.5-cp38-cp38-macosx_10_9_x86_64.whl
mypy @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/ab/cc/97/d2bf236a86ad0c9b4134b069009b132ec469a131579fe22d5d52e43974/mypy-0.910-cp38-cp38-macosx_10_9_x86_64.whl
mypy-extensions @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/b6/a0/b0/a5dc9acd6fd12aba308634f21bb7cf0571448f20848797d7ecb327aa12/mypy_extensions-0.4.3-py2.py3-none-any.whl
nbclassic==0.3.2
nbclient @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/4a/28/12/0bec26bfe28eff423fdd8b3e3f361ef308eb3a1b77b95c30c126350672/nbclient-0.5.4-py3-none-any.whl
nbconvert @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/3d/9f/ac/586455af84e0714a231e09a125e6a20029577b22c8b9d5ae3bf2378c95/nbconvert-6.2.0-py3-none-any.whl
nbformat @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/50/21/dd/ed31613a2f0fda01bcb038557b711726da40cbe5b483cf13719918d6b0/nbformat-5.1.3-py3-none-any.whl
nest-asyncio @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/f5/d1/32/8bc78b4ee2a2d3a377491872fe5aa705448a863af0bb0968676a0e2d3b/nest_asyncio-1.5.1-py3-none-any.whl
networkx @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/fa/4f/d3/e7fe0bef76afda6c21516d0f99c5a4f7c81b8aea2c884477bcb331d2f4/networkx-2.6.3-py3-none-any.whl
nltk @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/eb/6c/95/719c7a3727a2f3caacddac4c5b5db9e76ff0fc027e56fe4520292c74fd/nltk-3.6.5-py3-none-any.whl
notebook @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/98/21/17/b4e544779a6e1a6e8c202d4c60a8382d869181d958754ad37c69ad82f2/notebook-6.4.4-py3-none-any.whl
numpy @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/da/3c/db/6f799cf4a5e87e3d4e53ac7a1b5a9f6638f4b6cc5122ccf3286d2ac802/numpy-1.21.1-cp38-cp38-macosx_10_9_x86_64.whl
overrides @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/2e/a1/58/bef997b290151ed8dc40c8873f5a581732643e73c41da963ab65b75838/overrides-3.1.0.tar.gz
packaging @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/6e/de/e5/e3e7e60b359c616435089a1dd11b101dc553fae21fca74bbe98d0c323e/packaging-21.0-py3-none-any.whl
pandas @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/01/67/d5/6fde19dd1d9f25e899b24eb3acfecaf25a53aa9c8e76ec8f58595bddfc/pandas-1.3.4-cp38-cp38-macosx_10_9_x86_64.whl
pandocfilters @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/a0/1a/aa/08b056cf02efdac22abd6440fea5dbb93be4fe8c9c1f33c6d6f8795de7/pandocfilters-1.5.0-py2.py3-none-any.whl
parso @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/22/e1/fb/1f6342ec76fc3368aca0b5266c38f9a44c03e1003f904397fd31c0df0c/parso-0.8.2-py2.py3-none-any.whl
pathspec @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/04/91/14/294ed2ee6c852b0d466bdd15d393127eff4168b35ae81cedf5a03fe348/pathspec-0.9.0-py2.py3-none-any.whl
pathtools @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/db/27/0c/42fa00f22356328dc3bd6da21aae59384ad7f36690fe5ca5495586560f/pathtools-0.1.2.tar.gz
pathy @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/9e/df/8b/1b7de6047d48d9b82dcf31e899ef9d07e2bf97241956d3acdb62af271c/pathy-0.6.0-py3-none-any.whl
patternfork-nosql @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/2a/8a/10/3ffc35b9958e7c2fe26d577c221508d170be8768aa92eb9212d7c8d061/patternfork_nosql-3.6.tar.gz
pdfminer.six @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/c0/ec/8d/6d1984532bf0bef70e3e34b224a230e18acfa90d8d85068e3f4bf49027/pdfminer.six-20211012-py3-none-any.whl
pexpect @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/c0/05/08/f23ddb8e3d5b19e7cf01eb434220310be2aaf69226bdec78bc53589024/pexpect-4.8.0-py2.py3-none-any.whl
pickleshare @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/27/b2/0a/93a92c700a1993b2923519262ddf76a629bd459a0597c0ae28bf80c7a6/pickleshare-0.7.5-py2.py3-none-any.whl
Pillow @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/c9/53/2f/e945b6916e89751d135bc88d397e10fd6e5adb24f0cd89e037fa67e8ae/Pillow-8.4.0-cp38-cp38-macosx_10_10_x86_64.whl
platformdirs @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/c2/ba/64/b6513f98bf6524a0ec2316e6f9326829bf25522e915b9965b0e55b969b/platformdirs-2.4.0-py3-none-any.whl
pluggy @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/38/05/d7/70e2b0d553c780097b692ed658b0f553a28f16028d848a98174b1a2249/pluggy-1.0.0-py2.py3-none-any.whl
portend @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/bc/af/52/69884c4cd721d7775dcc7593f513479170204e31ab750643b8fe80f7fd/portend-3.0.0-py3-none-any.whl
preshed @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/87/8c/8d/a9fcfd95b2ab9272af32f4480cbc7c58f297d5ea2df53367579e0a57b4/preshed-3.0.5-cp38-cp38-macosx_10_9_x86_64.whl
prometheus-client @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/00/18/42/94cd084834c473ccb4fc3413137856974fb5a4423d3cfe615faf442dd0/prometheus_client-0.11.0-py2.py3-none-any.whl
promise @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/0f/4f/91/54827555a9ce3597aca6d8a5ca3066e4cbdeb55dfc09e7dd69e113cec7/promise-2.3.tar.gz
prompt-toolkit @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/49/72/f2/5d1b0ad43dfb0689c7ae8a457baa705bacdaec338077a13f807686fefd/prompt_toolkit-3.0.20-py3-none-any.whl
protobuf @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/ac/85/1a/e8ab87755d8a0d6589f086e98d5dcb1b65995476c00ab95b92fa124c9c/protobuf-3.18.1-cp38-cp38-macosx_10_9_x86_64.whl
psutil @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/84/fa/d1/12147def50aecb1f1a65f19fac48db25f0edc692fafb01a6809daf4995/psutil-5.8.0-cp38-cp38-macosx_10_9_x86_64.whl
ptyprocess @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/b5/84/64/9519e6926ac101cbc8d93423b8165f4abac4f8a8e3e099f74d3c7e0e67/ptyprocess-0.7.0-py2.py3-none-any.whl
py @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/6b/b2/2b/e6686e7d0183dbd36bd66921efa3e77ce26260a3671524cd86614290e0/py-1.10.0-py2.py3-none-any.whl
py-rouge @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/8c/da/8b/765cd7f7641eb25264c1d00e40e19d69d198c3f838160614aa34ff2fc7/py_rouge-1.1-py3-none-any.whl
pyarrow @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/60/9d/e0/55d4f2f56315901aeeed8bedad29454ebbcc11937b1586b0183bbb8a70/pyarrow-5.0.0-cp38-cp38-macosx_10_13_x86_64.whl
pyasn1 @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/a6/f7/5e/59a43ec23ade0888631b3f24244da7f5d5a0b6b40849c86b8c6b4c54d1/pyasn1-0.4.8-py2.py3-none-any.whl
pyasn1-modules @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/09/f6/56/0e33158234b3e9b9e5d8eb01e7a96670a99ab43b3dbe89b0f129477a59/pyasn1_modules-0.2.8-py2.py3-none-any.whl
pycodestyle @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/ea/27/4a/ce8e18f033aae28e47dc2895901dce76e10e7c9efc48bcd95ab4443c47/pycodestyle-2.7.0-py2.py3-none-any.whl
pycparser @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/f1/03/25/40eb46f7bede64f78ba073e2141b8216e611cbcde72e3117c326560101/pycparser-2.20-py2.py3-none-any.whl
pydantic @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/d2/99/3c/bffa7ba1ed68effa728f5766fd9b626a4ecf01b7635d3e3ff1c09c6ed8/pydantic-1.8.2-cp38-cp38-macosx_10_9_x86_64.whl
pyflakes @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/b0/73/41/34d4d02987e40ffa6ee0292425303a75b4178476bf134ceaf0585a9faf/pyflakes-2.3.1-py2.py3-none-any.whl
Pygments @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/0f/4e/ff/68a1ec60d16d43c7f126ec7a6834ec8300bf293c7fa5ab34831a7f2211/Pygments-2.10.0-py3-none-any.whl
pyparsing @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/da/e7/3d/1780282f558e5fd157bf708b28b8ba0d08323ef6bc5b6396139ce38a0b/pyparsing-2.4.7-py2.py3-none-any.whl
pyrsistent @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/72/05/80/43e73cc485a2fd76c8e9ba4d5e41b31ef57e6f90bf82079ad3a49fa0b8/pyrsistent-0.18.0-cp38-cp38-macosx_10_9_x86_64.whl
pytest @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/90/e1/d5/985c1b371486cf5749ce1ec1beac04524dacb0642a52d916fd1aa82671/pytest-6.2.5-py3-none-any.whl
pytest-cov @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/ce/cd/a2/2d987485d9ccd39e0e4c99f44afb2031232743eb1ec04c4ccf1c3d3b44/pytest_cov-3.0.0-py3-none-any.whl
python-dateutil @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/0f/f1/e2/e1d3399cea26388e2ed5b93ea3e7c137d2b4027c5ba14c64ab839294ed/python_dateutil-2.8.2-py2.py3-none-any.whl
python-docx @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/a6/64/06/a932c2eb8f96e71ce6b90c535a1971c0dfa110ee1cc554f3f6f38eb0b0/python-docx-0.8.11.tar.gz
pytz @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/07/c1/43/6f22a533e7bc5e277f10b4c6033e936ff7bbeb6464e5f27dbfb6c641d7/pytz-2021.3-py2.py3-none-any.whl
pyvis @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/f2/11/9b/2a8f163989cf6da31959f8055ae0914f2a4da74425170f2a8af79d5061/pyvis-0.1.9-py3-none-any.whl
PyYAML @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/f0/ae/ac/3d7d408acba599bb6f26145f4ccfa63e42769a71f67bc8d5b643946261/PyYAML-6.0-cp38-cp38-macosx_10_9_x86_64.whl
pyzmq @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/d4/18/cd/4b50de2fe36cddc0a2c570dea01dc7a2322ad65edb2340dd2c503656d4/pyzmq-22.3.0-cp38-cp38-macosx_10_9_x86_64.whl
qtconsole @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/a4/47/34/7fe2401249a490f591453e6116bfe86c553175b2148660b88324f7e994/qtconsole-5.1.1-py3-none-any.whl
QtPy @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/bd/f0/f0/53d1c1d12ddbebbe885aa7066c4d551b514f25fa970c1ff6c853dd23bd/QtPy-1.11.2-py2.py3-none-any.whl
regex @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/82/f8/be/afa0d92da049895e643ec6adaad8030302f076f3a9231e5bd2b38cb67f/regex-2021.10.8-cp38-cp38-macosx_10_9_x86_64.whl
requests @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/97/12/e1/3c2e2b7f315a912e2da1b1465a23c3f14d51a3bd4696e14e0f2796adde/requests-2.26.0-py2.py3-none-any.whl
requests-unixsocket==0.2.0
rsa @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/3a/ab/43/a67c8d818350593872a5713ca125b95f465d62eee5ba7895de1194add1/rsa-4.7.2-py3-none-any.whl
s3transfer @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/c4/3d/ac/f736ad58b798b0dccee9e9ba7ed0493385c74f952b5f04c10dfb563877/s3transfer-0.5.0-py3-none-any.whl
sacremoses @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/35/e8/a7/c74242481a266a99136f3fa39a5210170ba5c33052b750843b794f1a60/sacremoses-0.0.46-py3-none-any.whl
scikit-learn @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/09/ca/db/46fdde8f503c9a1bd55bdf588b8f18dd36b2eded7d0532f83527ce74f4/scikit_learn-1.0-cp38-cp38-macosx_10_13_x86_64.whl
scipy @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/98/e6/9a/ec37b4d476e93b73a6ac9e134ac65839185bd9428c87e4913a614f1f80/scipy-1.6.1-cp38-cp38-macosx_10_9_x86_64.whl
Send2Trash @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/04/75/07/732af573e27a23b54c8ecd612eadc3725f0ce6428aee277184a4d0c780/Send2Trash-1.8.0-py3-none-any.whl
sentencepiece @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/c2/54/b6/bdf5712fbf6c8df51afce96a9ef8954dd601ebfce81ea653375a13292f/sentencepiece-0.1.96-cp38-cp38-macosx_10_6_x86_64.whl
sentry-sdk @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/aa/f9/f5/e3e2f5afb33f1f7c4f81e0d5a3320138b5da41db1d61659a614dc8da78/sentry_sdk-1.4.3-py2.py3-none-any.whl
seq2rel==0.1.0
sgmllib3k @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/7f/ff/47/daedf070ffb84ea157ae769afcd95b6085d6bb88fa9d54b93011cf5f6b/sgmllib3k-1.0.0.tar.gz
shellingham @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/4b/83/6d/e35ec68166f5c637275dcae5c8411bbc624f69306d927f7e46b2a3e278/shellingham-1.4.0-py2.py3-none-any.whl
shortuuid @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/d1/e0/1d/1f9cb7b3ab6fabb17fe7420dd453db16d7ba50a15deb1b234eb320b2b4/shortuuid-1.0.1-py3-none-any.whl
six @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/48/e6/04/8118155ae3ec3a16dd2a213bbf7a7d8a62c596b2e90f73a22c896269f1/six-1.16.0-py2.py3-none-any.whl
smart-open @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/26/81/76/40a70723fc34dccdf893387d11e40c3dd3a59c46fb6b461944136fbe4b/smart_open-5.2.1-py3-none-any.whl
smmap @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/1e/d8/13/34940d459f5ed772e4b4e7b376b8b694f793d766785e9e8700188a8d5f/smmap-4.0.0-py2.py3-none-any.whl
sniffio==1.2.0
sortedcontainers @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/8f/31/be/2d0a228ecadbc21e65b90453a4055bb3a48b0b098ed3e0ca8cafa85629/sortedcontainers-2.4.0-py2.py3-none-any.whl
soupsieve @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/b8/a8/7f/9e40f555a1f9d45c0bee1c0968b363dd999d2beb5b111eec78c11e9203/soupsieve-2.2.1-py3-none-any.whl
spacy @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/75/15/af/dbec3d052bdbf4584feef13bb6960fb62e72568826b9b6df1f847dc209/spacy-3.1.3-cp38-cp38-macosx_10_9_x86_64.whl
spacy-legacy @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/c0/27/a2/5a30c4f2acd541e62cba27834a6f07bbe740c8b29f0677e7843ce81ce0/spacy_legacy-3.0.8-py2.py3-none-any.whl
sqlitedict @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/cd/26/1c/f4f445a24f194f21b8c96459eaf76398dc434a978d7f07fbd6e3bac1d2/sqlitedict-1.7.0.tar.gz
srsly @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/6a/56/a1/06cc4a4e9f2a83dca3c26267804a8d489500fc7778aeb1627d8fee90ae/srsly-2.4.1-cp38-cp38-macosx_10_9_x86_64.whl
subprocess32 @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/ba/61/77/20064e3312c309728c4824cb725c398e0044f1e1afaa2be5482ea8e0fe/subprocess32-3.5.4.tar.gz
tempora @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/4a/7f/bd/9999aec83cbafd80ca0e96379ddf30ede9aeeccce7511279425993092b/tempora-4.1.2-py3-none-any.whl
tensorboardX @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/68/eb/3c/1605a551971ad0cea75ef1dcec3eea7add47711fe7a2cb25d38604a46d/tensorboardX-2.4-py2.py3-none-any.whl
termcolor @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/33/6b/61/45ca85fe93f3295319a6eb1d6a8d2d449b6fa6d17c2ed2ec1810196a4a/termcolor-1.1.0.tar.gz
terminado @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/99/c0/03/31feee159a5291c183f98d53f6b00d169c3e4a21feb6a570ea1b34b066/terminado-0.12.1-py3-none-any.whl
testpath @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/99/c6/0c/2256a0fa1dbfdfee5595aa75497e3967a6a234852e7e76050fe51d18ed/testpath-0.5.0-py3-none-any.whl
thinc @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/61/0f/1e/29a881f14c47a32cd52823be48b2347e9c236322664988aacc818b5950/thinc-8.0.10-cp38-cp38-macosx_10_9_x86_64.whl
threadpoolctl @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/20/bc/46/3ea1d0f54ff5ea38f5af47728f2dc93c2bd79a4ab3fb58f74cec723c35/threadpoolctl-3.0.0-py3-none-any.whl
tokenize-rt==4.1.0
tokenizers @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/9b/b5/e5/a3cea2a9e667e964792e0961f23412fcce905661a6f90a4136d2c07637/tokenizers-0.10.3-cp38-cp38-macosx_10_11_x86_64.whl
toml @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/9f/8c/c3/7b4f6778d60f3c9fa11f8fd0e48243bbad25a04975e0a01006b6350594/toml-0.10.2-py2.py3-none-any.whl
tomli @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/66/79/ff/273065eae5d047b67af1ec99ed8abcb1769f67f1d7ca537d0a07beac40/tomli-1.2.1-py3-none-any.whl
torch @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/3d/3a/9b/1511a1faddd2fd32bfbb5e99faff69a1679e0c29098ec6a856666990b3/torch-1.9.1-cp38-none-macosx_10_9_x86_64.whl
torchvision @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/0a/f4/97/d909a6605c2d653a4b11c0c0dd4189b55c7d654d65c6788332ef9c07d3/torchvision-0.10.1-cp38-cp38-macosx_10_9_x86_64.whl
tornado @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/f8/e3/41/917f42a91e332f48e4150ee97f1c569a747bb58d2db08b1aa8a794fe06/tornado-6.1-cp38-cp38-macosx_10_9_x86_64.whl
tqdm @
file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/cc/00/d6/11a230005951f04ebcf5e304ef217666e8e0f5fed12661971a4988bb2d/tqdm-4.62.3-py2.py3-none-any.whl traitlets @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/b6/cc/f0/bf72aeacb5d520e3c0e1cb45914bc5799a08f52788d89a62014f8bf35d/traitlets-5.1.0-py3-none-any.whl transformers @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/e4/9f/e9/b54bc81ee849bd9317d3327485aed385538f982612b48e692d6ad733af/transformers-4.5.1-py3-none-any.whl typer @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/50/84/0e/12de0984fa7c3b0360303fd4df9b8f70ef49467ebc690e78371c7b4681/typer-0.4.0-py3-none-any.whl typing-extensions @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/63/00/66/294e7d75d34398c27c7ed9b154822f99422b0f11548b11c569c188eb2b/typing_extensions-3.10.0.2-py3-none-any.whl urllib3 @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/d8/12/e2/1b499db5e41e88fd0f31d0e056a4dc7b53824f63f027fffc1702969c57/urllib3-1.26.7-py2.py3-none-any.whl validators @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/9b/76/01/4344dc10a31836ed299d8699acd5da84d5bf170de6953be4e58cce86c8/validators-0.18.2-py3-none-any.whl wandb @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/e4/26/83/dc36e5102ad1991ce452baedf1c8103fbbeb2e044b55ee206f6fe4741c/wandb-0.12.4-py2.py3-none-any.whl wasabi @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/42/4a/9d/f75bd98be0123c02fb2a2dbb95d920914b644aced2fc781e2e4111685d/wasabi-0.8.2-py3-none-any.whl wcwidth @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/36/68/e2/7232f431072d5e8aeec124120b9a1d095d45da10311d271fac10982473/wcwidth-0.2.5-py2.py3-none-any.whl webencodings @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/60/1e/b4/eff9915b6506bb01a5ad61dfae3fa4f0302be9e2ad45eaccc833925b95/webencodings-0.5.1-py2.py3-none-any.whl websocket-client==1.2.1 widgetsnbextension @ 
file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/4f/c4/8c/95c9c932a9649e98240304b336a4c725419ee2fd517897c94b817722d6/widgetsnbextension-3.5.1-py2.py3-none-any.whl word2number @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/f1/a2/d9/65e48a223ce6054ccc45f9ba049ee4ce8b8000656ddebb233642b52225/word2number-1.1.zip xxhash @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/dc/56/52/d7f0d297596fcfd5794d8b7bc54962a27cedd0b9467ec2b24b83a18230/xxhash-2.0.2-cp38-cp38-macosx_10_9_x86_64.whl yarl @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/44/09/c8/d10accf5175f19f5916c186ccd3f809722d6cc0459e7d47f2dd56c9cce/yarl-1.7.0-cp38-cp38-macosx_10_9_x86_64.whl yaspin @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/c9/f9/85/a7745419a7b786e5bd6b91a6b9bdae7a4612671ecd52c2cd2703ff061b/yaspin-2.1.0-py3-none-any.whl zc.lockfile @ file:///Users/johngiorgi/Library/Caches/pypoetry/artifacts/8a/7d/4d/87319be2a2e6f3fc33151575541b353af4ba6a4ac928bb1d3c1f5d64e1/zc.lockfile-2.0-py2.py3-none-any.whl ```

Steps to reproduce

Example source:

```python
from allennlp.data.tokenizers import PretrainedTransformerTokenizer
from allennlp_models.generation import Seq2SeqDatasetReader
from allennlp.common.util import START_SYMBOL, END_SYMBOL

source_tokenizer = PretrainedTransformerTokenizer("bert-base-uncased", add_special_tokens=True)

# Set up a target tokenizer so it is compatible with `start_symbol` and `end_symbol`
tokenizer_kwargs = {"additional_special_tokens": [START_SYMBOL, END_SYMBOL]}
target_tokenizer = PretrainedTransformerTokenizer(
    "bert-base-uncased", add_special_tokens=False, tokenizer_kwargs=tokenizer_kwargs
)

# Raises ValueError
reader = Seq2SeqDatasetReader(
    source_tokenizer=source_tokenizer,
    target_tokenizer=target_tokenizer,
    source_add_start_token=False,
    source_add_end_token=False,
    start_symbol=START_SYMBOL,
    end_symbol=END_SYMBOL,
)
```
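A minimal, library-free sketch of the behaviour the issue asks for: validate the source tokenizer only when `source_add_start_token` or `source_add_end_token` is set, and the target tokenizer only when `target_add_start_token` or `target_add_end_token` is set. `Token`, `check_boundary_symbols`, `validate_tokenizers`, and both toy tokenizers are hypothetical stand-ins, not AllenNLP's API, and this is not necessarily how the linked PR implements the fix. The `@start@`/`@end@` strings are assumed to match `START_SYMBOL`/`END_SYMBOL`, and the target flags are assumed to be on.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Token:
    """Hypothetical stand-in for AllenNLP's Token; only `text` is used here."""
    text: str


# A tokenizer is modelled as any callable from a string to a list of Tokens.
Tokenize = Callable[[str], List[Token]]


def check_boundary_symbols(
    tokenize: Tokenize, start_symbol: str, end_symbol: str, side: str
) -> Tuple[Token, Token]:
    """Return (start, end) tokens if `tokenize` keeps both symbols whole at the edges."""
    tokens = tokenize(start_symbol + " " + end_symbol)
    err_msg = (
        f"Bad start or end symbol ('{start_symbol}', '{end_symbol}') "
        f"for {side} tokenizer"
    )
    if not tokens:
        raise ValueError(err_msg)
    start_token, end_token = tokens[0], tokens[-1]
    if start_token.text != start_symbol or end_token.text != end_symbol:
        raise ValueError(err_msg)
    return start_token, end_token


def validate_tokenizers(
    source_tokenize: Tokenize,
    target_tokenize: Tokenize,
    start_symbol: str,
    end_symbol: str,
    source_add_start_token: bool,
    source_add_end_token: bool,
    target_add_start_token: bool,
    target_add_end_token: bool,
) -> Dict[str, Tuple[Token, Token]]:
    """Validate each tokenizer only if symbols will be added on its side."""
    checked: Dict[str, Tuple[Token, Token]] = {}
    if source_add_start_token or source_add_end_token:
        checked["source"] = check_boundary_symbols(
            source_tokenize, start_symbol, end_symbol, "source"
        )
    if target_add_start_token or target_add_end_token:
        checked["target"] = check_boundary_symbols(
            target_tokenize, start_symbol, end_symbol, "target"
        )
    return checked


def whitespace_tokenize(text: str) -> List[Token]:
    # Keeps "@start@" and "@end@" intact, like a target tokenizer that
    # registers them as additional special tokens.
    return [Token(t) for t in text.split()]


def subword_like_tokenize(text: str) -> List[Token]:
    # Crude imitation of a subword tokenizer with add_special_tokens=True:
    # wraps the input in [CLS]/[SEP] and splits "@start@" into "@", "start", "@".
    pieces = [p for word in text.split() for p in re.split(r"(@)", word) if p]
    return [Token("[CLS]")] + [Token(p) for p in pieces] + [Token("[SEP]")]


# Roughly mirrors the repro above: no symbols are added on the source side, so
# the subword-style source tokenizer is never checked and no error is raised.
checked = validate_tokenizers(
    subword_like_tokenize,
    whitespace_tokenize,
    "@start@",
    "@end@",
    source_add_start_token=False,
    source_add_end_token=False,
    target_add_start_token=True,
    target_add_end_token=True,
)
print(sorted(checked))  # ['target']
```

Returning the validated tokens per side, rather than a single start/end pair, is a choice made for this sketch: the source and target tokenizers may produce different `Token` objects for the same symbol, so each side keeps its own.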

epwalsh commented 2 years ago

Hey @JohnGiorgi, yea I think that makes sense. Feel free to open a PR and ping me.

JohnGiorgi commented 2 years ago

@epwalsh, cool. Did that here: https://github.com/allenai/allennlp-models/pull/308/files