GPT2Tokenizer fails to recover the sentence "BART is a seq2seq model." from its encoded ids: decoding produces "BART is a seqseq model.", dropping the digit "2". The issue appears to be related to how numbers are processed.
A script that reproduces the bug: https://github.com/tanyuqian/texar-pytorch/blob/master/examples/bart/gpt2_tokenizer_bug.py
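For reference, the round-trip property the linked script exercises can be sketched with a minimal byte-level tokenizer. This is a toy stand-in, not texar's GPT2Tokenizer: encoding a sentence to ids and then decoding should reproduce it exactly, digits included.

```python
# Toy byte-level tokenizer illustrating the round-trip invariant that
# GPT2Tokenizer violates here. NOT texar's implementation: each UTF-8
# byte simply maps to one integer id.

def encode(text: str) -> list:
    """Map a sentence to a list of integer ids (one per UTF-8 byte)."""
    return list(text.encode("utf-8"))

def decode(ids: list) -> str:
    """Recover the sentence from its ids."""
    return bytes(ids).decode("utf-8")

sentence = "BART is a seq2seq model."
# The "2" must survive the round trip; in the reported bug it is lost.
assert decode(encode(sentence)) == sentence
```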