HendrikStrobelt / detecting-fake-text

Giant Language Model Test Room
Apache License 2.0
462 stars, 112 forks

KeyError when ’-style apostrophes (or quotation marks) are entered as text. #9

Closed by airmak 4 years ago

airmak commented 4 years ago

Traceback:

  File "C:\Python38\lib\site-packages\pytorch_pretrained_bert\tokenization_gpt2.py", line 224, in <genexpr>
    token = ''.join(self.byte_encoder[ord(b)] for b in token)
KeyError: 8217

With curly quotation marks (“), a KeyError: 8220 is thrown instead. Replacing ’ with ' in the text makes it work fine, but if this is not fixed it may cause mysterious errors, especially when copy-pasting text from another source.
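For anyone hitting this in the meantime, here is a minimal workaround sketch. The numbers in the errors are Unicode code points: 8217 is U+2019 (’) and 8220 is U+201C (“). The traceback suggests the tokenizer's `byte_encoder` has no entry for code points that large. The helper `normalize_quotes` and the table `CURLY_TO_ASCII` are hypothetical names of my own, not part of pytorch_pretrained_bert. The idea is to replace typographic quotes with ASCII ones before the text reaches the tokenizer.

```python
# Hypothetical helper, NOT part of pytorch_pretrained_bert.
# Maps common typographic quotes (by code point) to ASCII equivalents.
CURLY_TO_ASCII = {
    0x2018: "'",  # left single quotation mark
    0x2019: "'",  # right single quotation mark, ord() == 8217
    0x201C: '"',  # left double quotation mark, ord() == 8220
    0x201D: '"',  # right double quotation mark
}


def normalize_quotes(text: str) -> str:
    """Return text with curly quotes replaced by straight ASCII quotes."""
    return text.translate(CURLY_TO_ASCII)


if __name__ == "__main__":
    sample = "It\u2019s a \u201ctest\u201d"
    # Show which characters fall outside the 0..255 range.
    print([ord(c) for c in sample if ord(c) > 255])
    # Cleaned text that should be safe to pass on to the tokenizer.
    print(normalize_quotes(sample))
```

This only covers the four quote characters listed, so other characters above code point 255 (dashes, ellipses, non-Latin scripts) would presumably still raise the same KeyError. A more general fix would likely need to happen inside the tokenizer itself, for example by looking up the UTF-8 bytes of each token instead of `ord()` of each character, but that is an assumption about the library and is not shown here.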