dhlee347 / pytorchic-bert

Pytorch Implementation of Google BERT
Apache License 2.0
589 stars, 181 forks

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 3793: ordinal not in range(128) #13

Closed: likerainsun closed this issue 4 years ago

likerainsun commented 5 years ago

I downloaded the pretrained BERT model. Running the fine-tuning step raises an error, apparently while loading the vocab file.

Any idea how to fix it?


dhlee347 commented 5 years ago

The tokenization.py file comes from the official Google BERT repo. How about reporting this bug there? (You may be able to reproduce the same error with that code.)
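
For anyone hitting the same traceback: this error usually means a UTF-8 file was read with Python's default encoding on a system whose locale resolves to ASCII. Below is a minimal sketch of a workaround, assuming the vocab file is UTF-8 text with one token per line and is read with the built-in `open()`. The function name `load_vocab_utf8` is hypothetical and is not taken from tokenization.py; the only point is the explicit `encoding="utf-8"` argument.

```python
import collections


def load_vocab_utf8(vocab_file):
    """Read a one-token-per-line vocab file into an ordered token -> index dict.

    Hypothetical helper for illustration; not the function in tokenization.py.
    """
    vocab = collections.OrderedDict()
    # Passing the encoding explicitly avoids falling back to the locale's
    # default codec, which can be ASCII on some systems.
    with open(vocab_file, "r", encoding="utf-8") as reader:
        for index, line in enumerate(reader):
            token = line.rstrip("\n")
            vocab[token] = index
    return vocab
```

If editing the code is not an option, running under a UTF-8 locale, or setting `PYTHONUTF8=1` on Python 3.7 or later, may have the same effect, though I have not verified that against this repo.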