UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 3793: ordinal not in range(128)

dhlee347 / pytorchic-bert

Pytorch Implementation of Google BERT

Apache License 2.0


UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 3793: ordinal not in range(128) #13

Closed: likerainsun closed this issue 4 years ago

likerainsun commented 5 years ago

I downloaded the pretrained BERT model. Running the fine-tuning step raises the error in the title, I assume while loading the vocab file.

Any idea how to fix it?

dhlee347 commented 5 years ago

The tokenization.py file is taken from the official Google BERT repo. How about reporting this bug there? (You might be able to reproduce the same error with that code.)
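For anyone hitting the same traceback: byte 0xc2 is the lead byte of a two-byte UTF-8 sequence, so the error suggests the vocab file is UTF-8 but is being decoded as ASCII. One likely cause, assuming the file is opened in text mode without an explicit encoding, is that Python falls back to the locale's preferred encoding, which can be ASCII under a C/POSIX locale. A minimal sketch of a workaround is below. The helper name `load_vocab_utf8` and the sample vocab contents are hypothetical and not taken from tokenization.py; only the `encoding="utf-8"` argument to `open` is the point.

```python
import collections
import os
import tempfile


def load_vocab_utf8(vocab_file):
    """Hypothetical helper: map each token (one per line) to its line index.

    The file is decoded as UTF-8 explicitly, so the result does not
    depend on the locale's default encoding.
    """
    vocab = collections.OrderedDict()
    with open(vocab_file, "r", encoding="utf-8") as reader:
        for index, line in enumerate(reader):
            # Drop only the trailing newline; keep the token itself intact.
            vocab[line.rstrip("\n")] = index
    return vocab


# Small demonstration with a made-up vocab file containing a non-ASCII token.
# "£" is encoded in UTF-8 as the two bytes 0xc2 0xa3.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "vocab.txt")
    with open(path, "wb") as f:
        f.write("[PAD]\n£\nhello\n".encode("utf-8"))
    vocab = load_vocab_utf8(path)

print(vocab["£"])
```

If the loader in your copy of tokenization.py opens the vocab file without an encoding, adding `encoding="utf-8"` to that call may be enough. Running under a UTF-8 locale is another option, though that depends on your environment.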