THUNLP-MT / THUMT

An open-source neural machine translation toolkit developed by Tsinghua Natural Language Processing Group
BSD 3-Clause "New" or "Revised" License

UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 14: #76

Closed HassanNaeemjutt closed 4 years ago

HassanNaeemjutt commented 4 years ago

Traceback (most recent call last):
  File "shuffle_corpus.py", line 64, in <module>
    main(parsed_args)
  File "shuffle_corpus.py", line 30, in main
    data = [fd.readlines() for fd in stream]
  File "shuffle_corpus.py", line 30, in <listcomp>
    data = [fd.readlines() for fd in stream]
  File "C:\Users\Hassan\Anaconda3\envs\finalproject\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 14: character maps to <undefined>

GrittyChen commented 4 years ago

@HassanNaeemjutt You can avoid this problem by running shuffle_corpus.py with Python 2.x. We will fix the bug soon. Thanks very much!
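
For anyone who needs to stay on Python 3, one possible workaround is sketched below. The traceback shows the files being decoded with cp1252, the default text encoding on many Windows setups, and byte 0x81 has no mapping in that code page. Passing an explicit encoding to `open()` avoids the platform default. This is only a sketch, not the fix that went into THUMT: it assumes the corpus files are UTF-8, and the names `read_lines_utf8` and `shuffle_parallel` are hypothetical helpers, not functions from shuffle_corpus.py.

```python
import random


def read_lines_utf8(path):
    # Hypothetical helper: decode explicitly as UTF-8 (an assumption about
    # the corpus) instead of relying on the platform default encoding.
    with open(path, "r", encoding="utf-8") as fd:
        return fd.readlines()


def shuffle_parallel(paths, seed=1234):
    """Shuffle several line-aligned files with one shared permutation."""
    data = [read_lines_utf8(p) for p in paths]

    # Parallel corpora must stay aligned, so every file needs the same length.
    if len({len(lines) for lines in data}) != 1:
        raise ValueError("all corpus files must have the same number of lines")

    indices = list(range(len(data[0])))
    random.Random(seed).shuffle(indices)

    # Apply the same index order to every file so sentence pairs stay matched.
    return [[lines[i] for i in indices] for lines in data]
```

If the files are not actually UTF-8, the same call will raise a UnicodeDecodeError of its own, so the `encoding` argument should name whatever encoding the corpus was really saved in.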