Kyubyong / transformer

A TensorFlow Implementation of the Transformer: Attention Is All You Need
Apache License 2.0
4.25k stars 1.29k forks

'gbk' codec can't decode byte 0x93 in position 978: illegal multibyte sequence, and then: a bytes-like object is required, not 'str' #146

Open Ailing-Zou opened 4 years ago

Ailing-Zou commented 4 years ago

Hi, when I first ran this code, I got:

  File "D:/transformer/prepro.py", line 37, in
    _prepro = lambda x: [line.strip() for line in open(x, 'r').read().split("\n") \
UnicodeDecodeError: 'gbk' codec can't decode byte 0x93 in position 978: illegal multibyte sequence

After I changed that line to

    _prepro = lambda x: [line.strip() for line in open(x, 'rb').read().split("\n") \
                         if not line.startswith("<")]

I got a different error: TypeError: a bytes-like object is required, not 'str'.

So how should I open this file? Looking forward to your reply.

lushunn commented 3 years ago

Add encoding='utf-8' to the open() call when you open the file.

KMY-SEU commented 3 years ago

Add encoding='utf-8' to the open() call when you open the file.

NB!