649453932 / Chinese-Text-Classification-Pytorch

Chinese text classification with TextCNN, TextRNN, FastText, TextRCNN, BiLSTM_Attention, DPCNN, and Transformer, based on PyTorch, ready to use out of the box.
MIT License
5.27k stars 1.23k forks

Use utf-8 everywhere #12

Closed lyriccoder closed 4 years ago

lyriccoder commented 4 years ago

Hi guys, I really appreciate the algorithms you have provided.

Could you please use UTF-8 encoding everywhere? For example, lines 16-17 of FastText.py should be the following:

        self.class_list = [x.strip() for x in open(
            dataset + '/data/class.txt', encoding='utf-8').readlines()]

Otherwise I get the following error on Windows, where the default encoding is cp1252:

    dataset + '/data/class.txt').readlines()]                                # class list
  File "C:\Python37\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 51: character maps to <undefined>
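
The failure can be reproduced without the repository's files. This is a minimal sketch, assuming the data file holds UTF-8 encoded Chinese text. The sample character is illustrative and not taken from class.txt. Its UTF-8 encoding ends in byte 0x9d, which Python's cp1252 codec leaves undefined, so decoding raises the same kind of error.

```python
# Illustrative sample only; not the actual contents of class.txt.
raw = "九".encode("utf-8")  # three bytes: 0xe4 0xb9 0x9d

try:
    # What open() effectively does when the locale default is cp1252.
    raw.decode("cp1252")
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    bad_byte = raw[exc.start]  # the byte the codec could not map

# Decoding with the encoding the file was actually written in succeeds.
decoded = raw.decode("utf-8")
```
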

Could you please add it to each algorithm?
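
One way to apply this in every model file is a small shared helper that always passes the encoding explicitly. The sketch below is only a suggestion: `read_class_list` is a hypothetical name, not a function in the repository, and the sample class names are made up for the demonstration.

```python
import os
import tempfile


def read_class_list(path):
    """Hypothetical helper: return one class name per line, decoded as UTF-8."""
    # An explicit encoding makes the result independent of the OS locale.
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f]


# Demonstration with a temporary file holding made-up class names.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "class.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("体育\n财经\n")
    class_list = read_class_list(path)
```

Using a `with` block also closes the file handle, which the one-line `open(...).readlines()` form leaves to the garbage collector.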

649453932 commented 4 years ago

Thanks for raising this issue. I've fixed it.

lyriccoder commented 4 years ago

Thank you