gitabtion / SoftMaskedBert-PyTorch

🙈 An unofficial implementation of SoftMaskedBert based on huggingface/transformers.
MIT License

Character encoding problem #15

Closed Lyn-bia closed 3 years ago

Lyn-bia commented 3 years ago

……data_processor.py", line 118, in read_data for line in f: UnicodeDecodeError: 'gbk' codec can't decode byte 0xab in position 16: illegal multibyte sequence

Could you tell me which encoding should be used when reading the dataset? Both UTF-8 and GBK raise errors.

gitabtion commented 3 years ago

You can use the data processed by this repository: https://github.com/gitabtion/BertBasedCorrectionModels. Later on I will sync that repository's data processing scripts into this one.

Toddzhangwj commented 2 years ago

Hello, I have run into an encoding problem as well. Is there a way to solve it?
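
For anyone hitting the same `UnicodeDecodeError`, one generic way to narrow it down is to read the file as raw bytes and try a few candidate encodings in order. The thread never states the dataset's actual encoding, so the sketch below is only a diagnostic aid under assumptions: the candidate list (`utf-8-sig`, `gb18030`, `big5`) is a guess, and the helper names `decode_with_fallback` and `read_text_with_fallback` are hypothetical and not part of this repository.

```python
from pathlib import Path

# Assumed candidates only; the thread does not say which encoding the data uses.
# "utf-8-sig" also accepts plain UTF-8, and "gb18030" is a superset of GBK.
CANDIDATE_ENCODINGS = ("utf-8-sig", "gb18030", "big5")


def decode_with_fallback(raw, encodings=CANDIDATE_ENCODINGS):
    """Return (text, encoding) for the first candidate that decodes `raw` without error."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # this candidate does not fit, try the next one
    raise ValueError(f"none of {encodings} could decode the data")


def read_text_with_fallback(path, encodings=CANDIDATE_ENCODINGS):
    """Read `path` as bytes and decode it with the first candidate that works."""
    return decode_with_fallback(Path(path).read_bytes(), encodings)


if __name__ == "__main__":
    # Small demonstration on in-memory bytes rather than a real dataset file.
    text, used = decode_with_fallback("中文".encode("gb18030"))
    print(used, text)
```

A decode that succeeds does not prove the guess is right, since permissive codecs such as `gb18030` accept many byte sequences, so it is worth inspecting a few decoded lines by eye. Once the encoding is known, passing it explicitly as `open(path, encoding=...)` avoids relying on the platform default, which is GBK on many Chinese-locale Windows systems.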