Morizeyao / GPT2-Chinese

Chinese version of GPT2 training code, using BERT tokenizer.
MIT License

Hello, when generating text I keep getting a utf-8 error, even though I have checked the versions and the training corpus. Could someone help? Thanks! #244

Closed: hannah-saber closed this issue 2 years ago

hannah-saber commented 2 years ago

Traceback (most recent call last):
  File "E:/code/kxy_code/GPT2-Chinese-old_gpt_2_chinese_before_2021_4_22/generate_texts.py", line 186, in <module>
    main()
  File "E:/code/kxy_code/GPT2-Chinese-old_gpt_2_chinese_before_2021_4_22/generate_texts.py", line 139, in main
    model = GPT2LMHeadModel.from_pretrained(args.model_path)
  File "D:\anaconda3\lib\site-packages\transformers\modeling_utils.py", line 287, in from_pretrained
    **kwargs
  File "D:\anaconda3\lib\site-packages\transformers\configuration_utils.py", line 154, in from_pretrained
    config = cls.from_json_file(resolved_config_file)
  File "D:\anaconda3\lib\site-packages\transformers\configuration_utils.py", line 186, in from_json_file
    text = reader.read()
  File "D:\anaconda3\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 64: invalid start byte
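For context: the traceback shows that `transformers` is reading the model directory's `config.json` as UTF-8 and hitting a byte (`0x80`) it cannot decode, which usually means the file was saved in another encoding (e.g. GBK on a Chinese-locale Windows machine) or is not actually a text file. Below is a minimal diagnostic sketch, not the repo's own code; the model directory path is a placeholder for whatever you pass as `--model_path`, and the GBK fallback is an assumption:

```python
import json
from pathlib import Path

# Hypothetical path; substitute the directory you pass to --model_path.
config_path = Path("model/final_model") / "config.json"

raw = config_path.read_bytes()
try:
    raw.decode("utf-8")
    print("config.json is valid UTF-8; the problem is elsewhere")
except UnicodeDecodeError as err:
    print(f"config.json is not UTF-8: {err}")
    # Assumption: the file was saved as GBK. If this decode also fails,
    # the file is likely binary/corrupted and should be regenerated.
    text = raw.decode("gbk")
    json.loads(text)  # sanity-check that it is still valid JSON
    config_path.write_bytes(text.encode("utf-8"))
    print("Re-saved config.json as UTF-8")
```

If both decodes fail, check that `--model_path` points at the model directory containing `config.json` rather than at a binary file such as `pytorch_model.bin`.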