Closed EniKot closed 4 months ago
I haven't seen this error before. Regarding "when the model outputs English there is no problem, but when it outputs Chinese it very often stops because of the error above" — do you mean when *you* type English versus Chinese input? This looks like it is caused by malformed input, likely something in how the terminal handles input or in the input method (IME).
Tried it — it is indeed an input problem. Pasting the Chinese text in instead of typing it works fine :)
The error is as follows:

Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/root/miniconda3/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/root/autodl-tmp/MING/ming/serve/cli.py", line 129, in <module>
    main(args)
  File "/root/autodl-tmp/MING/ming/serve/cli.py", line 108, in main
    chat_loop(args.model_path, args.model_base, args.device, args.conv_template, args.temperature, args.max_new_tokens, args.beam_size,
  File "/root/autodl-tmp/MING/ming/serve/inference.py", line 132, in chat_loop
    output_stream = generate_stream_func(model, tokenizer, params, device, beam_size, context_len=context_len)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/autodl-tmp/MING/ming/serve/inference.py", line 41, in generate_stream
    input_ids = tokenizer(prompt).input_ids
  File "/root/miniconda3/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2858, in __call__
    encodings = self._call_one(text=text, text_pair=text_pair, **all_kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2964, in _call_one
    return self.encode_plus(
  File "/root/miniconda3/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3037, in encode_plus
    return self._encode_plus(
  File "/root/miniconda3/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 719, in _encode_plus
    first_ids = get_input_ids(text)
  File "/root/miniconda3/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 686, in get_input_ids
    tokens = self.tokenize(text, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 617, in tokenize
    tokenized_text.extend(self._tokenize(token))
  File "/root/miniconda3/lib/python3.10/site-packages/transformers/models/qwen2/tokenization_qwen2.py", line 267, in _tokenize
    self.byte_encoder[b] for b in token.encode("utf-8")
UnicodeEncodeError: 'utf-8' codec can't encode character '\udce8' in position 0: surrogates not allowed
The model is MING-MOE-4B with its corresponding base model Qwen1.5-4B-Chat. When the model outputs English there is no problem, but when it outputs Chinese it very often stops with the error above. The software environment is torch 2.1.2 and Python 3.10, and no path contains Chinese characters. Could this be caused by the Python or PyTorch version?
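For context, the lone surrogate '\udce8' in the traceback is exactly what Python produces when a stray byte 0xE8 (the lead byte of the three-byte UTF-8 sequences used by most Chinese characters) reaches a decode step with errors="surrogateescape" — consistent with a terminal/IME splitting a multi-byte character mid-sequence. A minimal sketch reproducing the error, plus a hypothetical `sanitize` helper (not part of MING or transformers) that one could apply to the prompt before tokenization:

```python
# A truncated UTF-8 sequence decoded with surrogateescape yields a lone
# surrogate, which cannot itself be re-encoded as UTF-8.
s = b"\xe8".decode("utf-8", errors="surrogateescape")
print(repr(s))  # '\udce8' — the character from the traceback

try:
    s.encode("utf-8")
except UnicodeEncodeError as e:
    print(e)  # "surrogates not allowed", same failure as in the tokenizer


def sanitize(text: str) -> str:
    """Hypothetical pre-tokenization cleanup: round-trip through
    surrogateescape to recover the raw bytes, then decode leniently so
    unpaired surrogates become U+FFFD instead of crashing later."""
    raw = text.encode("utf-8", errors="surrogateescape")
    return raw.decode("utf-8", errors="replace")


print(repr(sanitize(s)))  # '\ufffd'
```

This only masks the symptom — the underlying fix, as noted above, is to get well-formed UTF-8 from the terminal (e.g. by pasting the text) rather than a byte-split character from the input method.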