THUDM / ChatGLM-6B

ChatGLM-6B: An Open Bilingual Dialogue Language Model | 开源双语对话语言模型
Apache License 2.0

[BUG/Help] Running cli_demo on CPU under Windows; gcc is already installed, but the tokenizer errors out on line 2 #1449

Open lixianqi opened 5 months ago

lixianqi commented 5 months ago

Is there an existing issue for this?

Current Behavior

As the title says, this is my first time setting up a large model. The error output is below:

```
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Traceback (most recent call last):
  File "D:\Anaconda\envs\LLM\lib\site-packages\transformers\tokenization_utils_base.py", line 1958, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "C:\Users\username/.cache\huggingface\modules\transformers_modules\THUDM\chatglm-6b\8b7d33596d18c5e83e2da052d05ca4db02e60620\tokenization_chatglm.py", line 221, in __init__
    self.sp_tokenizer = SPTokenizer(vocab_file, num_image_tokens=num_image_tokens)
  File "C:\Users\username/.cache\huggingface\modules\transformers_modules\THUDM\chatglm-6b\8b7d33596d18c5e83e2da052d05ca4db02e60620\tokenization_chatglm.py", line 64, in __init__
    self.text_tokenizer = TextTokenizer(vocab_file)
  File "C:\Users\username/.cache\huggingface\modules\transformers_modules\THUDM\chatglm-6b\8b7d33596d18c5e83e2da052d05ca4db02e60620\tokenization_chatglm.py", line 22, in __init__
    self.sp.Load(model_path)
  File "D:\Anaconda\envs\LLM\lib\site-packages\sentencepiece\__init__.py", line 905, in Load
    return self.LoadFromFile(model_file)
  File "D:\Anaconda\envs\LLM\lib\site-packages\sentencepiece\__init__.py", line 310, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
OSError: Not found: "C:\Users\username/.cache\huggingface\hub\models--THUDM--chatglm-6b\snapshots\8b7d33596d18c5e83e2da052d05ca4db02e60620\ice_text.model": Illegal byte sequence Error #42

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\username\ChatGLM-6B\cli_demo.py", line 7, in <module>
    tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)  #C:/Users/李贤琦/Desktop/LLM THUDM/chatglm-6b
  File "D:\Anaconda\envs\LLM\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 679, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "D:\Anaconda\envs\LLM\lib\site-packages\transformers\tokenization_utils_base.py", line 1804, in from_pretrained
    return cls._from_pretrained(
  File "D:\Anaconda\envs\LLM\lib\site-packages\transformers\tokenization_utils_base.py", line 1960, in _from_pretrained
    raise OSError(
OSError: Unable to load vocabulary from file. Please check that the provided vocabulary is accessible and not corrupted.
```

Any help from the experts would be appreciated.
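The `Illegal byte sequence` error comes from sentencepiece failing to open the vocabulary file, which on Windows is often triggered by non-ASCII characters in the file path (the commented-out path in `cli_demo.py` suggests the real user name here is Chinese, even though the pasted traceback shows `username`). A minimal sketch to check whether the default Hugging Face cache path is the culprit (the helper name `path_is_ascii_safe` is my own, not part of any library):

```python
import os


def path_is_ascii_safe(path: str) -> bool:
    """Return True if every character in the path is plain ASCII.

    On Windows, sentencepiece's C++ file loader can fail with
    "Illegal byte sequence" when the model file lives under a path
    containing non-ASCII characters (e.g. a Chinese user name in
    C:\\Users\\...).
    """
    try:
        path.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


# The default Hugging Face cache location lives under the user's
# home directory, so a non-ASCII user name ends up in every model path.
cache_dir = os.path.join(os.path.expanduser("~"), ".cache", "huggingface")
if not path_is_ascii_safe(cache_dir):
    print("Cache path contains non-ASCII characters; "
          "consider moving the model files to an ASCII-only directory.")
```

If the check fails, relocating the cache by setting the `HF_HOME` environment variable to an ASCII-only directory (e.g. `D:\hf_cache`) before running `cli_demo.py` may avoid the problem, though I have not verified this against this exact setup.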

Expected Behavior

No response

Steps To Reproduce

I just followed the official documentation step by step.

Environment

- OS: Windows 10
- Python: 3.9
- Transformers: 4.27.1
- PyTorch: 23.3.1
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`): False

Anything else?

No response