SleepySoft / Easy-Langchain-LKB

The simplest Local Knowledge Base example based on Langchain and Chat-GLM

Running main.py raises ValueError: Tokenizer class ChatGLMTokenizer does not exist or is not currently imported. #3

Open shnulailai opened 1 year ago

shnulailai commented 1 year ago

Loading /data/lailai_file/bigmodel/model_hub/chatglm-6b-int4 requires to execute some code in that repo, you can inspect the content of the repository at https://hf.co//data/lailai_file/bigmodel/model_hub/chatglm-6b-int4. You can dismiss this prompt by passing trust_remote_code=True. Do you accept? [y/N] y
No compiled kernel found.
Compiling kernels : /home/server2/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization_kernels_parallel.c
Compiling gcc -O3 -fPIC -pthread -fopenmp -std=c99 /home/server2/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization_kernels_parallel.c -shared -o /home/server2/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization_kernels_parallel.so
Load kernel : /home/server2/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization_kernels_parallel.so
Setting CPU quantization kernel threads to 28
Using quantization cache
Applying quantization to glm layers
Tokenizer class ChatGLMTokenizer does not exist or is not currently imported.
Traceback (most recent call last):
  File "/data/lailai_file/bigmodel/Easy-Langchain-LKB/main.py", line 246, in <module>
    main()
  File "/data/lailai_file/bigmodel/Easy-Langchain-LKB/main.py", line 198, in main
    embedding = setup_embedding()
  File "/data/lailai_file/bigmodel/Easy-Langchain-LKB/main.py", line 134, in setup_embedding
    embeddings = HuggingFaceEmbeddings(
  File "/home/server2/anaconda3/envs/lcglm/lib/python3.10/site-packages/langchain/embeddings/huggingface.py", line 65, in __init__
    self.client = sentence_transformers.SentenceTransformer(
  File "/home/server2/anaconda3/envs/lcglm/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 97, in __init__
    modules = self._load_auto_model(model_path)
  File "/home/server2/anaconda3/envs/lcglm/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 806, in _load_auto_model
    transformer_model = Transformer(model_name_or_path)
  File "/home/server2/anaconda3/envs/lcglm/lib/python3.10/site-packages/sentence_transformers/models/Transformer.py", line 31, in __init__
    self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_name_or_path if tokenizer_name_or_path is not None else model_name_or_path, cache_dir=cache_dir, **tokenizer_args)
  File "/home/server2/anaconda3/envs/lcglm/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 688, in from_pretrained
    raise ValueError(
ValueError: Tokenizer class ChatGLMTokenizer does not exist or is not currently imported.

Thank you for your work on this project, but I ran into the error above when running main.py. Do you have a solution for it?
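For context, transformers raises this particular ValueError when a checkpoint declares a custom tokenizer class that only exists as code inside the checkpoint repo (signalled by an `auto_map` entry in its `tokenizer_config.json`) and the loader is not permitted to execute that code. A minimal sketch of that distinction, using a hypothetical `needs_remote_code` helper that is not part of transformers:

```python
# Hypothetical helper (not part of transformers): a checkpoint whose
# tokenizer_config.json carries an "auto_map" entry can only be built by
# executing code shipped inside the repo, so loading it requires
# trust_remote_code=True all the way down to AutoTokenizer.from_pretrained.
def needs_remote_code(tokenizer_config: dict) -> bool:
    """Return True if the tokenizer class lives in the checkpoint's own code."""
    return "auto_map" in tokenizer_config


# chatglm-6b-int4 style config: ChatGLMTokenizer is defined in the repo itself
chatglm_cfg = {
    "tokenizer_class": "ChatGLMTokenizer",
    "auto_map": {"AutoTokenizer": ["tokenization_chatglm.ChatGLMTokenizer", None]},
}
# a stock checkpoint: the tokenizer class ships with transformers itself
bert_cfg = {"tokenizer_class": "BertTokenizer"}

print(needs_remote_code(chatglm_cfg))  # True  -> needs trust_remote_code=True
print(needs_remote_code(bert_cfg))     # False -> loads without it
```

Note that answering `y` at the interactive prompt in the log above only covers that one load; in the traceback, the failing call goes through sentence_transformers' `Transformer`, which in this version does not appear to forward `trust_remote_code` to `AutoTokenizer.from_pretrained`, so the same checkpoint fails there.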

ps360pa commented 5 months ago

No solution at the moment: both HuggingFaceEmbeddings and transformers' AutoModel can only load offline models, and HuggingFaceEmbeddings' model_name parameter is ignored unless it is one of the system defaults, whether or not the path contains a model.json. AutoModel, however, can be replaced with ModelScope:

from modelscope import AutoTokenizer, AutoModel, snapshot_download

# Resolve the checkpoint from the local ModelScope cache
# (local_files_only=True skips the network and reuses a prior download).
model_dir = snapshot_download("ZhipuAI/chatglm3-6b", cache_dir='yourlocalpath', local_files_only=True)
# trust_remote_code=True allows the custom ChatGLMTokenizer / model classes
# shipped inside the checkpoint repo to be imported.
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
model = AutoModel.from_pretrained(model_dir, trust_remote_code=True).quantize(4).cuda()  # int4 quantization on GPU
model = model.eval()  # inference mode