Closed zzc0208 closed 5 months ago
ValueError: Tokenizer class Qwen2Tokenizer does not exist or is not currently imported.
The log shows that Qwen2Tokenizer does not exist.
Could you try the following code to check if the tokenizer works well?
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('~/text-generation-webui/models/Qwen_Qwen1.5-32B-Chat-AWQ')
BTW, please add --model-format awq when launching the service.
I tried the code you provided. The output is as follows:
Python 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from transformers import AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained('/home/linjl/text-generation-webui/models/Qwen_Qwen1.5-32B-Chat-AWQ')
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
>>>
I also tried adding the --model-format awq option when launching the api server, but it still fails with:
ValueError: Tokenizer class Qwen2Tokenizer does not exist or is not currently imported.
transformers: 4.30.2
Qwen1.5 requires transformers>=4.37.0
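A quick way to confirm whether an environment meets that requirement is to compare version strings before launching the server. The following is only a minimal sketch: the helper names `parse_version` and `meets_minimum` are made up for illustration, and the parsing is a simplified stand-in that looks only at the leading numeric components.

```python
import re

def parse_version(text):
    """Turn '4.37.0' or '4.38.0.dev0' into a tuple of its leading integers."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component such as 'dev0'
        parts.append(int(match.group()))
    while len(parts) < 3:
        parts.append(0)  # pad so that '4.37' compares equal to '4.37.0'
    return tuple(parts)

def meets_minimum(installed, required="4.37.0"):
    """Return True if the installed version is at least the required one."""
    return parse_version(installed) >= parse_version(required)

print(meets_minimum("4.30.2"))  # the version reported in this thread
print(meets_minimum("4.37.0"))
```

In a real environment the first argument would be `transformers.__version__`. If the check fails, upgrading with `pip install -U "transformers>=4.37.0"` should resolve the Qwen2Tokenizer error.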
Hello, I updated transformers, but now I seem to be running into a different problem:
(sd) linjl@bme-server:~$ lmdeploy serve api_server ~/text-generation-webui/models/Qwen_Qwen1.5-32B-Chat-AWQ --model-name qwen1.5-32B --server-port 2333 --quant-policy 8 --model-format awq
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Convert to turbomind format: 0%| | 0/64 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/linjl/anaconda3/envs/sd/bin/lmdeploy", line 8, in <module>
sys.exit(run())
File "/home/linjl/anaconda3/envs/sd/lib/python3.10/site-packages/lmdeploy/cli/entrypoint.py", line 37, in run
args.run(args)
File "/home/linjl/anaconda3/envs/sd/lib/python3.10/site-packages/lmdeploy/cli/serve.py", line 303, in api_server
run_api_server(args.model_path,
File "/home/linjl/anaconda3/envs/sd/lib/python3.10/site-packages/lmdeploy/serve/openai/api_server.py", line 1191, in serve
VariableInterface.async_engine = pipeline_class(
File "/home/linjl/anaconda3/envs/sd/lib/python3.10/site-packages/lmdeploy/serve/async_engine.py", line 206, in __init__
self._build_turbomind(model_path=model_path,
File "/home/linjl/anaconda3/envs/sd/lib/python3.10/site-packages/lmdeploy/serve/async_engine.py", line 253, in _build_turbomind
self.engine = tm.TurboMind.from_pretrained(
File "/home/linjl/anaconda3/envs/sd/lib/python3.10/site-packages/lmdeploy/turbomind/turbomind.py", line 387, in from_pretrained
return cls(model_path=pretrained_model_name_or_path,
File "/home/linjl/anaconda3/envs/sd/lib/python3.10/site-packages/lmdeploy/turbomind/turbomind.py", line 161, in __init__
self.model_comm = self._from_hf(model_source=model_source,
File "/home/linjl/anaconda3/envs/sd/lib/python3.10/site-packages/lmdeploy/turbomind/turbomind.py", line 296, in _from_hf
output_model.export()
File "/home/linjl/anaconda3/envs/sd/lib/python3.10/site-packages/lmdeploy/turbomind/deploy/target_model/base.py", line 274, in export
self.export_transformer_block(bin, i)
File "/home/linjl/anaconda3/envs/sd/lib/python3.10/site-packages/lmdeploy/turbomind/deploy/target_model/w4.py", line 127, in export_transformer_block
self.save_split(qkv_sz, f'layers.{i}.attention.w_qkv.scales_zeros', -1)
File "/home/linjl/anaconda3/envs/sd/lib/python3.10/site-packages/lmdeploy/turbomind/deploy/target_model/base.py", line 252, in save_split
self.export_weight(split, f'{prefix}.{i}{ext}')
File "/home/linjl/anaconda3/envs/sd/lib/python3.10/site-packages/lmdeploy/turbomind/deploy/target_model/base.py", line 230, in export_weight
tm_tensor.copy_from(torch_tensor)
RuntimeError: [TM][ERROR] Assertion fail: /lmdeploy/src/turbomind/python/bind.cpp:294
(sd) linjl@bme-server:~$
Hi @zzc0208, if you are using the official Qwen model at https://huggingface.co/Qwen/Qwen1.5-32B-Chat-AWQ, lmdeploy cannot support it at the moment. That model was quantized with group_size 32, whereas lmdeploy supports group_size 128:
"quantization_config": {
"bits": 4,
"group_size": 32,
"modules_to_not_convert": null,
"quant_method": "awq",
"version": "gemm",
"zero_point": true
},
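The mismatch above can be detected from the model's config.json before the conversion step ever runs. The sketch below assumes the `quantization_config` layout quoted in this thread. The helper names and the `SUPPORTED_GROUP_SIZE` constant are illustrative and are not part of lmdeploy.

```python
import json

# Per the maintainer's reply in this thread; may change in later releases.
SUPPORTED_GROUP_SIZE = 128

def awq_group_size(config):
    """Return the AWQ group_size from a parsed config.json, or None if not AWQ."""
    quant = config.get("quantization_config") or {}
    if quant.get("quant_method") != "awq":
        return None
    return quant.get("group_size")

def is_supported(config):
    """True only when the checkpoint is AWQ with the supported group size."""
    return awq_group_size(config) == SUPPORTED_GROUP_SIZE

# The quantization_config quoted above, wrapped as a minimal config.json.
example = json.loads("""
{
  "quantization_config": {
    "bits": 4,
    "group_size": 32,
    "modules_to_not_convert": null,
    "quant_method": "awq",
    "version": "gemm",
    "zero_point": true
  }
}
""")

print(awq_group_size(example), is_supported(example))  # 32 False
```

For a local checkpoint, the dictionary would come from `json.load` on the `config.json` inside the model directory.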
One workaround is to re-quantize qwen1.5-32b-chat with the lmdeploy lite auto_awq tool.
We also plan to support group_size = 32 and 64, but that will come later, probably in July.
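For the re-quantization workaround, a rough sketch of how the command could be assembled is shown below. The flags `--w-bits`, `--w-group-size` and `--work-dir` follow the lmdeploy quantization docs as I understand them, so please verify them with `lmdeploy lite auto_awq --help` for your installed version. The output directory name is a placeholder, and the source model is assumed to be the unquantized Qwen/Qwen1.5-32B-Chat checkpoint.

```python
import shlex

def build_requant_command(hf_model, work_dir, group_size=128, bits=4):
    """Assemble an `lmdeploy lite auto_awq` invocation as an argv list.

    Flag names are assumptions taken from the lmdeploy docs; check them
    against `lmdeploy lite auto_awq --help` before running.
    """
    return [
        "lmdeploy", "lite", "auto_awq", hf_model,
        "--w-bits", str(bits),
        "--w-group-size", str(group_size),
        "--work-dir", work_dir,
    ]

cmd = build_requant_command(
    "Qwen/Qwen1.5-32B-Chat",         # unquantized source model (assumed)
    "./qwen1.5-32b-chat-awq-g128",   # placeholder output directory
)
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)`, and the resulting work directory can then be served with --model-format awq in place of the official AWQ checkpoint.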
Understood, thanks.
Checklist
Describe the bug
I want to run inference on qwen1.5-32-awq with lmdeploy, but it fails with:
ValueError: Tokenizer class Qwen2Tokenizer does not exist or is not currently imported.
The lmdeploy repository says Qwen1.5 is supported, but inference does not actually work.
Reproduction
lmdeploy serve api_server ~/text-generation-webui/models/Qwen_Qwen1.5-32B-Chat-AWQ --model-name qwen1.5-32B --server-port 2333 --quant-policy 8
Environment
Error traceback