Closed: AIFFFENG closed this issue 2 days ago
Cannot reproduce. Please make sure the inference code and the model path you pass in are correct.
The model path is fine. The inference code is below; both quantization setups use the same code:

import time
from lmdeploy import pipeline, GenerationConfig, ChatTemplateConfig, TurbomindEngineConfig
from lmdeploy.vl import load_image

engine_config = TurbomindEngineConfig(model_format='awq')
pipe = pipeline(model_path, chat_template_config=ChatTemplateConfig(model_name='qwen-7b'))  # ,backend_config=engine_config)
gen_config = GenerationConfig(top_p=1, top_k=1, temperature=0.01, max_new_tokens=1024, random_seed=None)
begin = time.time()
for i in range(1):
    for name in range(20):  # os.listdir(image_dir)[:1]:
        image = load_image(image_path)
        response = pipe((instruct_question, image), gen_config=gen_config, backend_config=engine_config)
        text = response.text
Cannot reproduce. Also, in your code backend_config is passed to pipe's __call__, which has no effect; it needs to be passed when you construct the pipeline with the pipeline function.
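For reference, a minimal sketch of the corrected call, assuming the same model_path, image_path and instruct_question placeholders as in the snippet above:

from lmdeploy import pipeline, GenerationConfig, ChatTemplateConfig, TurbomindEngineConfig
from lmdeploy.vl import load_image

# backend_config belongs to pipeline(), not to the per-request call
engine_config = TurbomindEngineConfig(model_format='awq')  # load the weights as AWQ int4
pipe = pipeline(model_path,
                backend_config=engine_config,
                chat_template_config=ChatTemplateConfig(model_name='qwen-7b'))
gen_config = GenerationConfig(top_p=1, top_k=1, temperature=0.01, max_new_tokens=1024)

image = load_image(image_path)
response = pipe((instruct_question, image), gen_config=gen_config)  # no backend_config here
print(response.text)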
This error suggests the AWQ model was loaded as a plain FP16 model, although I'm not sure how that happened. Please pass backend_config to pipeline explicitly and try again.
Sorry, the code I posted above is the plain (non-quantized) inference path; the quantized run does pass backend_config into pipeline. Let me double-check. Thanks.
Tried it again and it works now. The earlier failure was probably caused by insufficient GPU memory.
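If GPU memory is the constraint, one knob worth trying (a sketch only; cache_max_entry_count is LMDeploy's ratio of free GPU memory reserved for the k/v cache, and 0.4 here is an arbitrary example value):

from lmdeploy import pipeline, TurbomindEngineConfig

# Shrink the k/v cache so more free memory is left for the AWQ weights.
engine_config = TurbomindEngineConfig(model_format='awq',
                                      cache_max_entry_count=0.4)
pipe = pipeline(model_path, backend_config=engine_config)  # model_path as above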
Checklist
Describe the bug
Traceback (most recent call last):
  File "/data/54T/多模态大模型加速/lmdeploy_chat_int4_可视化.py", line 42, in <module>
    pipe = pipeline(model_path,
  File "/data/54T/envs/lmdeploy/lib/python3.8/site-packages/lmdeploy/api.py", line 94, in pipeline
    return pipeline_class(model_path,
  File "/data/54T/envs/lmdeploy/lib/python3.8/site-packages/lmdeploy/serve/vl_async_engine.py", line 21, in __init__
    super().__init__(model_path, **kwargs)
  File "/data/54T/envs/lmdeploy/lib/python3.8/site-packages/lmdeploy/serve/async_engine.py", line 206, in __init__
    self._build_turbomind(model_path=model_path,
  File "/data/54T/envs/lmdeploy/lib/python3.8/site-packages/lmdeploy/serve/async_engine.py", line 253, in _build_turbomind
    self.engine = tm.TurboMind.from_pretrained(
  File "/data/54T/envs/lmdeploy/lib/python3.8/site-packages/lmdeploy/turbomind/turbomind.py", line 387, in from_pretrained
    return cls(model_path=pretrained_model_name_or_path,
  File "/data/54T/envs/lmdeploy/lib/python3.8/site-packages/lmdeploy/turbomind/turbomind.py", line 161, in __init__
    self.model_comm = self._from_hf(model_source=model_source,
  File "/data/54T/envs/lmdeploy/lib/python3.8/site-packages/lmdeploy/turbomind/turbomind.py", line 270, in _from_hf
    output_model = OUTPUT_MODELS.get(output_format)(
  File "/data/54T/envs/lmdeploy/lib/python3.8/site-packages/lmdeploy/turbomind/deploy/target_model/fp.py", line 26, in __init__
    super().__init__(input_model, cfg, to_file, out_dir)
  File "/data/54T/envs/lmdeploy/lib/python3.8/site-packages/lmdeploy/turbomind/deploy/target_model/base.py", line 156, in __init__
    self.cfg = self.get_config(cfg)
  File "/data/54T/envs/lmdeploy/lib/python3.8/site-packages/lmdeploy/turbomind/deploy/target_model/fp.py", line 38, in get_config
    w1, _, _ = bin.ffn(i)
  File "/data/54T/envs/lmdeploy/lib/python3.8/site-packages/lmdeploy/turbomind/deploy/source_model/qwen.py", line 62, in ffn
    return self._ffn(i, 'weight')
  File "/data/54T/envs/lmdeploy/lib/python3.8/site-packages/lmdeploy/turbomind/deploy/source_model/qwen.py", line 56, in _ffn
    tensor = self.params[f'transformer.h.{i}.mlp.{key}.{kind}']
KeyError: 'transformer.h.0.mlp.w2.weight'
Reproduction
python3 -m lmdeploy lite auto_awq /data/54T/luominghua/models/qwen-vl-chat-0625 --calib-samples 128 --search-scale True --batch-size 8 --calib-seqlen 2048 --w-bits 4 --w-group-size 128 --work-dir /data/54T/luominghua/models/qwen-vl-chat-0625-int4-search_batch
Environment
Error traceback
No response