Closed: yanglongbiao closed this issue 1 month ago
Switching trt-llm to 0.12.0 resolved the issue.
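For reference, the downgrade would be something along these lines (assuming a pip-based install; the NVIDIA index hosts the wheels):

pip install tensorrt_llm==0.12.0 --extra-index-url https://pypi.nvidia.com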
For those unable to downgrade, I was able to disable the ModelWeightsLoader by setting TRTLLM_DISABLE_UNIFIED_CONVERTER=1, as noted here: https://github.com/NVIDIA/TensorRT-LLM/pull/2110#issue-2463325638
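Applied to the conversion command from the Reproduction section below, the workaround looks like this (only the environment variable is new; all other arguments are unchanged):

TRTLLM_DISABLE_UNIFIED_CONVERTER=1 python convert_checkpoint.py --model_dir ./tmp/Qwen/7B/ \
    --output_dir ./tllm_checkpoint_1gpu_fp16_wq \
    --dtype float16 \
    --use_weight_only \
    --weight_only_precision int8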
System Info
- CPU architecture: x86_64
- GPU: A100
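The report does not list the installed TensorRT-LLM version, which is the deciding factor here; it can be checked with:

python -c "import tensorrt_llm; print(tensorrt_llm.__version__)"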
Who can help?
No response
Information

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
git clone https://huggingface.co/Qwen/Qwen2-1.5B ./tmp/Qwen2/1.5B
python convert_checkpoint.py --model_dir ./tmp/Qwen/7B/ \
    --output_dir ./tllm_checkpoint_1gpu_fp16_wq \
    --dtype float16 \
    --use_weight_only \
    --weight_only_precision int8
Expected behavior
The conversion completes successfully.
actual behavior
Traceback (most recent call last):
  File "/mnt/share/yanglongbiao/llm/TensorRT-LLM/examples/qwen/convert_checkpoint.py", line 324, in <module>
    main()
  File "/mnt/share/yanglongbiao/llm/TensorRT-LLM/examples/qwen/convert_checkpoint.py", line 316, in main
    convert_and_save_hf(args)
  File "/mnt/share/yanglongbiao/llm/TensorRT-LLM/examples/qwen/convert_checkpoint.py", line 269, in convert_and_save_hf
    execute(args.workers, [convert_and_save_rank] * world_size, args)
  File "/mnt/share/yanglongbiao/llm/TensorRT-LLM/examples/qwen/convert_checkpoint.py", line 276, in execute
    f(args, rank)
  File "/mnt/share/yanglongbiao/llm/TensorRT-LLM/examples/qwen/convert_checkpoint.py", line 255, in convert_and_save_rank
    qwen = QWenForCausalLM.from_hugging_face(
  File "/mnt/share/yanglongbiao/llm/envs/trt_llm/lib/python3.10/site-packages/tensorrt_llm/models/qwen/model.py", line 429, in from_hugging_face
    loader.generate_tllm_weights(model)
  File "/mnt/share/yanglongbiao/llm/envs/trt_llm/lib/python3.10/site-packages/tensorrt_llm/models/model_weights_loader.py", line 353, in generate_tllm_weights
    self.load(tllm_key,
  File "/mnt/share/yanglongbiao/llm/envs/trt_llm/lib/python3.10/site-packages/tensorrt_llm/models/model_weights_loader.py", line 274, in load
    v = sub_module.postprocess(tllm_key, v, **postprocess_kwargs)
  File "/mnt/share/yanglongbiao/llm/envs/trt_llm/lib/python3.10/site-packages/tensorrt_llm/layers/linear.py", line 392, in postprocess
    weights = weights.to(str_dtype_to_torch(self.dtype))
AttributeError: 'NoneType' object has no attribute 'to'
additional notes
This was observed while verifying the 7B model.
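For completeness, the 7B weights referenced by --model_dir were presumably fetched analogously to the 1.5B clone above (the exact repo is an assumption; the report only shows the local path):

git clone https://huggingface.co/Qwen/Qwen2-7B ./tmp/Qwen/7B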