NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

Qwen2 1.5B checkpoint conversion broken (tensorrt_llm=0.14.0) #2267

Closed: yanglongbiao closed this issue 1 month ago

yanglongbiao commented 1 month ago

System Info

- CPU architecture: x86_64
- GPU: A100

Who can help?

No response

Reproduction

```
git clone https://huggingface.co/Qwen/Qwen2-1.5B ./tmp/Qwen2/1.5B

python convert_checkpoint.py --model_dir ./tmp/Qwen2/1.5B \
                             --output_dir ./tllm_checkpoint_1gpu_fp16_wq \
                             --dtype float16 \
                             --use_weight_only \
                             --weight_only_precision int8
```

Expected behavior

The checkpoint conversion completes successfully.

Actual behavior

```
Traceback (most recent call last):
  File "/mnt/share/yanglongbiao/llm/TensorRT-LLM/examples/qwen/convert_checkpoint.py", line 324, in <module>
    main()
  File "/mnt/share/yanglongbiao/llm/TensorRT-LLM/examples/qwen/convert_checkpoint.py", line 316, in main
    convert_and_save_hf(args)
  File "/mnt/share/yanglongbiao/llm/TensorRT-LLM/examples/qwen/convert_checkpoint.py", line 269, in convert_and_save_hf
    execute(args.workers, [convert_and_save_rank] * world_size, args)
  File "/mnt/share/yanglongbiao/llm/TensorRT-LLM/examples/qwen/convert_checkpoint.py", line 276, in execute
    f(args, rank)
  File "/mnt/share/yanglongbiao/llm/TensorRT-LLM/examples/qwen/convert_checkpoint.py", line 255, in convert_and_save_rank
    qwen = QWenForCausalLM.from_hugging_face(
  File "/mnt/share/yanglongbiao/llm/envs/trt_llm/lib/python3.10/site-packages/tensorrt_llm/models/qwen/model.py", line 429, in from_hugging_face
    loader.generate_tllm_weights(model)
  File "/mnt/share/yanglongbiao/llm/envs/trt_llm/lib/python3.10/site-packages/tensorrt_llm/models/model_weights_loader.py", line 353, in generate_tllm_weights
    self.load(tllm_key,
  File "/mnt/share/yanglongbiao/llm/envs/trt_llm/lib/python3.10/site-packages/tensorrt_llm/models/model_weights_loader.py", line 274, in load
    v = sub_module.postprocess(tllm_key, v, **postprocess_kwargs)
  File "/mnt/share/yanglongbiao/llm/envs/trt_llm/lib/python3.10/site-packages/tensorrt_llm/layers/linear.py", line 392, in postprocess
    weights = weights.to(str_dtype_to_torch(self.dtype))
AttributeError: 'NoneType' object has no attribute 'to'
```

Additional notes

Currently verifying the 7B model as well.

yanglongbiao commented 1 month ago

Downgrading trt-llm to 0.12.0 resolved the issue.
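For anyone else taking this route, a minimal sketch of the downgrade, assuming tensorrt_llm was installed with pip from NVIDIA's package index (the index URL follows the standard install docs and is not taken from this thread):

```
# Pin to the last release reported working in this thread.
pip3 install tensorrt_llm==0.12.0 --extra-index-url https://pypi.nvidia.com
```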

joostinyi commented 1 month ago

For those unable to downgrade: I was able to work around this by disabling the ModelWeightsLoader with TRTLLM_DISABLE_UNIFIED_CONVERTER=1, as noted in https://github.com/NVIDIA/TensorRT-LLM/pull/2110#issue-2463325638
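A minimal sketch of applying that workaround to the reproduction command above; the flag values are the reporter's, and the fallback behavior is as described in the linked PR, not verified here:

```
# TRTLLM_DISABLE_UNIFIED_CONVERTER=1 bypasses the unified ModelWeightsLoader,
# falling back to the older per-model conversion path for this invocation only.
TRTLLM_DISABLE_UNIFIED_CONVERTER=1 python convert_checkpoint.py \
    --model_dir ./tmp/Qwen2/1.5B \
    --output_dir ./tllm_checkpoint_1gpu_fp16_wq \
    --dtype float16 \
    --use_weight_only \
    --weight_only_precision int8
```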