NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
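For orientation, here is a minimal sketch of that Python API following the project's quick-start documentation. Note that this high-level `LLM` entry point postdates the 0.8.0 pre-release used in this issue, and the model id below is only a hypothetical stand-in for the bloomz-7b checkpoint discussed here.

```python
# Minimal sketch of the high-level Python API from the quick-start docs.
# The LLM entry point is not available in every release; the model id is
# a hypothetical example, not part of the original report.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="bigscience/bloomz-7b1")
sampling = SamplingParams(max_tokens=32)

for output in llm.generate(["Hello, my name is"], sampling):
    print(output.outputs[0].text)
```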
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

Failed to build for smoothquant bloomz-7b model #1162

Open moonlightian opened 7 months ago

moonlightian commented 7 months ago

System Info

GPU: A100
TensorRT-LLM: 0.8.0.dev2024013000

Who can help?

@Tracin


Reproduction

1. Convert the BLOOM checkpoint with SmoothQuant enabled (see the sketch after these steps):

       python convert_checkpoint.py --model_dir bloom/ --output_dir tllm_checkpoint_1gpu --smoothquant 0.5

2. Build the engine:

       trtllm-build --checkpoint_dir tllm_checkpoint_1gpu --output_dir ./engine_outputs
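For context on what step 1 produces: the `--smoothquant 0.5` argument sets the migration strength alpha from the SmoothQuant paper, which rescales each channel so that quantization difficulty is shared between activations and weights. A toy NumPy sketch of that rescaling (not the library's actual implementation):

```python
# Toy illustration of SmoothQuant channel smoothing; the library's real
# conversion also computes the scaling-factor tensors named in the error.
import numpy as np

def smooth(x, w, alpha=0.5):
    # x: (tokens, channels) activations, w: (channels, out) weights.
    # Per-channel factor balances activation vs. weight magnitudes.
    s = np.abs(x).max(axis=0) ** alpha / np.abs(w).max(axis=1) ** (1 - alpha)
    return x / s, w * s[:, None]  # rescaled so x @ w is unchanged

x = np.random.randn(8, 4) * np.array([1, 10, 0.1, 5])  # outlier channels
w = np.random.randn(4, 3)
x_s, w_s = smooth(x, w)
assert np.allclose(x @ w, x_s @ w_s)
```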

Expected behavior

The engine builds successfully.

Actual behavior

Traceback (most recent call last):
  File "/usr/local/bin/trtllm-build", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 448, in main
    parallel_build(source, build_config, args.output_dir, workers,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 381, in parallel_build
    build_and_save(rank, rank % workers, ckpt_dir, build_config,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 355, in build_and_save
    engine = build(build_config,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 253, in build
    model.load(weights)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/modeling_utils.py", line 331, in load
    raise RuntimeError(err_msg)
RuntimeError: Provided tensor names are different from those expected by the engine.
Expected but not provided tensors: {'transformer.layers.29.mlp.fc.weights_scaling_factor', 'transformer.layers.7.mlp.proj.prequant_scaling_factor', ..., 'transformer.layers.25.attention.dense.activation_scaling_factor', 'transformer.layers.23.attention.qkv.prequant_scaling_factor'}

Provided but not expected tensors: {'transformer.layers.7.input_layernorm.scale_to_int', 'transformer.layers.19.mlp.fc.per_channel_scale', 'transformer.layers.6.mlp.quantization_scaling_factor', ..., 'transformer.layers.21.mlp.proj.act_scale', 'transformer.layers.17.input_layernorm.scale_to_int'}
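One way to debug a mismatch like this is to list the tensor names the converted checkpoint actually contains and diff them against the expected set from the error message. A sketch, assuming the converter wrote a safetensors shard named rank0.safetensors (the file name may differ by version):

```python
# List tensor names in the converted checkpoint to compare against the
# names the engine expects. The shard path is an assumption.
from safetensors import safe_open

with safe_open("tllm_checkpoint_1gpu/rank0.safetensors", framework="np") as f:
    for name in sorted(f.keys()):
        print(name)
```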

Additional notes

None.

Tracin commented 7 months ago

Could you please try with the latest main branch? I think it has been fixed.
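After reinstalling from the main branch, a quick sanity check that the newer build is the one actually being imported (a trivial sketch; whether the fix is present depends on the version):

```python
# Confirm which TensorRT-LLM build is active before re-running the
# conversion and build steps.
import tensorrt_llm
print(tensorrt_llm.__version__)
```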