System Info
GPU: A100
TensorRT-LLM version: 0.8.0.dev2024013000
Who can help?
@Tracin
Reproduction
1. Convert the BLOOM checkpoint:
   python convert_checkpoint.py --model_dir bloom/ --output_dir tllm_checkpoint_1gpu --smoothquant 0.5
2. Build the engine:
   trtllm-build --checkpoint_dir tllm_checkpoint_1gpu --output_dir ./engine_outputs
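For context, the --smoothquant 0.5 argument is the SmoothQuant migration strength alpha: per-channel smoothing scales are computed as s_j = max|X_j|^alpha / max|W_j|^(1-alpha), migrating quantization difficulty from activations to weights (Xiao et al.). A minimal sketch of that formula follows; the function name and sample values are illustrative, not TensorRT-LLM code:

```python
# Sketch of the per-channel SmoothQuant smoothing scale,
#   s_j = a**alpha / w**(1 - alpha),
# where a = max|X_j| (activation absmax) and w = max|W_j| (weight absmax).
# alpha=0.5 corresponds to the --smoothquant 0.5 flag used above.
def smoothing_scale(act_absmax: float, weight_absmax: float, alpha: float = 0.5) -> float:
    return act_absmax ** alpha / weight_absmax ** (1.0 - alpha)

# An outlier-heavy activation channel is scaled down before quantization:
print(smoothing_scale(16.0, 1.0))  # 16**0.5 / 1**0.5 = 4.0
```

With alpha = 0.5 the difficulty is split evenly between activations and weights, which is why it is the commonly suggested default.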
Expected behavior
The engine builds successfully.
Actual behavior
Traceback (most recent call last):
  File "/usr/local/bin/trtllm-build", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 448, in main
    parallel_build(source, build_config, args.output_dir, workers,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 381, in parallel_build
    build_and_save(rank, rank % workers, ckpt_dir, build_config,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 355, in build_and_save
    engine = build(build_config,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 253, in build
    model.load(weights)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/modeling_utils.py", line 331, in load
    raise RuntimeError(err_msg)
RuntimeError: Provided tensor names are different from those expected by the engine.
Expected but not provided tensors: {'transformer.layers.29.mlp.fc.weights_scaling_factor', 'transformer.layers.7.mlp.proj.prequant_scaling_factor', ..., 'transformer.layers.25.attention.dense.activation_scaling_factor', 'transformer.layers.23.attention.qkv.prequant_scaling_factor'}
Provided but not expected tensors: {'transformer.layers.7.input_layernorm.scale_to_int', 'transformer.layers.19.mlp.fc.per_channel_scale', 'transformer.layers.6.mlp.quantization_scaling_factor', ..., 'transformer.layers.21.mlp.proj.act_scale', 'transformer.layers.17.input_layernorm.scale_to_int'}
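The mismatch above looks like a naming-convention skew: the conversion step wrote one family of SmoothQuant scale names (scale_to_int, per_channel_scale, act_scale) while the builder expects the *_scaling_factor family. A small set-difference check like the following narrows the report down to the offending suffixes; the two name sets are abbreviated samples copied from the error message, and a real check would load the full key lists from the checkpoint and model:

```python
# Compare the tensor names a checkpoint provides against those the builder
# expects. The sets below are abbreviated samples from the error above.
expected = {
    "transformer.layers.29.mlp.fc.weights_scaling_factor",
    "transformer.layers.7.mlp.proj.prequant_scaling_factor",
}
provided = {
    "transformer.layers.7.input_layernorm.scale_to_int",
    "transformer.layers.19.mlp.fc.per_channel_scale",
}

missing = expected - provided   # expected but not provided
extra = provided - expected     # provided but not expected

# Group the stray names by their final suffix to expose the convention skew.
suffixes = sorted({name.rsplit(".", 1)[-1] for name in extra})
print(suffixes)  # ['per_channel_scale', 'scale_to_int']
```

If every "provided but not expected" name ends in one of a handful of legacy suffixes, the checkpoint and builder are almost certainly disagreeing on the SmoothQuant checkpoint format rather than on individual weights.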
Additional notes
None.