System Info
GPU: A100
TensorRT-LLM version: 0.8.0.dev2024013000
Who can help?
@Tracin
Reproduction
1. Convert the BLOOM checkpoint:
   python convert_checkpoint.py --model_dir bloom/ --output_dir tllm_checkpoint_1gpu --smoothquant 0.5
2. Build the engine:
   trtllm-build --checkpoint_dir tllm_checkpoint_1gpu --output_dir ./engine_outputs
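For context, the --smoothquant 0.5 argument is the SmoothQuant migration strength alpha: per-channel smoothing scales are computed as s_j = max|X_j|^alpha / max|W_j|^(1-alpha), migrating quantization difficulty from activations to weights (Xiao et al.). A minimal sketch of that formula follows; the function name and sample values are illustrative, not TensorRT-LLM code:

```python
# Sketch of the per-channel SmoothQuant smoothing scale,
#   s_j = a**alpha / w**(1 - alpha),
# where a = max|X_j| (activation absmax) and w = max|W_j| (weight absmax).
# alpha=0.5 corresponds to the --smoothquant 0.5 flag used above.
def smoothing_scale(act_absmax: float, weight_absmax: float, alpha: float = 0.5) -> float:
    return act_absmax ** alpha / weight_absmax ** (1.0 - alpha)

# An outlier-heavy activation channel is scaled down before quantization:
print(smoothing_scale(16.0, 1.0))  # 16**0.5 / 1**0.5 = 4.0
```

With alpha = 0.5 the difficulty is split evenly between activations and weights, which is why it is the commonly suggested default.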
Expected behavior
The engine builds successfully.
Actual behavior
Traceback (most recent call last):
  File "/usr/local/bin/trtllm-build", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 448, in main
    parallel_build(source, build_config, args.output_dir, workers,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 381, in parallel_build
    build_and_save(rank, rank % workers, ckpt_dir, build_config,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 355, in build_and_save
    engine = build(build_config,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 253, in build
    model.load(weights)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/modeling_utils.py", line 331, in load
    raise RuntimeError(err_msg)
RuntimeError: Provided tensor names are different from those expected by the engine.
Expected but not provided tensors: {'transformer.layers.29.mlp.fc.weights_scaling_factor', 'transformer.layers.7.mlp.proj.prequant_scaling_factor', ..., 'transformer.layers.25.attention.dense.activation_scaling_factor', 'transformer.layers.23.attention.qkv.prequant_scaling_factor'}
Provided but not expected tensors: {'transformer.layers.7.input_layernorm.scale_to_int', 'transformer.layers.19.mlp.fc.per_channel_scale', 'transformer.layers.6.mlp.quantization_scaling_factor', ..., 'transformer.layers.21.mlp.proj.act_scale', 'transformer.layers.17.input_layernorm.scale_to_int'}
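The mismatch above looks like a naming-convention skew: the conversion step wrote one family of SmoothQuant scale names (scale_to_int, per_channel_scale, act_scale) while the builder expects the *_scaling_factor family. A small set-difference check like the following narrows the report down to the offending suffixes; the two name sets are abbreviated samples copied from the error message, and a real check would load the full key lists from the checkpoint and model:

```python
# Compare the tensor names a checkpoint provides against those the builder
# expects. The sets below are abbreviated samples from the error above.
expected = {
    "transformer.layers.29.mlp.fc.weights_scaling_factor",
    "transformer.layers.7.mlp.proj.prequant_scaling_factor",
}
provided = {
    "transformer.layers.7.input_layernorm.scale_to_int",
    "transformer.layers.19.mlp.fc.per_channel_scale",
}

missing = expected - provided   # expected but not provided
extra = provided - expected     # provided but not expected

# Group the stray names by their final suffix to expose the convention skew.
suffixes = sorted({name.rsplit(".", 1)[-1] for name in extra})
print(suffixes)  # ['per_channel_scale', 'scale_to_int']
```

If every "provided but not expected" name ends in one of a handful of legacy suffixes, the checkpoint and builder are almost certainly disagreeing on the SmoothQuant checkpoint format rather than on individual weights.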
Additional notes
None.