NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

[Mixtral 8x7B] trtllm-build | RuntimeError: Provided tensor names are different from those expected by the engine. #1109

Open mfournioux opened 7 months ago

mfournioux commented 7 months ago

I have successfully converted a Mixtral 8x7B model with tensor parallelism, following this script from the llama example folder:

python convert_checkpoint.py --model_dir ./Mixtral-8x7B-v0.1 \
    --output_dir ./tllm_checkpoint_mixtral_2gpu \
    --dtype float16 \
    --tp_size 2
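Since --tp_size 2 shards the weights across two ranks, it can be worth confirming the conversion output before building. A minimal sanity check in Python (the config.json / rank<N>.safetensors layout is an assumption about the converter's usual output, not something shown in this thread):

```python
# Sanity-check the converted checkpoint directory: with --tp_size 2 we
# would expect a config.json plus one safetensors shard per TP rank.
# The layout below is assumed, not confirmed by this thread.
from pathlib import Path

out = Path("./tllm_checkpoint_mixtral_2gpu")
print(sorted(p.name for p in out.iterdir()))
# e.g. ['config.json', 'rank0.safetensors', 'rank1.safetensors']
```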

Then, when I start building the engine with this command:

trtllm-build --checkpoint_dir ./tllm_checkpoint_mixtral_2gpu \
    --output_dir ./trt_engines/mixtral/tp2 \
    --gemm_plugin float16

This error appears:

Traceback (most recent call last):
  File "/usr/local/bin/trtllm-build", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 489, in main
    parallel_build(source, build_config, args.output_dir, workers,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 413, in parallel_build
    passed = build_and_save(rank, rank % workers, ckpt_dir,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 385, in build_and_save
    engine = build(build_config,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 266, in build
    model.load(weights)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/modeling_utils.py", line 338, in load
    raise RuntimeError(err_msg)
RuntimeError: Provided tensor names are different from those expected by the engine.
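One way to narrow an error like this down is to compare the tensor names actually stored in the converted checkpoint with the names in the error message. A hedged sketch (the rank0.safetensors / config.json file names are assumptions about the converter's output layout):

```python
# Hypothetical diagnostic: list what the converted checkpoint actually
# contains so it can be diffed against the engine's expected names.
import json
from safetensors import safe_open

ckpt_dir = "./tllm_checkpoint_mixtral_2gpu"

with open(f"{ckpt_dir}/config.json") as f:
    config = json.load(f)
print("num_hidden_layers in config:", config.get("num_hidden_layers"))

with safe_open(f"{ckpt_dir}/rank0.safetensors", framework="pt") as f:
    names = sorted(f.keys())
print(len(names), "tensors; first few:", names[:3])
```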

Do you have any suggestions for how to solve this issue?

Many thanks for your help.

QiJune commented 6 months ago

@nv-guomingz Could you please take a look? Thanks

nv-guomingz commented 6 months ago

Hi @mfournioux, I can't reproduce this issue on our latest main branch; would you please give it a try? If the issue still exists, please let us know.
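Before retrying, it may help to confirm which build is actually installed in the environment; a quick check (assuming the package exposes __version__, matching the version string the CLI prints):

```python
# Print the installed TensorRT-LLM version; the failing logs below
# show 0.8.0, so anything older than the current main may reproduce.
import tensorrt_llm
print(tensorrt_llm.__version__)
```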

raymondbernard commented 6 months ago

I have the same issue.

(.venv) F:\pythonprograms\llmstreaming>trtllm-build --checkpoint_dir F:\pythonprograms\llmstreaming\tllm_checkpoint_1gpu_streamingllm --output_dir ./mistralengine_streaming --gemm_plugin float16
[TensorRT-LLM] TensorRT-LLM version: 0.8.0
[03/15/2024-16:14:22] [TRT-LLM] [I] Set bert_attention_plugin to float16.
[03/15/2024-16:14:22] [TRT-LLM] [I] Set gpt_attention_plugin to float16.
[03/15/2024-16:14:22] [TRT-LLM] [I] Set gemm_plugin to float16.
[03/15/2024-16:14:22] [TRT-LLM] [I] Set lookup_plugin to None.
[03/15/2024-16:14:22] [TRT-LLM] [I] Set lora_plugin to None.
[03/15/2024-16:14:22] [TRT-LLM] [I] Set context_fmha to True.
[03/15/2024-16:14:22] [TRT-LLM] [I] Set context_fmha_fp32_acc to False.
[03/15/2024-16:14:22] [TRT-LLM] [I] Set paged_kv_cache to True.
[03/15/2024-16:14:22] [TRT-LLM] [I] Set remove_input_padding to True.
[03/15/2024-16:14:22] [TRT-LLM] [I] Set use_custom_all_reduce to True.
[03/15/2024-16:14:22] [TRT-LLM] [I] Set multi_block_mode to False.
[03/15/2024-16:14:22] [TRT-LLM] [I] Set enable_xqa to True.
[03/15/2024-16:14:22] [TRT-LLM] [I] Set attention_qk_half_accumulation to False.
[03/15/2024-16:14:22] [TRT-LLM] [I] Set tokens_per_block to 128.
[03/15/2024-16:14:22] [TRT-LLM] [I] Set use_paged_context_fmha to False.
[03/15/2024-16:14:22] [TRT-LLM] [I] Set use_context_fmha_for_generation to False.
[03/15/2024-16:14:22] [TRT-LLM] [W] remove_input_padding is enabled, while max_num_tokens is not set, setting to max_batch_size*max_input_len. It may not be optimal to set max_num_tokens=max_batch_size*max_input_len when remove_input_padding is enabled, because the number of packed input tokens are very likely to be smaller, we strongly recommend to set max_num_tokens according to your workloads.
Traceback (most recent call last):
  File "C:\Users\RayBe\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\RayBe\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "F:\pythonprograms\llmstreaming\.venv\Scripts\trtllm-build.exe\__main__.py", line 7, in <module>
  File "F:\pythonprograms\llmstreaming\.venv\lib\site-packages\tensorrt_llm\commands\build.py", line 497, in main
    parallel_build(source, build_config, args.output_dir, workers,
  File "F:\pythonprograms\llmstreaming\.venv\lib\site-packages\tensorrt_llm\commands\build.py", line 420, in parallel_build
    passed = build_and_save(rank, rank % workers, ckpt_dir,
  File "F:\pythonprograms\llmstreaming\.venv\lib\site-packages\tensorrt_llm\commands\build.py", line 392, in build_and_save
    engine = build(build_config,
  File "F:\pythonprograms\llmstreaming\.venv\lib\site-packages\tensorrt_llm\commands\build.py", line 272, in build
    model.load(weights)
  File "F:\pythonprograms\llmstreaming\.venv\lib\site-packages\tensorrt_llm\models\modeling_utils.py", line 338, in load
    raise RuntimeError(err_msg)
RuntimeError: Provided tensor names are different from those expected by the engine.
Expected but not provided tensors: {'transformer.layers.20.mlp.gate.weight', 'transformer.layers.18.post_layernorm.weight', 'transformer.layers.20.mlp.fc.weight', 'transformer.layers.13.mlp.gate.weight', 'transformer.layers.14.attention.qkv.weight', 'transformer.layers.27.post_layernorm.weight', 'transformer.layers.26.mlp.gate.weight', 'transformer.layers.28.mlp.fc.weight', 'transformer.layers.24.mlp.proj.weight', 'transformer.layers.19.post_layernorm.weight', 'transformer.layers.26.attention.dense.weight', 'transformer.layers.30.input_layernorm.weight', 'transformer.layers.28.input_layernorm.weight', 'transformer.layers.23.attention.dense.weight', 'transformer.layers.21.mlp.gate.weight', 'transformer.layers.20.post_layernorm.weight', 'transformer.layers.23.mlp.fc.weight', 'transformer.layers.24.mlp.gate.weight', 'transformer.layers.20.input_layernorm.weight', 'transformer.layers.21.mlp.fc.weight', 'transformer.layers.16.mlp.gate.weight', 'transformer.layers.18.input_layernorm.weight', 'transformer.layers.16.attention.qkv.weight', 'transformer.layers.16.post_layernorm.weight', 'transformer.layers.26.mlp.fc.weight', 'transformer.layers.13.mlp.proj.weight', 'transformer.layers.29.mlp.fc.weight', 'transformer.layers.28.attention.qkv.weight', 'transformer.layers.16.mlp.fc.weight', 'transformer.layers.29.mlp.proj.weight', 'transformer.layers.15.mlp.fc.weight', 'transformer.layers.31.post_layernorm.weight', 'transformer.layers.21.post_layernorm.weight', 'transformer.layers.29.post_layernorm.weight', 'transformer.layers.27.mlp.fc.weight', 'transformer.layers.18.mlp.proj.weight', 'transformer.layers.29.attention.qkv.weight', 'transformer.layers.21.attention.dense.weight', 'transformer.layers.24.mlp.fc.weight', 'transformer.layers.16.input_layernorm.weight', 'transformer.layers.27.mlp.gate.weight', 'transformer.layers.19.attention.qkv.weight', 'transformer.layers.31.mlp.fc.weight', 'transformer.layers.13.attention.qkv.weight', 'transformer.layers.14.attention.dense.weight', 'transformer.layers.20.mlp.proj.weight', 'transformer.layers.24.attention.dense.weight', 'transformer.layers.23.mlp.proj.weight', 'transformer.layers.30.post_layernorm.weight', 'transformer.layers.17.mlp.fc.weight', 'transformer.layers.26.post_layernorm.weight', 'transformer.layers.17.mlp.gate.weight', 'transformer.layers.15.post_layernorm.weight', 'transformer.layers.25.attention.dense.weight', 'transformer.layers.31.mlp.gate.weight', 'transformer.layers.30.mlp.proj.weight', 'transformer.layers.27.mlp.proj.weight', 'transformer.layers.23.input_layernorm.weight', 'transformer.layers.18.attention.qkv.weight', 'transformer.layers.18.mlp.fc.weight', 'transformer.layers.25.mlp.fc.weight', 'transformer.layers.30.mlp.fc.weight', 'transformer.layers.23.attention.qkv.weight', 'transformer.layers.30.mlp.gate.weight', 'transformer.layers.25.mlp.proj.weight', 'transformer.layers.17.attention.qkv.weight', 'transformer.layers.15.input_layernorm.weight', 'transformer.layers.14.mlp.fc.weight', 'transformer.layers.29.attention.dense.weight', 'transformer.layers.22.input_layernorm.weight', 'transformer.layers.27.attention.qkv.weight', 'transformer.layers.21.input_layernorm.weight', 'transformer.layers.17.attention.dense.weight', 'transformer.layers.31.attention.qkv.weight', 'transformer.layers.13.post_layernorm.weight', 'transformer.layers.22.mlp.proj.weight', 'transformer.layers.17.post_layernorm.weight', 'transformer.layers.19.mlp.proj.weight', 'transformer.layers.21.mlp.proj.weight', 'transformer.layers.14.mlp.gate.weight', 
'transformer.layers.20.attention.qkv.weight', 'transformer.layers.24.attention.qkv.weight', 'transformer.layers.29.mlp.gate.weight', 'transformer.layers.19.attention.dense.weight', 'transformer.layers.28.mlp.proj.weight', 'transformer.layers.17.input_layernorm.weight', 'transformer.layers.18.mlp.gate.weight', 'transformer.layers.13.attention.dense.weight', 'transformer.layers.13.mlp.fc.weight', 'transformer.layers.28.post_layernorm.weight', 'transformer.layers.25.input_layernorm.weight', 'transformer.layers.22.post_layernorm.weight', 'transformer.layers.25.post_layernorm.weight', 'transformer.layers.21.attention.qkv.weight', 'transformer.layers.18.attention.dense.weight', 'transformer.layers.20.attention.dense.weight', 'transformer.layers.24.post_layernorm.weight', 'transformer.layers.16.mlp.proj.weight', 'transformer.layers.26.input_layernorm.weight', 'transformer.layers.19.mlp.fc.weight', 'transformer.layers.15.mlp.proj.weight', 'transformer.layers.15.attention.dense.weight', 'transformer.layers.25.attention.qkv.weight', 'transformer.layers.22.mlp.fc.weight', 'transformer.layers.31.input_layernorm.weight', 'transformer.layers.31.attention.dense.weight', 'transformer.layers.15.mlp.gate.weight', 'transformer.layers.27.attention.dense.weight', 'transformer.layers.14.input_layernorm.weight', 'transformer.layers.17.mlp.proj.weight', 'transformer.layers.22.mlp.gate.weight', 'transformer.layers.22.attention.dense.weight', 'transformer.layers.22.attention.qkv.weight', 'transformer.layers.19.mlp.gate.weight', 'transformer.layers.23.mlp.gate.weight', 'transformer.layers.30.attention.qkv.weight', 'transformer.layers.15.attention.qkv.weight', 'transformer.layers.14.mlp.proj.weight', 'transformer.layers.24.input_layernorm.weight', 'transformer.layers.28.attention.dense.weight', 'transformer.layers.29.input_layernorm.weight', 'transformer.ln_f.weight', 'transformer.layers.26.mlp.proj.weight', 'transformer.layers.19.input_layernorm.weight', 'transformer.layers.25.mlp.gate.weight', 'transformer.layers.27.input_layernorm.weight', 'transformer.layers.13.input_layernorm.weight', 'transformer.layers.16.attention.dense.weight', 'transformer.layers.23.post_layernorm.weight', 'transformer.layers.14.post_layernorm.weight', 'transformer.layers.26.attention.qkv.weight', 'transformer.layers.31.mlp.proj.weight', 'transformer.layers.30.attention.dense.weight', 'transformer.layers.28.mlp.gate.weight'}
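Reading the set above, every missing tensor sits in transformer.layers.13 through transformer.layers.31 (plus transformer.ln_f), which suggests the checkpoint supplied weights only for layers 0-12 while the engine config expects 32 layers. A small illustrative check of that pattern (the regex and grouping are my own, not TensorRT-LLM code):

```python
# Illustrative: pull the layer index out of each missing tensor name to
# see which layers the checkpoint failed to provide.
import re

missing = {
    "transformer.layers.20.mlp.gate.weight",
    "transformer.layers.13.attention.qkv.weight",
    "transformer.ln_f.weight",
    # ... paste the full set from the error message here
}

layers = sorted(
    int(m.group(1))
    for name in missing
    if (m := re.match(r"transformer\.layers\.(\d+)\.", name))
)
print("layers with missing tensors:", layers)
# With the full set from this log, the range is 13..31, i.e. only
# layers 0-12 made it out of the conversion step.
```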

raymondbernard commented 6 months ago

We are following https://console.brev.dev/notebook/streamingllm-tensorrt-llm on Windows! Ugh... There seems to be a version mismatch in what is installed.
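To see exactly which wheel versions ended up in the venv (standard-library inspection, nothing TensorRT-LLM-specific):

```python
# List installed distributions whose names mention "tensorrt" to spot
# mismatched wheel versions in the virtual environment.
from importlib import metadata

for dist in metadata.distributions():
    name = (dist.metadata["Name"] or "").lower()
    if "tensorrt" in name:
        print(dist.metadata["Name"], dist.version)
```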

raymondbernard commented 6 months ago

I put two print statements in PretrainedModel, thinking you don't support Mistral v2 and Mixtral!

(.venv) F:\pythonprograms\llmstreaming>trtllm-build --checkpoint_dir F:\pythonprograms\llmstreaming\tllm_checkpoint_1gpu_streamingllm --output_dir ./mistralengine_streaming --gemm_plugin float16
[TensorRT-LLM] TensorRT-LLM version: 0.8.0
[03/15/2024-16:38:54] [TRT-LLM] [I] Set bert_attention_plugin to float16.
[03/15/2024-16:38:54] [TRT-LLM] [I] Set gpt_attention_plugin to float16.
[03/15/2024-16:38:54] [TRT-LLM] [I] Set gemm_plugin to float16.
[03/15/2024-16:38:54] [TRT-LLM] [I] Set lookup_plugin to None.
[03/15/2024-16:38:54] [TRT-LLM] [I] Set lora_plugin to None.
[03/15/2024-16:38:54] [TRT-LLM] [I] Set context_fmha to True.
[03/15/2024-16:38:54] [TRT-LLM] [I] Set context_fmha_fp32_acc to False.
[03/15/2024-16:38:54] [TRT-LLM] [I] Set paged_kv_cache to True.
[03/15/2024-16:38:54] [TRT-LLM] [I] Set remove_input_padding to True.
[03/15/2024-16:38:54] [TRT-LLM] [I] Set use_custom_all_reduce to True.
[03/15/2024-16:38:54] [TRT-LLM] [I] Set multi_block_mode to False.
[03/15/2024-16:38:54] [TRT-LLM] [I] Set enable_xqa to True.
[03/15/2024-16:38:54] [TRT-LLM] [I] Set attention_qk_half_accumulation to False.
[03/15/2024-16:38:54] [TRT-LLM] [I] Set tokens_per_block to 128.
[03/15/2024-16:38:54] [TRT-LLM] [I] Set use_paged_context_fmha to False.
[03/15/2024-16:38:54] [TRT-LLM] [I] Set use_context_fmha_for_generation to False.
[03/15/2024-16:38:54] [TRT-LLM] [W] remove_input_padding is enabled, while max_num_tokens is not set, setting to max_batch_size*max_input_len. It may not be optimal to set max_num_tokens=max_batch_size*max_input_len when remove_input_padding is enabled, because the number of packed input tokens are very likely to be smaller, we strongly recommend to set max_num_tokens according to your workloads.
line 332 load PretrainedModel file we are passing == {'transformer.layers.1.attention.qkv.weight', 'transformer.layers.0.mlp.gate.weight', 'transformer.layers.3.attention.dense.weight', 'transformer.layers.7.input_layernorm.weight', 'transformer.layers.5.input_layernorm.weight', 'transformer.layers.4.attention.dense.weight', 'transformer.layers.5.mlp.fc.weight', 'transformer.layers.12.attention.dense.weight', 'transformer.layers.11.attention.qkv.weight', 'transformer.layers.1.mlp.proj.weight', 'transformer.layers.0.mlp.proj.weight', 'transformer.layers.12.mlp.proj.weight', 'transformer.layers.6.mlp.fc.weight', 'transformer.layers.3.input_layernorm.weight', 'transformer.layers.12.post_layernorm.weight', 'transformer.layers.3.mlp.fc.weight', 'transformer.layers.4.mlp.fc.weight', 'transformer.layers.9.attention.qkv.weight', 'transformer.layers.7.mlp.proj.weight', 'transformer.vocab_embedding.weight', 'transformer.layers.6.mlp.gate.weight', 'transformer.layers.1.mlp.fc.weight', 'transformer.layers.6.post_layernorm.weight', 'transformer.layers.8.mlp.fc.weight', 'transformer.layers.10.post_layernorm.weight', 'transformer.layers.0.attention.dense.weight', 'transformer.layers.3.mlp.gate.weight', 'transformer.layers.9.mlp.proj.weight', 'transformer.layers.10.mlp.gate.weight', 'transformer.layers.0.attention.qkv.weight', 'transformer.layers.12.mlp.fc.weight', 'transformer.layers.2.mlp.gate.weight', 'transformer.layers.1.mlp.gate.weight', 'transformer.layers.2.input_layernorm.weight', 'transformer.layers.9.mlp.fc.weight', 'transformer.layers.11.attention.dense.weight', 'transformer.layers.2.post_layernorm.weight', 'transformer.layers.11.mlp.proj.weight', 'lm_head.weight', 'transformer.layers.10.input_layernorm.weight', 'transformer.layers.1.attention.dense.weight', 'transformer.layers.7.mlp.gate.weight', 'transformer.layers.8.input_layernorm.weight', 'transformer.layers.6.attention.dense.weight', 'transformer.layers.11.mlp.gate.weight', 'transformer.layers.12.attention.qkv.weight', 'transformer.layers.10.attention.qkv.weight', 'transformer.layers.8.post_layernorm.weight', 'transformer.layers.6.input_layernorm.weight', 'transformer.layers.4.mlp.gate.weight', 'transformer.layers.8.attention.qkv.weight', 'transformer.layers.12.input_layernorm.weight', 'transformer.layers.11.mlp.fc.weight', 'transformer.layers.9.post_layernorm.weight', 'transformer.layers.11.input_layernorm.weight', 'transformer.layers.5.post_layernorm.weight', 'transformer.layers.4.attention.qkv.weight', 'transformer.layers.9.input_layernorm.weight', 'transformer.layers.10.mlp.fc.weight', 'transformer.layers.3.post_layernorm.weight', 'transformer.layers.5.attention.dense.weight', 'transformer.layers.9.attention.dense.weight', 'transformer.layers.7.attention.dense.weight', 'transformer.layers.1.post_layernorm.weight', 'transformer.layers.1.input_layernorm.weight', 'transformer.layers.4.input_layernorm.weight', 'transformer.layers.7.attention.qkv.weight', 'transformer.layers.2.mlp.fc.weight', 'transformer.layers.6.mlp.proj.weight', 'transformer.layers.10.attention.dense.weight', 'transformer.layers.0.mlp.fc.weight', 'transformer.layers.2.mlp.proj.weight', 'transformer.layers.8.mlp.gate.weight', 'transformer.layers.4.post_layernorm.weight', 'transformer.layers.7.post_layernorm.weight', 'transformer.layers.0.input_layernorm.weight', 'transformer.layers.5.mlp.proj.weight', 'transformer.layers.8.mlp.proj.weight', 'transformer.layers.2.attention.qkv.weight', 'transformer.layers.6.attention.qkv.weight', 
'transformer.layers.5.attention.qkv.weight', 'transformer.layers.0.post_layernorm.weight', 'transformer.layers.11.post_layernorm.weight', 'transformer.layers.4.mlp.proj.weight', 'transformer.layers.7.mlp.fc.weight', 'transformer.layers.3.attention.qkv.weight', 'transformer.layers.3.mlp.proj.weight', 'transformer.layers.9.mlp.gate.weight', 'transformer.layers.8.attention.dense.weight', 'transformer.layers.12.mlp.gate.weight', 'transformer.layers.2.attention.dense.weight', 'transformer.layers.5.mlp.gate.weight', 'transformer.layers.10.mlp.proj.weight'} line 333 load PretrainedModel what the engine is expecting == {'transformer.layers.1.attention.qkv.weight', 'transformer.layers.31.mlp.gate.weight', 'transformer.layers.13.mlp.fc.weight', 'transformer.layers.26.mlp.proj.weight', 'transformer.layers.15.mlp.fc.weight', 'transformer.layers.0.mlp.gate.weight', 'transformer.layers.28.post_layernorm.weight', 'transformer.layers.31.input_layernorm.weight', 'transformer.layers.3.attention.dense.weight', 'transformer.layers.7.input_layernorm.weight', 'transformer.layers.25.post_layernorm.weight', 'transformer.layers.5.input_layernorm.weight', 'transformer.layers.4.attention.dense.weight', 'transformer.layers.19.post_layernorm.weight', 'transformer.layers.21.attention.dense.weight', 'transformer.layers.16.post_layernorm.weight', 'transformer.layers.30.input_layernorm.weight', 'transformer.layers.5.mlp.fc.weight', 'transformer.layers.12.attention.dense.weight', 'transformer.layers.11.attention.qkv.weight', 'transformer.layers.18.post_layernorm.weight', 'transformer.layers.1.mlp.proj.weight', 'transformer.layers.31.attention.dense.weight', 'transformer.layers.21.attention.qkv.weight', 'transformer.layers.0.mlp.proj.weight', 'transformer.layers.28.attention.dense.weight', 'transformer.layers.24.mlp.fc.weight', 'transformer.layers.20.attention.dense.weight', 'transformer.layers.27.mlp.fc.weight', 'transformer.layers.13.mlp.proj.weight', 'transformer.layers.17.attention.dense.weight', 'transformer.layers.24.post_layernorm.weight', 'transformer.layers.12.mlp.proj.weight', 'transformer.layers.13.attention.dense.weight', 'transformer.layers.20.mlp.proj.weight', 'transformer.layers.30.post_layernorm.weight', 'transformer.layers.16.input_layernorm.weight', 'transformer.layers.29.attention.qkv.weight', 'transformer.layers.30.attention.dense.weight', 'transformer.layers.30.attention.qkv.weight', 'transformer.layers.6.mlp.fc.weight', 'transformer.layers.3.input_layernorm.weight', 'transformer.layers.3.mlp.fc.weight', 'transformer.layers.12.post_layernorm.weight', 'transformer.layers.29.input_layernorm.weight', 'transformer.layers.22.input_layernorm.weight', 'transformer.layers.17.mlp.fc.weight', 'transformer.layers.4.mlp.fc.weight', 'transformer.layers.27.attention.dense.weight', 'transformer.layers.9.attention.qkv.weight', 'transformer.layers.15.attention.qkv.weight', 'transformer.vocab_embedding.weight', 'transformer.layers.7.mlp.proj.weight', 'transformer.layers.6.mlp.gate.weight', 'transformer.layers.1.mlp.fc.weight', 'transformer.layers.6.post_layernorm.weight', 'transformer.layers.23.attention.dense.weight', 'transformer.layers.8.mlp.fc.weight', 'transformer.layers.21.mlp.proj.weight', 'transformer.layers.21.input_layernorm.weight', 'transformer.layers.10.post_layernorm.weight', 'transformer.layers.26.attention.dense.weight', 'transformer.layers.0.attention.dense.weight', 'transformer.layers.31.attention.qkv.weight', 'transformer.layers.13.post_layernorm.weight', 'transformer.layers.23.input_layernorm.weight', 
'transformer.layers.3.mlp.gate.weight', 'transformer.layers.27.mlp.proj.weight', 'transformer.layers.27.mlp.gate.weight', 'transformer.layers.9.mlp.proj.weight', 'transformer.layers.21.mlp.fc.weight', 'transformer.layers.18.mlp.fc.weight', 'transformer.layers.10.mlp.gate.weight', 'transformer.layers.0.attention.qkv.weight', 'transformer.layers.12.mlp.fc.weight', 'transformer.layers.29.mlp.gate.weight', 'transformer.layers.2.mlp.gate.weight', 'transformer.layers.24.attention.dense.weight', 'transformer.layers.22.attention.qkv.weight', 'transformer.layers.26.mlp.fc.weight', 'transformer.layers.18.mlp.proj.weight', 'transformer.ln_f.weight', 'transformer.layers.1.mlp.gate.weight', 'transformer.layers.2.input_layernorm.weight', 'transformer.layers.26.input_layernorm.weight', 'transformer.layers.14.mlp.proj.weight', 'transformer.layers.26.attention.qkv.weight', 'transformer.layers.24.attention.qkv.weight', 'transformer.layers.9.mlp.fc.weight', 'transformer.layers.11.attention.dense.weight', 'transformer.layers.29.post_layernorm.weight', 'transformer.layers.2.post_layernorm.weight', 'transformer.layers.11.mlp.proj.weight', 'transformer.layers.28.input_layernorm.weight', 'transformer.layers.15.mlp.proj.weight', 'transformer.layers.21.mlp.gate.weight', 'transformer.layers.16.attention.dense.weight', 'transformer.layers.26.post_layernorm.weight', 'transformer.layers.10.input_layernorm.weight', 'transformer.layers.1.attention.dense.weight', 'transformer.layers.23.mlp.gate.weight', 'lm_head.weight', 'transformer.layers.13.input_layernorm.weight', 'transformer.layers.7.mlp.gate.weight', 'transformer.layers.22.attention.dense.weight', 'transformer.layers.8.input_layernorm.weight', 'transformer.layers.15.attention.dense.weight', 'transformer.layers.25.mlp.gate.weight', 'transformer.layers.6.attention.dense.weight', 'transformer.layers.11.mlp.gate.weight', 'transformer.layers.28.mlp.fc.weight', 'transformer.layers.12.attention.qkv.weight', 'transformer.layers.22.mlp.fc.weight', 'transformer.layers.18.input_layernorm.weight', 'transformer.layers.24.mlp.gate.weight', 'transformer.layers.10.attention.qkv.weight', 'transformer.layers.19.mlp.fc.weight', 'transformer.layers.8.post_layernorm.weight', 'transformer.layers.22.post_layernorm.weight', 'transformer.layers.23.post_layernorm.weight', 'transformer.layers.27.post_layernorm.weight', 'transformer.layers.25.mlp.fc.weight', 'transformer.layers.6.input_layernorm.weight', 'transformer.layers.16.mlp.fc.weight', 'transformer.layers.16.attention.qkv.weight', 'transformer.layers.25.input_layernorm.weight', 'transformer.layers.4.mlp.gate.weight', 'transformer.layers.29.mlp.fc.weight', 'transformer.layers.13.attention.qkv.weight', 'transformer.layers.8.attention.qkv.weight', 'transformer.layers.23.mlp.fc.weight', 'transformer.layers.12.input_layernorm.weight', 'transformer.layers.15.input_layernorm.weight', 'transformer.layers.19.mlp.gate.weight', 'transformer.layers.24.mlp.proj.weight', 'transformer.layers.20.mlp.gate.weight', 'transformer.layers.11.mlp.fc.weight', 'transformer.layers.9.post_layernorm.weight', 'transformer.layers.14.mlp.fc.weight', 'transformer.layers.20.input_layernorm.weight', 'transformer.layers.25.mlp.proj.weight', 'transformer.layers.28.mlp.gate.weight', 'transformer.layers.13.mlp.gate.weight', 'transformer.layers.11.input_layernorm.weight', 'transformer.layers.5.post_layernorm.weight', 'transformer.layers.4.attention.qkv.weight', 'transformer.layers.30.mlp.gate.weight', 'transformer.layers.22.mlp.proj.weight', 
'transformer.layers.22.mlp.gate.weight', 'transformer.layers.9.input_layernorm.weight', 'transformer.layers.3.post_layernorm.weight', 'transformer.layers.10.mlp.fc.weight', 'transformer.layers.23.attention.qkv.weight', 'transformer.layers.5.attention.dense.weight', 'transformer.layers.24.input_layernorm.weight', 'transformer.layers.14.mlp.gate.weight', 'transformer.layers.9.attention.dense.weight', 'transformer.layers.21.post_layernorm.weight', 'transformer.layers.7.attention.dense.weight', 'transformer.layers.14.input_layernorm.weight', 'transformer.layers.19.attention.qkv.weight', 'transformer.layers.1.post_layernorm.weight', 'transformer.layers.20.post_layernorm.weight', 'transformer.layers.17.attention.qkv.weight', 'transformer.layers.30.mlp.proj.weight', 'transformer.layers.1.input_layernorm.weight', 'transformer.layers.4.input_layernorm.weight', 'transformer.layers.7.attention.qkv.weight', 'transformer.layers.17.input_layernorm.weight', 'transformer.layers.2.mlp.fc.weight', 'transformer.layers.14.attention.qkv.weight', 'transformer.layers.18.attention.dense.weight', 'transformer.layers.6.mlp.proj.weight', 'transformer.layers.14.attention.dense.weight', 'transformer.layers.17.mlp.proj.weight', 'transformer.layers.19.input_layernorm.weight', 'transformer.layers.10.attention.dense.weight', 'transformer.layers.19.mlp.proj.weight', 'transformer.layers.0.mlp.fc.weight', 'transformer.layers.2.mlp.proj.weight', 'transformer.layers.8.mlp.gate.weight', 'transformer.layers.19.attention.dense.weight', 'transformer.layers.14.post_layernorm.weight', 'transformer.layers.4.post_layernorm.weight', 'transformer.layers.15.mlp.gate.weight', 'transformer.layers.28.mlp.proj.weight', 'transformer.layers.7.post_layernorm.weight', 'transformer.layers.29.mlp.proj.weight', 'transformer.layers.0.input_layernorm.weight', 'transformer.layers.5.mlp.proj.weight', 'transformer.layers.17.mlp.gate.weight', 'transformer.layers.31.post_layernorm.weight', 'transformer.layers.8.mlp.proj.weight', 'transformer.layers.16.mlp.proj.weight', 'transformer.layers.2.attention.qkv.weight', 'transformer.layers.6.attention.qkv.weight', 'transformer.layers.28.attention.qkv.weight', 'transformer.layers.30.mlp.fc.weight', 'transformer.layers.5.attention.qkv.weight', 'transformer.layers.0.post_layernorm.weight', 'transformer.layers.20.mlp.fc.weight', 'transformer.layers.11.post_layernorm.weight', 'transformer.layers.25.attention.qkv.weight', 'transformer.layers.4.mlp.proj.weight', 'transformer.layers.7.mlp.fc.weight', 'transformer.layers.18.attention.qkv.weight', 'transformer.layers.29.attention.dense.weight', 'transformer.layers.27.attention.qkv.weight', 'transformer.layers.31.mlp.fc.weight', 'transformer.layers.3.attention.qkv.weight', 'transformer.layers.18.mlp.gate.weight', 'transformer.layers.3.mlp.proj.weight', 'transformer.layers.9.mlp.gate.weight', 'transformer.layers.8.attention.dense.weight', 'transformer.layers.15.post_layernorm.weight', 'transformer.layers.17.post_layernorm.weight', 'transformer.layers.31.mlp.proj.weight', 'transformer.layers.25.attention.dense.weight', 'transformer.layers.2.attention.dense.weight', 'transformer.layers.12.mlp.gate.weight', 'transformer.layers.27.input_layernorm.weight', 'transformer.layers.16.mlp.gate.weight', 'transformer.layers.26.mlp.gate.weight', 'transformer.layers.23.mlp.proj.weight', 'transformer.layers.20.attention.qkv.weight', 'transformer.layers.5.mlp.gate.weight', 'transformer.layers.10.mlp.proj.weight'} Traceback (most recent call last): File 
"C:\Users\RayBe\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main return _run_code(code, main_globals, None, File "C:\Users\RayBe\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code exec(code, run_globals) File "F:\pythonprograms\llmstreaming.venv\Scripts\trtllm-build.exe__main__.py", line 7, in File "F:\pythonprograms\llmstreaming.venv\lib\site-packages\tensorrt_llm\commands\build.py", line 497, in main parallel_build(source, build_config, args.output_dir, workers, File "F:\pythonprograms\llmstreaming.venv\lib\site-packages\tensorrt_llm\commands\build.py", line 420, in parallel_build passed = build_and_save(rank, rank % workers, ckpt_dir, File "F:\pythonprograms\llmstreaming.venv\lib\site-packages\tensorrt_llm\commands\build.py", line 392, in build_and_save engine = build(build_config, File "F:\pythonprograms\llmstreaming.venv\lib\site-packages\tensorrt_llm\commands\build.py", line 272, in build model.load(weights) File "F:\pythonprograms\llmstreaming.venv\lib\site-packages\tensorrt_llm\models\modeling_utils.py", line 341, in load raise RuntimeError(err_msg) RuntimeError: Provided tensor names are different from those expected by the engine. Expected but not provided tensors: {'transformer.layers.31.mlp.gate.weight', 'transformer.layers.13.mlp.fc.weight', 'transformer.layers.26.mlp.proj.weight', 'transformer.layers.15.mlp.fc.weight', 'transformer.layers.28.post_layernorm.weight', 'transformer.layers.31.input_layernorm.weight', 'transformer.layers.25.post_layernorm.weight', 'transformer.layers.19.post_layernorm.weight', 'transformer.layers.21.attention.dense.weight', 'transformer.layers.16.post_layernorm.weight', 'transformer.layers.30.input_layernorm.weight', 'transformer.layers.18.post_layernorm.weight', 'transformer.layers.31.attention.dense.weight', 'transformer.layers.21.attention.qkv.weight', 'transformer.layers.28.attention.dense.weight', 'transformer.layers.24.mlp.fc.weight', 'transformer.layers.20.attention.dense.weight', 'transformer.layers.27.mlp.fc.weight', 'transformer.layers.13.mlp.proj.weight', 'transformer.layers.17.attention.dense.weight', 'transformer.layers.24.post_layernorm.weight', 'transformer.layers.13.attention.dense.weight', 'transformer.layers.20.mlp.proj.weight', 'transformer.layers.30.post_layernorm.weight', 'transformer.layers.16.input_layernorm.weight', 'transformer.layers.29.attention.qkv.weight', 'transformer.layers.30.attention.dense.weight', 'transformer.layers.30.attention.qkv.weight', 'transformer.layers.29.input_layernorm.weight', 'transformer.layers.22.input_layernorm.weight', 'transformer.layers.17.mlp.fc.weight', 'transformer.layers.27.attention.dense.weight', 'transformer.layers.15.attention.qkv.weight', 'transformer.layers.23.attention.dense.weight', 'transformer.layers.21.mlp.proj.weight', 'transformer.layers.21.input_layernorm.weight', 'transformer.layers.26.attention.dense.weight', 'transformer.layers.31.attention.qkv.weight', 'transformer.layers.13.post_layernorm.weight', 'transformer.layers.23.input_layernorm.weight', 'transformer.layers.27.mlp.proj.weight', 'transformer.layers.27.mlp.gate.weight', 'transformer.layers.21.mlp.fc.weight', 'transformer.layers.18.mlp.fc.weight', 'transformer.layers.29.mlp.gate.weight', 'transformer.layers.24.attention.dense.weight', 'transformer.layers.22.attention.qkv.weight', 'transformer.layers.26.mlp.fc.weight', 'transformer.layers.18.mlp.proj.weight', 'transformer.ln_f.weight', 'transformer.layers.26.input_layernorm.weight', 
'transformer.layers.14.mlp.proj.weight', 'transformer.layers.26.attention.qkv.weight', 'transformer.layers.24.attention.qkv.weight', 'transformer.layers.29.post_layernorm.weight', 'transformer.layers.28.input_layernorm.weight', 'transformer.layers.15.mlp.proj.weight', 'transformer.layers.21.mlp.gate.weight', 'transformer.layers.16.attention.dense.weight', 'transformer.layers.26.post_layernorm.weight', 'transformer.layers.23.mlp.gate.weight', 'transformer.layers.13.input_layernorm.weight', 'transformer.layers.31.mlp.fc.weight', 'transformer.layers.22.attention.dense.weight', 'transformer.layers.15.attention.dense.weight', 'transformer.layers.28.mlp.fc.weight', 'transformer.layers.25.mlp.gate.weight', 'transformer.layers.24.mlp.gate.weight', 'transformer.layers.22.mlp.fc.weight', 'transformer.layers.18.input_layernorm.weight', 'transformer.layers.19.mlp.fc.weight', 'transformer.layers.22.post_layernorm.weight', 'transformer.layers.23.post_layernorm.weight', 'transformer.layers.27.post_layernorm.weight', 'transformer.layers.25.mlp.fc.weight', 'transformer.layers.16.mlp.fc.weight', 'transformer.layers.25.input_layernorm.weight', 'transformer.layers.16.attention.qkv.weight', 'transformer.layers.29.mlp.fc.weight', 'transformer.layers.13.attention.qkv.weight', 'transformer.layers.23.mlp.fc.weight', 'transformer.layers.15.input_layernorm.weight', 'transformer.layers.19.mlp.gate.weight', 'transformer.layers.24.mlp.proj.weight', 'transformer.layers.20.mlp.gate.weight', 'transformer.layers.14.mlp.fc.weight', 'transformer.layers.20.input_layernorm.weight', 'transformer.layers.25.mlp.proj.weight', 'transformer.layers.28.mlp.gate.weight', 'transformer.layers.13.mlp.gate.weight', 'transformer.layers.30.mlp.gate.weight', 'transformer.layers.22.mlp.proj.weight', 'transformer.layers.22.mlp.gate.weight', 'transformer.layers.23.attention.qkv.weight', 'transformer.layers.24.input_layernorm.weight', 'transformer.layers.14.mlp.gate.weight', 'transformer.layers.21.post_layernorm.weight', 'transformer.layers.14.input_layernorm.weight', 'transformer.layers.19.attention.qkv.weight', 'transformer.layers.17.attention.qkv.weight', 'transformer.layers.30.mlp.proj.weight', 'transformer.layers.17.input_layernorm.weight', 'transformer.layers.14.attention.qkv.weight', 'transformer.layers.18.attention.dense.weight', 'transformer.layers.14.attention.dense.weight', 'transformer.layers.17.mlp.proj.weight', 'transformer.layers.19.input_layernorm.weight', 'transformer.layers.19.mlp.proj.weight', 'transformer.layers.19.attention.dense.weight', 'transformer.layers.14.post_layernorm.weight', 'transformer.layers.15.mlp.gate.weight', 'transformer.layers.28.mlp.proj.weight', 'transformer.layers.29.mlp.proj.weight', 'transformer.layers.17.mlp.gate.weight', 'transformer.layers.31.post_layernorm.weight', 'transformer.layers.16.mlp.proj.weight', 'transformer.layers.28.attention.qkv.weight', 'transformer.layers.30.mlp.fc.weight', 'transformer.layers.20.mlp.fc.weight', 'transformer.layers.25.attention.qkv.weight', 'transformer.layers.29.attention.dense.weight', 'transformer.layers.18.attention.qkv.weight', 'transformer.layers.27.attention.qkv.weight', 'transformer.layers.18.mlp.gate.weight', 'transformer.layers.15.post_layernorm.weight', 'transformer.layers.17.post_layernorm.weight', 'transformer.layers.31.mlp.proj.weight', 'transformer.layers.25.attention.dense.weight', 'transformer.layers.27.input_layernorm.weight', 'transformer.layers.16.mlp.gate.weight', 'transformer.layers.26.mlp.gate.weight', 'transformer.layers.23.mlp.proj.weight', 
'transformer.layers.20.attention.qkv.weight', 'transformer.layers.20.post_layernorm.weight'}

(.venv) F:\pythonprograms\llmstreaming>
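For reference, a hedged sketch of what the two debugging prints described above might look like, written as a standalone comparison rather than the actual PretrainedModel.load code (names and placement are guesses):

```python
# Standalone version of the provided-vs-expected comparison that the
# debug prints above perform inside PretrainedModel.load (illustrative).
def report_tensor_mismatch(provided: set, expected: set) -> None:
    print("we are passing ==", sorted(provided))
    print("what the engine is expecting ==", sorted(expected))
    missing = expected - provided
    if missing:
        print("expected but not provided ==", sorted(missing))

# The pattern from this log: the checkpoint stops at layer 12 while the
# engine was built for 32 layers.
provided = {f"transformer.layers.{i}.mlp.fc.weight" for i in range(13)}
expected = {f"transformer.layers.{i}.mlp.fc.weight" for i in range(32)}
report_tensor_mismatch(provided, expected)
```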