Franc-Z / QWen1.5_TensorRT-LLM

Optimize QWen1.5 models with TensorRT-LLM
Apache License 2.0

error when building the engine, post_layernorm weight not found #6

Closed: zhangfuwen closed this issue 4 months ago

zhangfuwen commented 4 months ago
python3 QWen1.5_TensorRT-LLM/convert_checkpoint.py --model_dir Qwen1.5-1.8B-Chat --output_dir Qwen1.5-1.8B-Chat-ckpt

trtllm-build --checkpoint_dir ./Qwen1.5-1.8B-Chat-ckpt \
            --output_dir ./tmp/qwen/7B/trt_engines/fp16/1-gpu \
            --gemm_plugin float16
[TensorRT-LLM] TensorRT-LLM version: 0.9.0
[05/23/2024-20:46:34] [TRT-LLM] [I] Set bert_attention_plugin to float16.
[05/23/2024-20:46:34] [TRT-LLM] [I] Set gpt_attention_plugin to float16.
[05/23/2024-20:46:34] [TRT-LLM] [I] Set gemm_plugin to float16.
[05/23/2024-20:46:34] [TRT-LLM] [I] Set lookup_plugin to None.
[05/23/2024-20:46:34] [TRT-LLM] [I] Set lora_plugin to None.
[05/23/2024-20:46:34] [TRT-LLM] [I] Set moe_plugin to float16.
[05/23/2024-20:46:34] [TRT-LLM] [I] Set mamba_conv1d_plugin to float16.
[05/23/2024-20:46:34] [TRT-LLM] [I] Set context_fmha to True.
[05/23/2024-20:46:34] [TRT-LLM] [I] Set context_fmha_fp32_acc to False.
[05/23/2024-20:46:34] [TRT-LLM] [I] Set paged_kv_cache to True.
[05/23/2024-20:46:34] [TRT-LLM] [I] Set remove_input_padding to True.
[05/23/2024-20:46:34] [TRT-LLM] [I] Set use_custom_all_reduce to True.
[05/23/2024-20:46:34] [TRT-LLM] [I] Set multi_block_mode to False.
[05/23/2024-20:46:34] [TRT-LLM] [I] Set enable_xqa to True.
[05/23/2024-20:46:34] [TRT-LLM] [I] Set attention_qk_half_accumulation to False.
[05/23/2024-20:46:34] [TRT-LLM] [I] Set tokens_per_block to 128.
[05/23/2024-20:46:34] [TRT-LLM] [I] Set use_paged_context_fmha to False.
[05/23/2024-20:46:34] [TRT-LLM] [I] Set use_fp8_context_fmha to False.
[05/23/2024-20:46:34] [TRT-LLM] [I] Set use_context_fmha_for_generation to False.
[05/23/2024-20:46:34] [TRT-LLM] [I] Set multiple_profiles to False.
[05/23/2024-20:46:34] [TRT-LLM] [I] Set paged_state to True.
[05/23/2024-20:46:34] [TRT-LLM] [I] Set streamingllm to False.
[05/23/2024-20:46:34] [TRT-LLM] [W] remove_input_padding is enabled, while max_num_tokens is not set, setting to max_batch_size*max_input_len. 
It may not be optimal to set max_num_tokens=max_batch_size*max_input_len when remove_input_padding is enabled, because the number of packed input tokens are very likely to be smaller, we strongly recommend to set max_num_tokens according to your workloads.
[05/23/2024-20:46:34] [TRT-LLM] [W] remove_input_padding is enabled, while opt_num_tokens is not set, setting to max_batch_size*max_beam_width. 

Traceback (most recent call last):
  File "/usr/local/bin/trtllm-build", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.9/dist-packages/tensorrt_llm/commands/build.py", line 440, in main
    parallel_build(source, build_config, args.output_dir, workers,
  File "/usr/local/lib/python3.9/dist-packages/tensorrt_llm/commands/build.py", line 332, in parallel_build
    passed = build_and_save(rank, rank % workers, ckpt_dir,
  File "/usr/local/lib/python3.9/dist-packages/tensorrt_llm/commands/build.py", line 291, in build_and_save
    engine = build_model(build_config,
  File "/usr/local/lib/python3.9/dist-packages/tensorrt_llm/commands/build.py", line 268, in build_model
    model = load_model(rank_config, ckpt_dir, model_cls)
  File "/usr/local/lib/python3.9/dist-packages/tensorrt_llm/models/modeling_utils.py", line 1041, in load_model
    model.load(weights)
  File "/usr/local/lib/python3.9/dist-packages/tensorrt_llm/models/modeling_utils.py", line 388, in load
    assert expected_names.issubset(
AssertionError: Expected but not provided tensors:{'transformer.layers.18.post_layernorm.weight', 'transformer.layers.4.post_layernorm.weight', 'transformer.ln_f.weight', 'transformer.layers.13.post_layernorm.weight', 'transformer.layers.12.post_layernorm.weight', 'transformer.layers.15.post_layernorm.weight', 'transformer.layers.14.post_layernorm.weight', 'transformer.layers.19.post_layernorm.weight', 'transformer.layers.5.post_layernorm.weight', 'transformer.layers.22.post_layernorm.weight', 'transformer.layers.6.post_layernorm.weight', 'transformer.layers.3.post_layernorm.weight', 'transformer.layers.1.post_layernorm.weight', 'transformer.layers.2.post_layernorm.weight', 'transformer.layers.21.post_layernorm.weight', 'transformer.layers.16.post_layernorm.weight', 'transformer.layers.8.post_layernorm.weight', 'transformer.layers.10.post_layernorm.weight', 'transformer.layers.11.post_layernorm.weight', 'transformer.layers.7.post_layernorm.weight', 'transformer.layers.9.post_layernorm.weight', 'transformer.layers.0.post_layernorm.weight', 'transformer.layers.20.post_layernorm.weight', 'transformer.layers.23.post_layernorm.weight', 'transformer.layers.17.post_layernorm.weight'}
zhangfuwen commented 4 months ago

As I debugged it, the names actually provided include:

provided  names: {'transformer.layers.18.attention.qkv.bias', 'transformer.layers.8.mlp.gate.weight', 'transformer.layers.2.mlp.gate.weight', 'transformer.layers.0.post_attention_layernorm.weight', 'transformer.layers.17.attention.qkv.weight', 'transformer.layers.2.post_attention_layernorm.weight', 'transformer.layers.21.attention.dense.weight', 'transformer.layers.20.mlp.proj.weight', 'transformer.layers.0.input_layernorm.weight', 'transformer.layers.11.mlp.proj.weight', 'transformer.layers.18.mlp.fc.weight', 'transformer.layers.17.attention.qkv.bias', 'transformer.layers.18.post_attention_layernorm.weight', 'transformer.layers.12.attention.dense.weight', 'transformer.layers.18.attention.dense.weight', 'transformer.layers.4.attention.dense.weight', 'transformer.layers.7.attention.qkv.bias', 'transformer.layers.2.mlp.proj.weight', 'transformer.layers.11.attention.qkv.weight', 'transformer.layers.0.mlp.fc.weight', 'transformer.layers.12.attention.qkv.bias', 'transformer.layers.22.mlp.gate.weight', 'transformer.layers.15.attention.dense.weight', 'transformer.layers.13.mlp.fc.weight', 'transformer.layers.7.post_attention_layernorm.weight', 'transformer.layers.20.mlp.fc.weight', 'transformer.layers.1.input_layernorm.weight', 'transformer.layers.7.attention.dense.weight', 'transformer.layers.19.attention.qkv.bias', 'transformer.layers.22.input_layernorm.weight', 'transformer.layers.4.mlp.fc.weight', 'transformer.layers.7.attention.qkv.weight', 'transformer.layers.14.post_attention_layernorm.weight', 'transformer.layers.1.attention.dense.weight', 'transformer.layers.14.attention.qkv.weight', 'transformer.layers.2.attention.dense.weight', 'transformer.layers.15.mlp.proj.weight', 'transformer.layers.17.input_layernorm.weight', 'transformer.layers.20.mlp.gate.weight', 'transformer.layers.4.mlp.proj.weight', 'transformer.layers.6.post_attention_layernorm.weight', 'transformer.layers.3.mlp.gate.weight', 'transformer.layers.7.input_layernorm.weight', 'transformer.layers.10.attention.qkv.bias', 'transformer.layers.23.mlp.fc.weight', 'transformer.layers.8.attention.qkv.weight', 'transformer.layers.16.attention.dense.weight', 'transformer.layers.3.attention.dense.weight', 'transformer.layers.18.mlp.gate.weight', 'transformer.layers.1.mlp.gate.weight', 'transformer.layers.12.mlp.fc.weight', 'transformer.layers.3.attention.qkv.bias', 'transformer.layers.5.mlp.gate.weight', 'transformer.layers.19.mlp.fc.weight', 'transformer.layers.21.attention.qkv.bias', 'transformer.layers.6.input_layernorm.weight', 'transformer.layers.7.mlp.fc.weight', 'transformer.vocab_embedding.weight', 'transformer.layers.16.mlp.fc.weight', 'transformer.layers.8.attention.dense.weight', 'transformer.layers.9.mlp.proj.weight', 'transformer.layers.20.input_layernorm.weight', 'transformer.layers.1.mlp.fc.weight', 'transformer.layers.20.post_attention_layernorm.weight', 'transformer.layers.5.mlp.fc.weight', 'transformer.layers.19.input_layernorm.weight', 'transformer.layers.18.mlp.proj.weight', 'transformer.layers.4.mlp.gate.weight', 'transformer.layers.14.mlp.proj.weight', 'transformer.layers.19.post_attention_layernorm.weight', 'transformer.layers.3.mlp.proj.weight', 'transformer.layers.0.mlp.proj.weight', 'transformer.layers.23.attention.qkv.bias', 'transformer.layers.10.mlp.fc.weight', 'transformer.layers.14.attention.dense.weight', 'transformer.layers.21.mlp.proj.weight', 'transformer.layers.9.input_layernorm.weight', 'transformer.layers.23.mlp.proj.weight', 'transformer.layers.6.mlp.fc.weight', 
'transformer.layers.6.mlp.proj.weight', 'transformer.layers.17.attention.dense.weight', 'transformer.layers.20.attention.qkv.bias', 'transformer.layers.7.mlp.proj.weight', 'transformer.layers.13.input_layernorm.weight', 'transformer.layers.15.attention.qkv.bias', 'transformer.layers.22.attention.qkv.bias', 'transformer.layers.21.mlp.fc.weight', 'transformer.layers.8.input_layernorm.weight', 'transformer.layers.7.mlp.gate.weight', 'transformer.layers.12.mlp.gate.weight', 'transformer.layers.3.post_attention_layernorm.weight', 'transformer.layers.1.mlp.proj.weight', 'transformer.layers.0.attention.qkv.bias', 'transformer.layers.13.attention.dense.weight', 'transformer.layers.10.attention.dense.weight', 'transformer.layers.16.mlp.proj.weight', 'transformer.layers.9.mlp.gate.weight', 'transformer.layers.14.input_layernorm.weight', 'transformer.layers.14.mlp.fc.weight', 'transformer.layers.10.mlp.proj.weight', 'transformer.layers.23.mlp.gate.weight', 'transformer.layers.8.post_attention_layernorm.weight', 'transformer.layers.14.mlp.gate.weight', 'transformer.layers.10.mlp.gate.weight', 'transformer.layers.23.attention.dense.weight', 'transformer.layers.16.input_layernorm.weight', 'transformer.layers.16.post_attention_layernorm.weight', 'transformer.layers.15.post_attention_layernorm.weight', 'transformer.norm.weight', 'transformer.layers.15.attention.qkv.weight', 'transformer.layers.11.attention.qkv.bias', 'transformer.layers.13.attention.qkv.bias', 'transformer.layers.11.post_attention_layernorm.weight', 'transformer.layers.17.mlp.gate.weight', 'transformer.layers.11.input_layernorm.weight', 'transformer.layers.20.attention.qkv.weight', 'transformer.layers.18.input_layernorm.weight', 'transformer.layers.22.attention.dense.weight', 'transformer.layers.17.mlp.proj.weight', 'transformer.layers.5.input_layernorm.weight', 'transformer.layers.8.attention.qkv.bias', 'transformer.layers.14.attention.qkv.bias', 'transformer.layers.8.mlp.proj.weight', 'transformer.layers.13.mlp.proj.weight', 'transformer.layers.16.attention.qkv.bias', 'transformer.layers.2.attention.qkv.bias', 'transformer.layers.22.attention.qkv.weight', 'transformer.layers.12.post_attention_layernorm.weight', 'transformer.layers.23.post_attention_layernorm.weight', 'transformer.layers.22.post_attention_layernorm.weight', 'transformer.layers.6.attention.qkv.weight', 'transformer.layers.19.attention.dense.weight', 'transformer.layers.11.mlp.gate.weight', 'transformer.layers.13.attention.qkv.weight', 'transformer.layers.19.mlp.proj.weight', 'transformer.layers.2.input_layernorm.weight', 'transformer.layers.4.post_attention_layernorm.weight', 'transformer.layers.3.input_layernorm.weight', 'transformer.layers.9.post_attention_layernorm.weight', 'lm_head.weight', 'transformer.layers.10.post_attention_layernorm.weight', 'transformer.layers.22.mlp.fc.weight', 'transformer.layers.5.attention.dense.weight', 'transformer.layers.6.mlp.gate.weight', 'transformer.layers.5.mlp.proj.weight', 'transformer.layers.3.attention.qkv.weight', 'transformer.layers.5.attention.qkv.weight', 'transformer.layers.1.attention.qkv.bias', 'transformer.layers.15.input_layernorm.weight', 'transformer.layers.23.attention.qkv.weight', 'transformer.layers.17.mlp.fc.weight', 'transformer.layers.9.attention.qkv.bias', 'transformer.layers.15.mlp.fc.weight', 'transformer.layers.2.attention.qkv.weight', 'transformer.layers.5.attention.qkv.bias', 'transformer.layers.11.attention.dense.weight', 'transformer.layers.4.input_layernorm.weight', 
'transformer.layers.16.mlp.gate.weight', 'transformer.layers.21.input_layernorm.weight', 'transformer.layers.20.attention.dense.weight', 'transformer.layers.2.mlp.fc.weight', 'transformer.layers.21.mlp.gate.weight', 'transformer.layers.15.mlp.gate.weight', 'transformer.layers.22.mlp.proj.weight', 'transformer.layers.0.attention.qkv.weight', 'transformer.layers.12.attention.qkv.weight', 'transformer.layers.8.mlp.fc.weight', 'transformer.layers.17.post_attention_layernorm.weight', 'transformer.layers.3.mlp.fc.weight', 'transformer.layers.6.attention.qkv.bias', 'transformer.layers.13.mlp.gate.weight', 'transformer.layers.11.mlp.fc.weight', 'transformer.layers.10.attention.qkv.weight', 'transformer.layers.19.attention.qkv.weight', 'transformer.layers.5.post_attention_layernorm.weight', 'transformer.layers.21.post_attention_layernorm.weight', 'transformer.layers.1.post_attention_layernorm.weight', 'transformer.layers.23.input_layernorm.weight', 'transformer.layers.12.mlp.proj.weight', 'transformer.layers.4.attention.qkv.weight', 'transformer.layers.0.mlp.gate.weight', 'transformer.layers.0.attention.dense.weight', 'transformer.layers.18.attention.qkv.weight', 'transformer.layers.1.attention.qkv.weight', 'transformer.layers.19.mlp.gate.weight', 'transformer.layers.12.input_layernorm.weight', 'transformer.layers.9.attention.qkv.weight', 'transformer.layers.9.mlp.fc.weight', 'transformer.layers.21.attention.qkv.weight', 'transformer.layers.9.attention.dense.weight', 'transformer.layers.16.attention.qkv.weight', 'transformer.layers.4.attention.qkv.bias', 'transformer.layers.6.attention.dense.weight', 'transformer.layers.13.post_attention_layernorm.weight', 'transformer.layers.10.input_layernorm.weight'}
Franc-Z commented 4 months ago

The reason is that in the HF model the weight names are "transformer.layers.XX.post_attention_layernorm", not "transformer.layers.XX.post_layernorm", so the conversion has to take care of this renaming.
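
For reference, a minimal workaround sketch (my own assumption, not the repo's official fix) is to rename the mismatched keys in the already converted checkpoint before running trtllm-build. It assumes convert_checkpoint.py wrote a single rank0.safetensors file and that only the per-layer post-attention layernorm and the final norm names differ, as the error message above suggests:

# Hedged sketch: rename keys in the converted TensorRT-LLM checkpoint so that
# trtllm-build finds the tensor names it expects. The file name rank0.safetensors
# and the exact mapping below are assumptions based on the error output above.
from safetensors.torch import load_file, save_file

ckpt_path = "Qwen1.5-1.8B-Chat-ckpt/rank0.safetensors"  # assumed output file of convert_checkpoint.py
weights = load_file(ckpt_path)

renamed = {}
for name, tensor in weights.items():
    # per-layer norm: post_attention_layernorm -> post_layernorm
    new_name = name.replace("post_attention_layernorm", "post_layernorm")
    # final norm: transformer.norm.weight -> transformer.ln_f.weight
    if new_name == "transformer.norm.weight":
        new_name = "transformer.ln_f.weight"
    renamed[new_name] = tensor

save_file(renamed, ckpt_path)

The cleaner fix is of course to emit the expected names directly in convert_checkpoint.py when mapping the HF weights.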