NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

Error when converting DeepSeek-V2-Lite model #2463

Open WhatGhost opened 3 days ago

WhatGhost commented 3 days ago

System Info

The official Docker environment from docker/Dockerfile.multi at commit "c629546".

Who can help?

@byshiue @ncomly-nvidia I am trying to convert DeepSeek-V2-Lite with:

python convert_checkpoint.py --model_dir /target/model_repo/DeepSeek-V2-Lite \
                            --output_dir /target/engine_repo/DeepSeek-V2-Lite-fp16 \
                            --dtype float16 \
                            --tp_size 1 \
                            --load_model_on_cpu

and hit the following error. It seems that q_lora_rank is None in the Lite config, unlike in DeepSeek-V2:

{'architecture': 'DeepseekV2ForCausalLM', 'dtype': 'float16', 'logits_type': 'float32', 'num_hidden_layers': 27, 'num_attention_heads': 16, 'hidden_size': 2048, 'intermediate_size': 10944, 'num_key_value_heads': 16, 'vocab_size': 102400, 'position_embedding_type': 'rope_gpt_neox', 'max_position_embeddings': 163840, 'hidden_act': 'swiglu', 'rotary_base': 10000, 'norm_epsilon': 1e-06, 'rotary_scaling': {'beta_fast': 32, 'beta_slow': 1, 'factor': 40, 'mscale': 0.707, 'mscale_all_dim': 0.707, 'original_max_position_embeddings': 4096, 'type': 'yarn'}, 'mapping': {'world_size': 1, 'tp_size': 1, 'pp_size': 1, 'moe_tp_size': 1, 'moe_ep_size': 1}, 'kv_lora_rank': 512, 'q_lora_rank': None, 'qk_nope_head_dim': 128, 'qk_rope_head_dim': 64, 'v_head_dim': 128, 'moe_num_experts': 64, 'moe_inter_size': 1408, 'moe_num_shared_experts': 2, 'moe_top_k': 6, 'moe_renorm_mode': <ExpertScaleNormalizationMode.NONE: 0>, 'moe_n_group': 1, 'moe_topk_group': 1, 'moe_routed_scaling_factor': 1.0}
Traceback (most recent call last):
  File "/target/trtllm_1118/TensorRT-LLM/examples/deepseek_v2/convert_checkpoint.py", line 222, in <module>
    main()
  File "/target/trtllm_1118/TensorRT-LLM/examples/deepseek_v2/convert_checkpoint.py", line 214, in main
    convert_and_save_hf(args)
  File "/target/trtllm_1118/TensorRT-LLM/examples/deepseek_v2/convert_checkpoint.py", line 190, in convert_and_save_hf
    execute(args.workers, [convert_and_save_rank] * world_size, args)
  File "/target/trtllm_1118/TensorRT-LLM/examples/deepseek_v2/convert_checkpoint.py", line 149, in execute
    f(args, rank)
  File "/target/trtllm_1118/TensorRT-LLM/examples/deepseek_v2/convert_checkpoint.py", line 185, in convert_and_save_rank
    deepseekv2 = DeepseekV2ForCausalLM.from_hugging_face(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/deepseek_v2/model.py", line 271, in from_hugging_face
    deepseek = cls.from_config(pretrained_config)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/modeling_utils.py", line 617, in from_config
    return cls(config)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/modeling_utils.py", line 577, in __call__
    obj = type.__call__(cls, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/deepseek_v2/model.py", line 222, in __init__
    transformer = DeepseekV2Model(config)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/deepseek_v2/model.py", line 167, in __init__
    self.layers = DecoderLayerList(DeepseekV2DecoderLayer, config)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/modeling_utils.py", line 498, in __init__
    super().__init__([cls(config, idx) for idx in self.layer_list])
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/modeling_utils.py", line 498, in <listcomp>
    super().__init__([cls(config, idx) for idx in self.layer_list])
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/deepseek_v2/model.py", line 48, in __init__
    self.attention = DeepseekV2Attention(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/layers/attention.py", line 1934, in __init__
    q_lora_rank + kv_lora_rank + qk_rope_head_dim,
TypeError: unsupported operand type(s) for +: 'NoneType' and 'int'
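
For context, the failing expression at attention.py line 1934 computes the output width of the fused low-rank projection. Below is a minimal None-safe sketch, assuming (as the config dump suggests) that the Lite model simply skips query compression; the variable names follow the traceback, and the actual upstream fix may differ:

# Values taken from the config dump above (DeepSeek-V2-Lite).
q_lora_rank = None       # Lite: the query projection is not low-rank compressed
kv_lora_rank = 512
qk_rope_head_dim = 64

# Hedged sketch: include the q_lora_rank term only when it is set.
if q_lora_rank is None:
    fused_out_dim = kv_lora_rank + qk_rope_head_dim  # 576
else:
    fused_out_dim = q_lora_rank + kv_lora_rank + qk_rope_head_dim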

I tried adding the following code to tensorrt_llm/layers/attention.py and hit a new error:

# Workaround attempt: fall back to hidden_size when q_lora_rank is unset
if self.q_lora_rank is None:
    self.q_lora_rank = hidden_size
    q_lora_rank = hidden_size

New error:

Traceback (most recent call last):
  File "/target/trtllm_1118/TensorRT-LLM/examples/deepseek_v2/convert_checkpoint.py", line 222, in <module>
    main()
  File "/target/trtllm_1118/TensorRT-LLM/examples/deepseek_v2/convert_checkpoint.py", line 214, in main
    convert_and_save_hf(args)
  File "/target/trtllm_1118/TensorRT-LLM/examples/deepseek_v2/convert_checkpoint.py", line 190, in convert_and_save_hf
    execute(args.workers, [convert_and_save_rank] * world_size, args)
  File "/target/trtllm_1118/TensorRT-LLM/examples/deepseek_v2/convert_checkpoint.py", line 149, in execute
    f(args, rank)
  File "/target/trtllm_1118/TensorRT-LLM/examples/deepseek_v2/convert_checkpoint.py", line 185, in convert_and_save_rank
    deepseekv2 = DeepseekV2ForCausalLM.from_hugging_face(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/deepseek_v2/model.py", line 272, in from_hugging_face
    weights = convert_deepseekv2(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/deepseek_v2/convert.py", line 543, in convert_deepseekv2
    convert_layer(l)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/deepseek_v2/convert.py", line 260, in convert_layer
    q_a_proj_weight = get_weight(model_params,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/deepseek_v2/convert.py", line 217, in get_weight
    if config[prefix + postfix].dtype != dtype:
KeyError: 'model.layers.0.self_attn.q_a_proj.weight'
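
My guess, an assumption from comparing the two checkpoint layouts rather than anything verified in the repo: DeepSeek-V2-Lite ships a single full-rank model.layers.N.self_attn.q_proj.weight instead of the q_a_proj/q_b_proj pair, so forcing q_lora_rank = hidden_size sends the converter after a tensor that does not exist. The converter would need to branch, roughly:

# Hedged sketch; get_weight/model_params follow the names in the traceback,
# and the branch itself is an assumption about the two checkpoint layouts.
prefix = f'model.layers.{l}.self_attn.'
if q_lora_rank is None:
    # DeepSeek-V2-Lite: one full-rank query projection
    q_proj_weight = get_weight(model_params, prefix + 'q_proj', dtype)
else:
    # DeepSeek-V2: low-rank query compression (q_a_proj then q_b_proj)
    q_a_proj_weight = get_weight(model_params, prefix + 'q_a_proj', dtype)
    q_b_proj_weight = get_weight(model_params, prefix + 'q_b_proj', dtype)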

It seems TensorRT-LLM does not support DeepSeek-V2-Lite. How can I solve this error?

Thanks very much!


Reproduction

see above

Expected behavior

see above

Actual behavior

see above

Additional notes

Nothing.

zhangts20 commented 2 days ago

It seems that the Lite model is supported as of the latest commit: https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/models/deepseek_v2/convert.py#L355. You can try it.

WhatGhost commented 2 days ago

> It seems that the Lite model is supported as of the latest commit: https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/models/deepseek_v2/convert.py#L355. You can try it.

Thanks! I see it. I will try the latest commit.
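
For anyone else hitting this: note that the traceback points at the installed package under /usr/local/lib/python3.10/dist-packages/tensorrt_llm, so picking up the fix means updating the tensorrt_llm package itself, not just the examples checkout. A hedged sketch, assuming the fix has landed on main (paths as in the original command):

cd /target/trtllm_1118/TensorRT-LLM
git pull origin main
# rebuild and reinstall the wheel per the repo's build docs before rerunning:
python examples/deepseek_v2/convert_checkpoint.py --model_dir /target/model_repo/DeepSeek-V2-Lite \
                            --output_dir /target/engine_repo/DeepSeek-V2-Lite-fp16 \
                            --dtype float16 \
                            --tp_size 1 \
                            --load_model_on_cpu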