NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

qwen2_1.5b+tp4 convert_checkpoint failed #2310

Open sun2011yao opened 3 weeks ago

sun2011yao commented 3 weeks ago

System Info

CPU: x86_64
GPU: NVIDIA A100

Who can help?

No response

Information

Tasks

Reproduction

command:

```shell
python convert_checkpoint.py --model_dir ./Qwen2-1.5B \
    --tp_size 4 \
    --output_dir ./qwen2_1.5b_checkpoint \
    --dtype float16
```

error:

```
Traceback (most recent call last):
  File "/root/TensorRT-LLM-master/examples/qwen/convert_checkpoint.py", line 309, in <module>
    main()
  File "/root/TensorRT-LLM-master/examples/qwen/convert_checkpoint.py", line 301, in main
    convert_and_save_hf(args)
  File "/root/TensorRT-LLM-master/examples/qwen/convert_checkpoint.py", line 257, in convert_and_save_hf
    execute(args.workers, [convert_and_save_rank] * world_size, args)
  File "/root/TensorRT-LLM-master/examples/qwen/convert_checkpoint.py", line 264, in execute
    f(args, rank)
  File "/root/TensorRT-LLM-master/examples/qwen/convert_checkpoint.py", line 247, in convert_and_save_rank
    qwen = QWenForCausalLM.from_hugging_face(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/qwen/model.py", line 316, in from_hugging_face
    weights = load_weights_from_hf_model(hf_model, config)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/qwen/convert.py", line 1239, in load_weights_from_hf_model
    weights = convert_hf_qwen(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/qwen/convert.py", line 756, in convert_hf_qwen
    k_bias = dup_kv_weight(k_bias, num_key_value_heads,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/qwen/convert.py", line 526, in dup_kv_weight
    v.shape[1])
```
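The last frame stops at `v.shape[1]` inside `dup_kv_weight`, and the exception line itself is cut off. A plausible reading (an assumption, since the traceback is truncated) is that a 1-D bias tensor is being passed to a helper written for 2-D weight matrices, so indexing the second dimension fails. A minimal sketch of that failure mode, using NumPy for illustration:

```python
import numpy as np

# A 1-D bias vector, analogous to the k_bias passed into dup_kv_weight above.
v = np.zeros(8)

try:
    v.shape[1]  # shape is (8,): there is no second dimension to index
except IndexError as err:
    print("IndexError:", err)
```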

Expected behavior

success

actual behavior

n

additional notes

n

jershi425 commented 3 weeks ago

@sun2011yao Could you please let me know which version of TRT-LLM you are using?

sun2011yao commented 2 weeks ago

> @sun2011yao Could you please let me know which version of TRT-LLM you are using?

0.13.0

sun2011yao commented 2 weeks ago
```python
def dup_kv_bias(v, num_head, tp_size):
    assert tp_size % num_head == 0
    reps = tp_size // num_head
    head_size = v.shape[0] // num_head
    v = v.reshape(num_head, head_size)[:, None, :].expand(num_head, reps, head_size)
    return v.reshape(num_head * reps * head_size).clone().detach()
```

I replaced the `dup_kv_weight` call with the `dup_kv_bias` function above for the bias path, and it fixes this problem.
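For context, the fix duplicates each KV head's bias slice `tp_size // num_head` times, so that every tensor-parallel rank receives a full copy when there are fewer KV heads than ranks. A self-contained NumPy sketch of the same duplication (the function name `dup_kv_bias_np` and the toy sizes are illustrative, not from the patch):

```python
import numpy as np

def dup_kv_bias_np(v, num_head, tp_size):
    # Mirror of the torch fix above: repeat each KV head's bias slice
    # tp_size // num_head times so every tensor-parallel rank gets a copy.
    assert tp_size % num_head == 0
    reps = tp_size // num_head
    head_size = v.shape[0] // num_head
    v = v.reshape(num_head, head_size)           # (num_head, head_size)
    v = np.repeat(v[:, None, :], reps, axis=1)   # (num_head, reps, head_size)
    return v.reshape(num_head * reps * head_size)

# Toy example: 2 KV heads with head_size 4, spread over tp_size=4 ranks.
bias = np.arange(8, dtype=np.float32)
out = dup_kv_bias_np(bias, num_head=2, tp_size=4)
print(out.shape)  # (16,) -- each head's 4 bias values appear twice
```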