NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

qwen2_1.5b+tp4 convert_checkpoint failed #2310

Open sun2011yao opened 1 month ago

sun2011yao commented 1 month ago

System Info

CPU: x86_64
GPU: NVIDIA A100

Who can help?

No response

Information

Tasks

Reproduction

Command:

```shell
python convert_checkpoint.py --model_dir ./Qwen2-1.5B \
    --tp_size 4 \
    --output_dir ./qwen2_1.5b_checkpoint \
    --dtype float16
```

Error:

```
Traceback (most recent call last):
  File "/root/TensorRT-LLM-master/examples/qwen/convert_checkpoint.py", line 309, in <module>
    main()
  File "/root/TensorRT-LLM-master/examples/qwen/convert_checkpoint.py", line 301, in main
    convert_and_save_hf(args)
  File "/root/TensorRT-LLM-master/examples/qwen/convert_checkpoint.py", line 257, in convert_and_save_hf
    execute(args.workers, [convert_and_save_rank] * world_size, args)
  File "/root/TensorRT-LLM-master/examples/qwen/convert_checkpoint.py", line 264, in execute
    f(args, rank)
  File "/root/TensorRT-LLM-master/examples/qwen/convert_checkpoint.py", line 247, in convert_and_save_rank
    qwen = QWenForCausalLM.from_hugging_face(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/qwen/model.py", line 316, in from_hugging_face
    weights = load_weights_from_hf_model(hf_model, config)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/qwen/convert.py", line 1239, in load_weights_from_hf_model
    weights = convert_hf_qwen(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/qwen/convert.py", line 756, in convert_hf_qwen
    k_bias = dup_kv_weight(k_bias, num_key_value_heads,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/qwen/convert.py", line 526, in dup_kv_weight
    v.shape[1])
```

Expected behavior

The checkpoint conversion should succeed.

actual behavior

None.

additional notes

None.

jershi425 commented 1 month ago

@sun2011yao Could you please let me know which version of TRT-LLM are you using?

sun2011yao commented 1 month ago

> @sun2011yao Could you please let me know which version of TRT-LLM are you using?

0.13.0

sun2011yao commented 1 month ago
```python
def dup_kv_bias(v, num_head, tp_size):
    assert tp_size % num_head == 0
    reps = tp_size // num_head
    head_size = v.shape[0] // num_head
    v = v.reshape(num_head, head_size)[:, None, :].expand(num_head, reps, head_size)
    return v.reshape(num_head * reps * head_size).clone().detach()
```

I replaced the `dup_kv_weight` call for the bias with the `dup_kv_bias` function above, which fixes this problem.
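For context, the duplication that `dup_kv_bias` performs can be sketched in NumPy (illustrative sizes only; the head counts below are assumptions, not Qwen2-1.5B's real config). When the tensor-parallel degree exceeds the number of KV heads (as with GQA models and `--tp_size 4`), each head's 1-D bias slice has to be repeated `tp_size // num_head` times so every rank receives a full copy. The original `dup_kv_weight` indexes `v.shape[1]`, which only works for 2-D weight matrices, not 1-D bias vectors:

```python
import numpy as np

# Illustrative sizes (assumptions for this sketch, not the real model config):
num_head = 2    # KV heads present in the checkpoint
tp_size = 4     # tensor-parallel degree requested at conversion
head_size = 3   # per-head bias width

# Flat 1-D bias vector of shape (num_head * head_size,) = (6,)
v = np.arange(num_head * head_size)

# Each head's slice is repeated reps times consecutively, mirroring the
# reshape -> expand -> reshape in dup_kv_bias above.
reps = tp_size // num_head
dup = np.repeat(v.reshape(num_head, head_size), reps, axis=0).reshape(-1)

print(dup.tolist())  # [0, 1, 2, 0, 1, 2, 3, 4, 5, 3, 4, 5]
```

The key point is that the bias is indexed only along dimension 0, so no `v.shape[1]` access ever happens.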

github-actions[bot] commented 11 hours ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.