NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

qwen2_1.5b+tp4 convert_checkpoint failed #2310

Open sun2011yao opened 1 month ago

sun2011yao commented 1 month ago

System Info

CPU: x86_64
GPU: NVIDIA A100

Who can help?

No response

Information

Tasks

Reproduction

Command:

```shell
python convert_checkpoint.py --model_dir ./Qwen2-1.5B \
    --tp_size 4 \
    --output_dir ./qwen2_1.5b_checkpoint \
    --dtype float16
```

Error:

```
Traceback (most recent call last):
  File "/root/TensorRT-LLM-master/examples/qwen/convert_checkpoint.py", line 309, in <module>
    main()
  File "/root/TensorRT-LLM-master/examples/qwen/convert_checkpoint.py", line 301, in main
    convert_and_save_hf(args)
  File "/root/TensorRT-LLM-master/examples/qwen/convert_checkpoint.py", line 257, in convert_and_save_hf
    execute(args.workers, [convert_and_save_rank] * world_size, args)
  File "/root/TensorRT-LLM-master/examples/qwen/convert_checkpoint.py", line 264, in execute
    f(args, rank)
  File "/root/TensorRT-LLM-master/examples/qwen/convert_checkpoint.py", line 247, in convert_and_save_rank
    qwen = QWenForCausalLM.from_hugging_face(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/qwen/model.py", line 316, in from_hugging_face
    weights = load_weights_from_hf_model(hf_model, config)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/qwen/convert.py", line 1239, in load_weights_from_hf_model
    weights = convert_hf_qwen(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/qwen/convert.py", line 756, in convert_hf_qwen
    k_bias = dup_kv_weight(k_bias, num_key_value_heads,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/qwen/convert.py", line 526, in dup_kv_weight
    v.shape[1])
```

Expected behavior

The checkpoint conversion should succeed.

actual behavior

None.

additional notes

None.

jershi425 commented 1 month ago

@sun2011yao Could you please let me know which version of TRT-LLM are you using?

sun2011yao commented 1 month ago

> @sun2011yao Could you please let me know which version of TRT-LLM are you using?

0.13.0

sun2011yao commented 1 month ago
```python
def dup_kv_bias(v, num_head, tp_size):
    assert tp_size % num_head == 0
    reps = tp_size // num_head
    head_size = v.shape[0] // num_head
    v = v.reshape(num_head, head_size)[:, None, :].expand(num_head, reps, head_size)
    return v.reshape(num_head * reps * head_size).clone().detach()
```

I replaced the `dup_kv_weight` call for the bias with the `dup_kv_bias` function above, which fixes this problem.
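For context, the duplication that `dup_kv_bias` performs can be sketched in NumPy (illustrative sizes only; the head counts below are assumptions, not Qwen2-1.5B's real config). When the tensor-parallel degree exceeds the number of KV heads (as with GQA models and `--tp_size 4`), each head's 1-D bias slice has to be repeated `tp_size // num_head` times so every rank receives a full copy. The original `dup_kv_weight` indexes `v.shape[1]`, which only works for 2-D weight matrices, not 1-D bias vectors:

```python
import numpy as np

# Illustrative sizes (assumptions for this sketch, not the real model config):
num_head = 2    # KV heads present in the checkpoint
tp_size = 4     # tensor-parallel degree requested at conversion
head_size = 3   # per-head bias width

# Flat 1-D bias vector of shape (num_head * head_size,) = (6,)
v = np.arange(num_head * head_size)

# Each head's slice is repeated reps times consecutively, mirroring the
# reshape -> expand -> reshape in dup_kv_bias above.
reps = tp_size // num_head
dup = np.repeat(v.reshape(num_head, head_size), reps, axis=0).reshape(-1)

print(dup.tolist())  # [0, 1, 2, 0, 1, 2, 3, 4, 5, 3, 4, 5]
```

The key point is that the bias is indexed only along dimension 0, so no `v.shape[1]` access ever happens.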

github-actions[bot] commented 11 hours ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.