Open sun2011yao opened 3 weeks ago
@sun2011yao Could you please let me know which version of TRT-LLM you are using?
0.13.0
def dup_kv_bias(v, num_head, tp_size):
    assert tp_size % num_head == 0
    reps = tp_size // num_head
    head_size = v.shape[0] // num_head
    v = v.reshape(num_head, head_size)[:, None, :].expand(num_head, reps, head_size)
    return v.reshape(num_head * reps * head_size).clone().detach()
For the k/v biases, I replaced the call to dup_kv_weight with the dup_kv_bias function above, and that fixes the problem.
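To make the effect of the workaround concrete, here is a minimal sketch that runs the same function on a toy bias. The sizes (2 KV heads, head size 3, tp_size 4) are made up for illustration, and the even split across ranks at the end is an assumption about how the duplicated bias is consumed afterwards, not something confirmed in this thread.

```python
import torch


def dup_kv_bias(v, num_head, tp_size):
    # Same logic as the function posted above: duplicate each KV head of a
    # 1-D bias so that every tensor-parallel rank can receive one full head.
    assert tp_size % num_head == 0
    reps = tp_size // num_head
    head_size = v.shape[0] // num_head
    v = v.reshape(num_head, head_size)[:, None, :].expand(num_head, reps, head_size)
    return v.reshape(num_head * reps * head_size).clone().detach()


# Toy input (illustrative values): head 0 -> [0, 1, 2], head 1 -> [3, 4, 5].
bias = torch.arange(6, dtype=torch.float32)
out = dup_kv_bias(bias, num_head=2, tp_size=4)

print(out.shape)     # torch.Size([12])
print(out.tolist())  # [0.0, 1.0, 2.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 3.0, 4.0, 5.0]

# Assumption: the result is later split evenly across the 4 ranks,
# so ranks 0-1 get a copy of head 0 and ranks 2-3 get a copy of head 1.
shards = torch.chunk(out, 4)
for rank, shard in enumerate(shards):
    print(rank, shard.tolist())
```

Each head is repeated back to back rather than interleaved, which keeps every rank's slice a contiguous copy of a single head.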
System Info
CPU: x86_64
GPU: NVIDIA A100
Who can help?
No response
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
command:
python convert_checkpoint.py --model_dir ./Qwen2-1.5B \
    --tp_size 4 \
    --output_dir ./qwen2_1.5b_checkpoint \
    --dtype float16
error:
Traceback (most recent call last):
  File "/root/TensorRT-LLM-master/examples/qwen/convert_checkpoint.py", line 309, in <module>
    main()
  File "/root/TensorRT-LLM-master/examples/qwen/convert_checkpoint.py", line 301, in main
    convert_and_save_hf(args)
  File "/root/TensorRT-LLM-master/examples/qwen/convert_checkpoint.py", line 257, in convert_and_save_hf
    execute(args.workers, [convert_and_save_rank] * world_size, args)
  File "/root/TensorRT-LLM-master/examples/qwen/convert_checkpoint.py", line 264, in execute
    f(args, rank)
  File "/root/TensorRT-LLM-master/examples/qwen/convert_checkpoint.py", line 247, in convert_and_save_rank
    qwen = QWenForCausalLM.from_hugging_face(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/qwen/model.py", line 316, in from_hugging_face
    weights = load_weights_from_hf_model(hf_model, config)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/qwen/convert.py", line 1239, in load_weights_from_hf_model
    weights = convert_hf_qwen(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/qwen/convert.py", line 756, in convert_hf_qwen
    k_bias = dup_kv_weight(k_bias, num_key_value_heads,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/qwen/convert.py", line 526, in dup_kv_weight
    v.shape[1])
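The traceback is cut off before the exception message, but it ends on a `v.shape[1]` lookup inside dup_kv_weight while k_bias is being processed. One plausible reading is that the bias is 1-D and has no second dimension. The sketch below illustrates that reading with plain shape tuples (no torch needed). The model sizes are assumptions about Qwen2-1.5B that are not stated in this report, and `needs_kv_duplication` and `second_dim` are hypothetical helpers, not TensorRT-LLM functions.

```python
def needs_kv_duplication(num_kv_heads, tp_size):
    """Assumed trigger: fewer KV heads than tensor-parallel ranks."""
    return num_kv_heads < tp_size


def second_dim(shape):
    """Mimic the `v.shape[1]` lookup from the traceback on a shape tuple."""
    return shape[1]


# Assumed Qwen2-1.5B sizes (not from the report): 2 KV heads, head size 128,
# hidden size 1536. tp_size comes from the reported command.
num_kv_heads, head_size, hidden = 2, 128, 1536
tp_size = 4

weight_shape = (num_kv_heads * head_size, hidden)  # 2-D projection weight
bias_shape = (num_kv_heads * head_size,)           # 1-D projection bias

print(needs_kv_duplication(num_kv_heads, tp_size))  # True
print(second_dim(weight_shape))                     # 1536

try:
    second_dim(bias_shape)
    bias_lookup_failed = False
except IndexError:
    # A 1-D shape has no index 1, so the lookup raises IndexError.
    bias_lookup_failed = True
print(bias_lookup_failed)  # True
```

Under these assumptions, the weight path works because the weight is 2-D, while the same helper applied to the bias fails, which is consistent with a bias-specific variant fixing the conversion.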
Expected behavior
success
actual behavior
n
additional notes
n