sun2011yao opened this issue 1 month ago
@sun2011yao Could you please let me know which version of TRT-LLM are you using?
0.13.0
```python
def dup_kv_bias(v, num_head, tp_size):
    # Replicate a 1D KV bias so every tensor-parallel rank gets a full copy of
    # each KV head's bias (dup_kv_weight only handles 2D weight matrices).
    assert tp_size % num_head == 0
    reps = tp_size // num_head
    head_size = v.shape[0] // num_head
    v = v.reshape(num_head, head_size)[:, None, :].expand(num_head, reps, head_size)
    return v.reshape(num_head * reps * head_size).clone().detach()
```
Replacing the dup_kv_weight call for the bias with the dup_kv_bias function above fixes this problem for me.
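For anyone hitting the same thing, here is a quick sanity check of the workaround. It assumes the dup_kv_bias helper above is in scope; the 2 KV heads with head_dim 128 layout matches Qwen2-1.5B, and the tensor values are dummies:

```python
import torch

# Dummy 1D KV bias: 2 KV heads x head_dim 128 = 256 elements (Qwen2-1.5B layout).
num_kv_heads, head_size, tp_size = 2, 128, 4
k_bias = torch.arange(num_kv_heads * head_size, dtype=torch.float32)

dup = dup_kv_bias(k_bias, num_kv_heads, tp_size)  # helper defined above
print(dup.shape)  # torch.Size([512]): each head's bias repeated tp_size // num_kv_heads = 2 times

# Duplicated layout is [head0, head0, head1, head1], so each TP rank can slice its own chunk.
assert torch.equal(dup[:head_size], dup[head_size:2 * head_size])
```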
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.
System Info
CPU: x86_64
GPU: NVIDIA A100
Who can help?
No response
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
command:
```
python convert_checkpoint.py --model_dir ./Qwen2-1.5B \
    --tp_size 4 \
    --output_dir ./qwen2_1.5b_checkpoint \
    --dtype float16
```
error:
```
Traceback (most recent call last):
  File "/root/TensorRT-LLM-master/examples/qwen/convert_checkpoint.py", line 309, in <module>
    main()
  File "/root/TensorRT-LLM-master/examples/qwen/convert_checkpoint.py", line 301, in main
    convert_and_save_hf(args)
  File "/root/TensorRT-LLM-master/examples/qwen/convert_checkpoint.py", line 257, in convert_and_save_hf
    execute(args.workers, [convert_and_save_rank] * world_size, args)
  File "/root/TensorRT-LLM-master/examples/qwen/convert_checkpoint.py", line 264, in execute
    f(args, rank)
  File "/root/TensorRT-LLM-master/examples/qwen/convert_checkpoint.py", line 247, in convert_and_save_rank
    qwen = QWenForCausalLM.from_hugging_face(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/qwen/model.py", line 316, in from_hugging_face
    weights = load_weights_from_hf_model(hf_model, config)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/qwen/convert.py", line 1239, in load_weights_from_hf_model
    weights = convert_hf_qwen(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/qwen/convert.py", line 756, in convert_hf_qwen
    k_bias = dup_kv_weight(k_bias, num_key_value_heads,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/qwen/convert.py", line 526, in dup_kv_weight
    v.shape[1])
```
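For context on where this breaks: dup_kv_weight indexes v.shape[1], which exists for the 2D attention weight matrices but not for the 1D attention biases that Qwen2 also carries, so the duplication presumably dies with an IndexError (the exception line is cut off in the traceback above). A minimal illustration of the shape mismatch, with made-up sizes:

```python
import torch

k_weight = torch.zeros(256, 1536)  # 2D weight matrix: shape[1] exists, lookup is fine
k_bias = torch.zeros(256)          # 1D bias vector: its shape tuple has only one entry

print(k_weight.shape[1])  # 1536
print(k_bias.shape[1])    # IndexError: tuple index out of range
```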
Expected behavior
The checkpoint conversion completes successfully.
Actual behavior
The conversion fails with the traceback shown above.
Additional notes
None.