Closed: jxchenus closed this issue 3 weeks ago.
Hi @jxchenus, can you check this PR #2232?
Hi @lkm2835 Thank you for looking into this! I looked at your PR. Unfortunately I don't believe it addresses the issue I was seeing. In my case, my model does not contain weights for `lm_head.weight`, and the following warning is logged in `model_weights_loader.py`'s `check_share_embedding` function:

"lm_head.weight and transformer.vocab_embedding.weight are not identical..."

So in my test case, `modeling_utils.py`'s `check_share_embedding` is not in the code path.
Hi @jxchenus, thanks for the feedback. Currently, the `ModelWeightsLoader` cannot handle the case of shared embeddings. This line needs to be updated for your case: it should add another condition for when one of `vocab_embedding` and `lm_head` is None. Let me fix it; I will update here once the fix gets merged into this repo.
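A minimal sketch of the kind of condition being described (an illustration with a hypothetical function name, not the actual `ModelWeightsLoader` code):

```python
import torch

def embeddings_are_shared(lm_head_weight, vocab_embed_weight):
    """Illustrative sharing check with the extra None condition described above."""
    # Added condition: if one side is missing (e.g. the checkpoint only stores the
    # vocab embedding because the weights are tied), treat the embedding as shared.
    if lm_head_weight is None or vocab_embed_weight is None:
        return True
    # Existing behavior: both tensors are present, so they must match exactly.
    return (lm_head_weight.shape == vocab_embed_weight.shape
            and torch.equal(lm_head_weight, vocab_embed_weight))
```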
Thank you @Barry-Delaney ! I verified the patch provided by Vivian Chen @ Nvidia.
I still have a question about this line. Why do the weights need to be cloned?
This is a leftover from previous commits to keep the `postprocess()` function working in similar cases: if two entries share the same portion of weights but `--use_embedding_sharing` is not enabled, the function duplicates it. This will be removed later.
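As a rough illustration of the aliasing problem the copy guards against (a sketch, not the repository's `postprocess()` logic): when two checkpoint entries point at the same tensor but sharing is not enabled, cloning one entry keeps later in-place postprocessing of one from silently changing the other.

```python
import torch

# Two entry names alias the same tensor, as in a tied-embedding checkpoint.
embed = torch.randn(8, 4)
weights = {
    "transformer.vocab_embedding.weight": embed,
    "lm_head.weight": embed,
}

use_embedding_sharing = False
if not use_embedding_sharing:
    # Duplicate the aliased tensor so postprocessing one entry cannot touch the other.
    weights["lm_head.weight"] = weights["lm_head.weight"].clone()

assert (weights["lm_head.weight"].data_ptr()
        != weights["transformer.vocab_embedding.weight"].data_ptr())
```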
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.
System Info

Who can help?

@kaiyux

Information

Tasks

An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)

Reproduction
We have a customized llama model with shared embeddings. We were able to convert the model with the TensorRT-LLM v0.9.0 `convert_checkpoint.py` script and the `--use_embedding_sharing` option:

```
python /opt/amazon/alexa_triton_inference_engine/NeMoRT-TensorRT-LLM/examples/llama/convert_checkpoint.py \
    --model_dir $CONTAINER_LLM_CKPT_DIR \
    --output_dir $CONTAINER_LLM_CKPT_DIR/ckpt_tp$LLM_TP_DEGREE/ \
    --tp_size $LLM_TP_DEGREE \
    --dtype bfloat16 \
    --use_embedding_sharing \
    --use_parallel_embedding
```
When we use the same command on the same model with the TensorRT-LLM main branch, we see the following warning logged in the `check_share_embedding` function:

"lm_head.weight and transformer.vocab_embedding.weight are not identical..."

This function requires that `lm_head_weight` and `vocab_embed_weights` are both available and identical, but in our case `lm_head_weight` is None; therefore, `check_share_embedding` returns False.

Furthermore, if I modify the above function to return True when `lm_head_weight` is None, a `KeyError` is thrown from this line. This is because of this line.

Finally, a question about this line: why does `weights` need to be cloned?
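The failure chain can be reproduced in isolation (a hypothetical sketch, not the code in `modeling_utils.py`): with `lm_head_weight` set to None the identity check cannot pass, and forcing it to pass only moves the failure to the next lookup of the missing key.

```python
import torch

weights = {"transformer.vocab_embedding.weight": torch.randn(8, 4)}

# The checkpoint stores no lm_head.weight, so the lookup yields None.
lm_head_weight = weights.get("lm_head.weight")

# Check modeled on the described behavior: both tensors must exist and match.
share_embedding = (lm_head_weight is not None and
                   torch.equal(lm_head_weight,
                               weights["transformer.vocab_embedding.weight"]))
print(share_embedding)  # False, so --use_embedding_sharing is effectively ignored

# Forcing the check to return True only shifts the failure: the next direct
# access to the missing key raises KeyError.
try:
    weights["lm_head.weight"]
except KeyError as err:
    print("KeyError:", err)
```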
Expected behavior

`convert_checkpoint.py` should honor the same behavior as v0.9.0.

actual behavior

`convert_checkpoint.py` does not honor embedding sharing even when the `--use_embedding_sharing` option is specified.

additional notes
Also tried the above command with `TRTLLM_DISABLE_UNIFIED_CONVERTER=1` and got a different error:

```
Traceback (most recent call last):
  File "/opt/amazon/alexa_triton_inference_engine/NeMoRT-TensorRT-LLM/examples/llama/convert_checkpoint.py", line 497, in <module>
    main()
  File "/opt/amazon/alexa_triton_inference_engine/NeMoRT-TensorRT-LLM/examples/llama/convert_checkpoint.py", line 489, in main
    convert_and_save_hf(args)
  File "/opt/amazon/alexa_triton_inference_engine/NeMoRT-TensorRT-LLM/examples/llama/convert_checkpoint.py", line 431, in convert_and_save_hf
    execute(args.workers, [convert_and_save_rank] * world_size, args)
  File "/opt/amazon/alexa_triton_inference_engine/NeMoRT-TensorRT-LLM/examples/llama/convert_checkpoint.py", line 438, in execute
    f(args, rank)
  File "/opt/amazon/alexa_triton_inference_engine/NeMoRT-TensorRT-LLM/examples/llama/convert_checkpoint.py", line 417, in convert_and_save_rank
    llama = LLaMAForCausalLM.from_hugging_face(
  File "/opt/amazon/alexa_triton_inference_engine/NeMoRT-TensorRT-LLM/tensorrt_llm/models/llama/model.py", line 373, in from_hugging_face
    check_share_embedding(weights, config)
  File "/opt/amazon/alexa_triton_inference_engine/NeMoRT-TensorRT-LLM/tensorrt_llm/models/modeling_utils.py", line 1290, in check_share_embedding
    if (weights["lm_head.weight"] -
TypeError: unsupported operand type(s) for -: 'NoneType' and 'Tensor'
```
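For completeness, a None-safe version of the comparison that fails in the traceback above (a sketch under the assumption that `weights` is a plain dict of tensors; not the repository's implementation):

```python
import torch

def check_share_embedding_sketch(weights):
    # Guarded variant of the subtraction-based comparison from the traceback.
    lm_head = weights.get("lm_head.weight")
    vocab = weights.get("transformer.vocab_embedding.weight")
    if lm_head is None or vocab is None:
        # One tensor is missing (e.g. tied embeddings): nothing to compare,
        # so the weights can be treated as shared.
        return True
    # Both tensors present: share only when the difference is exactly zero.
    return lm_head.shape == vocab.shape and bool((lm_head - vocab).abs().sum() == 0)
```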