ttim opened this issue 1 week ago
Hi @ttim, if my understanding is correct, gelu_pytorch_tanh should be equivalent to the gelu activation function; they are just different implementations. Could you please share the error log from building Gemma-1.1?
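For context, the difference between the two variants is easy to see directly in PyTorch. A minimal sketch, assuming only stock torch: gelu is the exact erf-based GELU, while gelu_pytorch_tanh corresponds to the tanh approximation.

```python
import torch
import torch.nn.functional as F

# "gelu" is the exact erf-based GELU; "gelu_pytorch_tanh" is the tanh
# approximation: GELU(x) ~= 0.5*x*(1 + tanh(sqrt(2/pi)*(x + 0.044715*x^3)))
x = torch.linspace(-3.0, 3.0, steps=101)
exact = F.gelu(x)                       # erf-based GELU
approx = F.gelu(x, approximate="tanh")  # tanh approximation
print((exact - approx).abs().max())     # small but nonzero difference
```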
@QiJune it fails at this line https://github.com/NVIDIA/TensorRT-LLM/blob/9691e12bce7ae1c126c435a049eb516eb119486c/tensorrt_llm/layers/mlp.py#L49, presumably because the HF configuration of the model specifies gelu_pytorch_tanh. I believe the fix is to add this alias here: https://github.com/NVIDIA/TensorRT-LLM/blob/9691e12bce7ae1c126c435a049eb516eb119486c/tensorrt_llm/functional.py#L5347
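Roughly the shape of the alias I have in mind; a sketch only, assuming functional.py keeps an ACT2FN-style name-to-function mapping (the dict name and the existing entries shown are assumptions, not the file's verbatim contents):

```python
# In tensorrt_llm/functional.py (sketch; the ACT2FN name and existing
# entries are assumed for illustration):
ACT2FN = {
    'gelu': gelu,
    'gelu_new': gelu,
    # proposed alias so HF configs that specify "gelu_pytorch_tanh"
    # (e.g. google/gemma-1.1-2b-it) resolve to the existing GELU op:
    'gelu_pytorch_tanh': gelu,
}
```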
@ttim, yes, I think so. Could you please submit an MR to fix it? Or would you prefer to wait for us to fix it?
@QiJune there are two issues here. The activation function issue I can fix myself. But apart from that, from_hugging_face is broken for Gemma models in another code path that I can't really debug myself. It happens for both Gemma 1 and 1.1 (after the activation function fix). Here's the error on the most current dev version:

AssertionError: Gemma only supports share_embedding_table

Even if this is fixed, it fails with some error from TensorRT about incompatible types.
@QiJune I've created a PR for the activation function: https://github.com/NVIDIA/TensorRT-LLM/pull/1897
System Info
Model: https://huggingface.co/google/gemma-1.1-2b-it
Who can help?
@byshiue
Information

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
Run the GemmaForCausalLM.from_hugging_face().save_checkpoint() API with the https://huggingface.co/google/gemma-1.1-2b-it model. This fails for the 1.1 model but succeeds for the 1.0 model (https://huggingface.co/google/gemma-2b-it).
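A minimal sketch of that reproduction, assuming from_hugging_face accepts a HF hub id or local checkpoint path (the output directory is a placeholder):

```python
# Repro sketch: convert an HF Gemma checkpoint to a TRT-LLM checkpoint.
# Fails for google/gemma-1.1-2b-it, succeeds for google/gemma-2b-it.
from tensorrt_llm.models import GemmaForCausalLM

model = GemmaForCausalLM.from_hugging_face("google/gemma-1.1-2b-it")
model.save_checkpoint("./gemma-1.1-2b-it-trtllm")  # placeholder path
```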
Expected behavior

A successfully built, working TRT-LLM engine
Actual behavior
Either the checkpoint build (for the 1.1 version) or the engine build (for the 1.0 version) fails
Additional notes

I believe the issue for 1.1 comes from the gelu_pytorch_tanh activation function; I'm not sure what breaks the build for 1.0.