NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

Support Gemma 1.1 model #1889

Open ttim opened 1 week ago

ttim commented 1 week ago

System Info

Model: https://huggingface.co/google/gemma-1.1-2b-it

Who can help?

@byshiue

Information

Tasks

Reproduction

Expected behavior

A successfully built and working TRT-LLM engine.

Actual behavior

Either the checkpoint build (for version 1.1) or the engine build (for version 1.0) fails.

Additional notes

I believe the issue for 1.1 comes from the gelu_pytorch_tanh activation function; I'm not sure what breaks the build for 1.0.

QiJune commented 1 week ago

Hi @ttim, if my understanding is correct, gelu_pytorch_tanh should be equivalent to the gelu activation function; they are just different implementations of the same thing. Could you please share the error log from building Gemma-1.1?
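
For reference, a quick numerical check outside TensorRT-LLM (a small PyTorch sketch, not repository code) of how close the exact GELU and the tanh-approximated variant are:

```python
# Compare the exact (erf-based) GELU with the tanh approximation that
# Hugging Face calls "gelu_pytorch_tanh" in the Gemma 1.1 config.
import torch
import torch.nn.functional as F

x = torch.linspace(-5, 5, steps=1001)
exact = F.gelu(x)                        # erf formulation
approx = F.gelu(x, approximate="tanh")   # tanh approximation

# The maximum absolute difference over this range is small (on the order of
# 1e-3 or less), which is why the two are usually treated as interchangeable.
print((exact - approx).abs().max())
```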

ttim commented 1 week ago

@QiJune it fails at this line https://github.com/NVIDIA/TensorRT-LLM/blob/9691e12bce7ae1c126c435a049eb516eb119486c/tensorrt_llm/layers/mlp.py#L49 , presumably because the Hugging Face configuration of the model specifies gelu_pytorch_tanh. I believe the fix is to add this alias here: https://github.com/NVIDIA/TensorRT-LLM/blob/9691e12bce7ae1c126c435a049eb516eb119486c/tensorrt_llm/functional.py#L5347
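
Roughly what I have in mind (an illustrative sketch of the aliasing pattern, not the actual tensorrt_llm/functional.py code; everything except the gelu_pytorch_tanh name is a placeholder):

```python
# Sketch of resolving Hugging Face activation names to supported ops.
# The dictionary and helper below are illustrative, not TensorRT-LLM code.
import torch.nn.functional as F

ACT2FN = {
    "relu": F.relu,
    "silu": F.silu,
    "gelu": F.gelu,
    # Gemma 1.1 configs specify this name for the tanh-approximated GELU;
    # adding the alias lets the MLP layer resolve it to an existing op.
    "gelu_pytorch_tanh": lambda x: F.gelu(x, approximate="tanh"),
}

def get_activation(name: str):
    if name not in ACT2FN:
        raise ValueError(f"Unsupported activation: {name}")
    return ACT2FN[name]
```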

QiJune commented 1 week ago

@ttim, yes, I think so. Could you please submit an MR to fix it? Or would you prefer to wait for us to fix it?

ttim commented 1 week ago

@QiJune there are two issues here. The activation function issue I can fix myself. But apart from that, from_hugging_face is broken for Gemma models in another code path that I can't really debug myself. It happens for both Gemma 1.0 and 1.1 (after the activation function fix). Here's the error on the most current dev version:

AssertionError: Gemma only supports share_embedding_table

Even if this is fixed, it fails with an error from TensorRT about incompatible types.
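
For context, the Hugging Face Gemma checkpoints tie the output head to the input embedding table, which I assume is what share_embedding_table corresponds to on the TRT-LLM side; a quick way to confirm the tied embeddings:

```python
# Check that the Hugging Face Gemma config declares tied input/output embeddings
# (i.e. lm_head reuses embed_tokens), which appears to be why TRT-LLM insists
# on share_embedding_table for Gemma.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("google/gemma-1.1-2b-it")
print(cfg.tie_word_embeddings)  # expected: True
```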

ttim commented 1 week ago

@QiJune I've created a PR for the activation function: https://github.com/NVIDIA/TensorRT-LLM/pull/1897