NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

AssertionError when checking starcoder's gelu_pytorch_tanh activation function #946

Open nullxjx opened 10 months ago

nullxjx commented 10 months ago

System Info

model: bigcode/starcoderbase-3b
Python: 3.10.12
CUDA Version: 12.2
tensorrt_llm version: 0.8.0.dev2024011601
base image: nvcr.io/nvidia/tritonserver:23.12-trtllm-python-py3

Who can help?

@juney-nvidia @kaiyux @Shixiaowei02 @Eddie-Wang1120

Information

Tasks

Reproduction

Follow the guide to convert the weights of starcoder.

Expected behavior

Convert the weights of starcoder successfully.

actual behavior

When I followed the instructions described here, the error below occurred:

[screenshot: AssertionError traceback]

The command is:

python3 build.py \
    --model_dir ./c-model/starcoderbase-3b/2-gpu \
    --remove_input_padding \
    --use_gpt_attention_plugin \
    --enable_context_fmha \
    --use_gemm_plugin \
    --parallel_build \
    --output_dir engines/starcoderbase-3b/fp16/2-gpu \
    --world_size 2

I found that starcoder uses the "gelu_pytorch_tanh" activation function instead of the classic gelu, as described here.

I checked the code from the latest main branch, but I cannot find gelu_pytorch_tanh defined anywhere.
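For reference, "gelu_pytorch_tanh" is the name Hugging Face uses for the tanh approximation of GELU (what torch.nn.functional.gelu computes with approximate="tanh"). A minimal stdlib-only sketch (my own illustration, not TensorRT-LLM code) comparing it against the exact erf-based GELU shows the two differ only marginally:

# Sketch: compare the exact GELU with the tanh approximation ("gelu_pytorch_tanh").
from math import erf, tanh, sqrt, pi

def gelu_exact(x: float) -> float:
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
    return 0.5 * x * (1.0 + erf(x / sqrt(2.0)))

def gelu_pytorch_tanh(x: float) -> float:
    # Tanh approximation, i.e. torch.nn.functional.gelu(x, approximate="tanh").
    return 0.5 * x * (1.0 + tanh(sqrt(2.0 / pi) * (x + 0.044715 * x ** 3)))

# Maximum difference over [-5, 5] is tiny (on the order of 1e-3 or less).
worst = max(abs(gelu_exact(v / 10.0) - gelu_pytorch_tanh(v / 10.0)) for v in range(-50, 51))
print(worst)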


It seems that TensorRT-LLM has not been adapted to the starcoder model. But there is an instruction README here, so I wonder whether you have tested on starcoder before?
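As a possible (untested) workaround, and assuming the assertion is raised because the conversion script only accepts the activation name it reads from the Hugging Face config, one could remap the activation to the classic "gelu" in the local checkpoint's config.json before converting; since the tanh variant is only an approximation of GELU, the numerical impact should be small. The path and config key below are illustrative:

# Hypothetical workaround (untested): rewrite the activation name in the local
# Hugging Face checkpoint so the conversion script sees a value it supports.
import json
from pathlib import Path

config_path = Path("./starcoderbase-3b/config.json")  # example path, adjust to your checkpoint

config = json.loads(config_path.read_text())
for key in ("activation_function", "hidden_act"):  # key name depends on the model's config class
    if config.get(key) == "gelu_pytorch_tanh":
        config[key] = "gelu"  # fall back to the classic GELU name
        config_path.write_text(json.dumps(config, indent=2))
        print(f"Rewrote {key} to 'gelu'")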

additional notes

none

nv-guomingz commented 1 week ago

Would you please try our latest code base to see if the issue still exists?

And do you still have any further issues or questions now? If not, we'll close this soon.