NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

AssertionError when checking starcoder's gelu_pytorch_tanh activation function #946

Open nullxjx opened 10 months ago

nullxjx commented 10 months ago

System Info

model: bigcode/starcoderbase-3b
Python: 3.10.12
CUDA Version: 12.2
tensorrt_llm version: 0.8.0.dev2024011601
base image: nvcr.io/nvidia/tritonserver:23.12-trtllm-python-py3

Who can help?

@juney-nvidia @kaiyux @Shixiaowei02 @Eddie-Wang1120

Information

Tasks

Reproduction

Follow the guide to convert the weights of starcoder.

Expected behavior

Convert the weights of starcoder successfully.

actual behavior

When I followed the instructions described here, the error below occurred:

[screenshot: AssertionError traceback]

The command is:

python3 build.py \
    --model_dir ./c-model/starcoderbase-3b/2-gpu \
    --remove_input_padding \
    --use_gpt_attention_plugin \
    --enable_context_fmha \
    --use_gemm_plugin \
    --parallel_build \
    --output_dir engines/starcoderbase-3b/fp16/2-gpu \
    --world_size 2

I found that starcoder uses the "gelu_pytorch_tanh" activation function instead of the classic gelu, as described here.

I checked the code from the latest main branch, but I cannot find gelu_pytorch_tanh defined anywhere.
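For reference, "gelu_pytorch_tanh" is the name Hugging Face uses for the tanh approximation of GELU (what torch.nn.functional.gelu computes with approximate="tanh"). A minimal stdlib-only sketch (my own illustration, not TensorRT-LLM code) comparing it against the exact erf-based GELU shows the two differ only marginally:

# Sketch: compare the exact GELU with the tanh approximation ("gelu_pytorch_tanh").
from math import erf, tanh, sqrt, pi

def gelu_exact(x: float) -> float:
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
    return 0.5 * x * (1.0 + erf(x / sqrt(2.0)))

def gelu_pytorch_tanh(x: float) -> float:
    # Tanh approximation, i.e. torch.nn.functional.gelu(x, approximate="tanh").
    return 0.5 * x * (1.0 + tanh(sqrt(2.0 / pi) * (x + 0.044715 * x ** 3)))

# Maximum difference over [-5, 5] is tiny (on the order of 1e-3 or less).
worst = max(abs(gelu_exact(v / 10.0) - gelu_pytorch_tanh(v / 10.0)) for v in range(-50, 51))
print(worst)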


It seems that TensorRT-LLM has not been adapted to the starcoder model. But there is an instruction README here, so I wonder whether you have tested on starcoder before?
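As a possible (untested) workaround, and assuming the assertion is raised because the conversion script only accepts the activation name it reads from the Hugging Face config, one could remap the activation to the classic "gelu" in the local checkpoint's config.json before converting; since the tanh variant is only an approximation of GELU, the numerical impact should be small. The path and config key below are illustrative:

# Hypothetical workaround (untested): rewrite the activation name in the local
# Hugging Face checkpoint so the conversion script sees a value it supports.
import json
from pathlib import Path

config_path = Path("./starcoderbase-3b/config.json")  # example path, adjust to your checkpoint

config = json.loads(config_path.read_text())
for key in ("activation_function", "hidden_act"):  # key name depends on the model's config class
    if config.get(key) == "gelu_pytorch_tanh":
        config[key] = "gelu"  # fall back to the classic GELU name
        config_path.write_text(json.dumps(config, indent=2))
        print(f"Rewrote {key} to 'gelu'")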

additional notes

none

nv-guomingz commented 1 week ago

Would you please try our latest code base to see if the issue still exists?

And do you still have any further issues or questions now? If not, we'll close this soon.