NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

Assertion failed: noRepeatNgramSize.value() > 0 #2442

Status: Open

krishnanpooja commented 1 week ago

System Info

GPU: A100; TensorRT-LLM version: tensorrt_llm-0.13.0.dev2024090300; Ubuntu machine.

Who can help?

hi @ncomly-nvidia , @byshiue ,

I want to set `no_repeat_ngram_size=0` for a Mistral model, but I get the following assertion error:

RuntimeError: [TensorRT-LLM][ERROR] Assertion failed: noRepeatNgramSize.value() > 0 (/home/jenkins/agent/workspace/LLM/main/L0_PostMerge/llm/cpp/tensorrt_llm/executor/samplingConfig.cpp:332)

As per the documentation, the default value is 1 << 30. Is there a way to set the value to 0? If not, can this feature be added?

Information

Tasks

Reproduction

Set `no_repeat_ngram_size=0` under `SamplingParams` for a Mistral model.

Expected behavior

The user should be allowed to set this value to 0.

Actual behavior

The assertion error above is raised.

Additional notes

We want to set it to 0, as we do with PyTorch eager-mode inference.

byshiue commented 1 week ago

Could you explain your motivation for setting it to 0 instead of 1 << 30? 1 << 30 works equivalently to 0 and is friendlier to our kernel implementation.
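To illustrate why `1 << 30` behaves like "disabled", here is a minimal pure-Python sketch of no-repeat-n-gram blocking (not TensorRT-LLM's actual kernel, just the standard semantics of the parameter): a token is banned if emitting it would complete an n-gram that already occurs in the sequence. With `ngram_size = 1 << 30`, the constraint can never trigger for any realistic sequence length, so nothing is ever banned, which matches the `0` = "disabled" convention from other frameworks.

```python
def banned_tokens(tokens, ngram_size):
    """Return the set of next tokens that would complete an n-gram
    already present in `tokens` (no-repeat-n-gram blocking)."""
    # Constraint cannot trigger until the sequence is long enough to
    # contain a full n-gram ending at the next position. This is why a
    # huge ngram_size (e.g. 1 << 30) behaves as "disabled".
    if ngram_size <= 0 or len(tokens) + 1 < ngram_size:
        return set()
    # The last (n-1) tokens form the prefix the next token would extend.
    prefix = tuple(tokens[-(ngram_size - 1):]) if ngram_size > 1 else ()
    banned = set()
    # Scan all existing n-grams; ban the continuation of any whose
    # first (n-1) tokens match the current prefix.
    for i in range(len(tokens) - ngram_size + 1):
        if tuple(tokens[i:i + ngram_size - 1]) == prefix:
            banned.add(tokens[i + ngram_size - 1])
    return banned

seq = [5, 7, 5, 7]
print(banned_tokens(seq, 2))        # {5}: emitting 5 repeats bigram (7, 5)
print(banned_tokens(seq, 1 << 30))  # set(): constraint never triggers
```

Under these semantics `1 << 30` and `0` both leave sampling unconstrained; the assertion only exists because the kernel implementation expects a positive size, so accepting `0` as an alias for "disabled" would be a convenience-layer change rather than a behavioral one.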