NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

failed to use "stop_words_list" for tensorrt-llm==0.9.0 #1642

Closed AGI-player closed 7 hours ago

AGI-player commented 6 months ago

I use GenerationExecutorWorker for a web service. I pass stop_words_list = [["hello, yes"]] by modifying the as_inference_request function in executor.py as follows:

[screenshots not preserved]

The ir parameter is as follows, and then the request fails:

[screenshots not preserved]

byshiue commented 6 months ago

Please follow the issue template to share the full end to end reproduced steps. Thank you for cooperation.

AGI-player commented 6 months ago

The TRT engine was built with:

trtllm-build --gemm_plugin float16 --max_batch_size=128 --max_input_len=8192 --max_output_len=0 --gpt_attention_plugin float16 --paged_kv_cache enable --remove_input_padding enable --context_fmha enable --max_num_tokens 104448

I then used TensorRT-LLM/examples/apps/fastapi_server.py for the web service:

python3 -m apps.fastapi_server path/to/engine/ path/to/tokenizer --port 8001

To set the parameters (temperature and stop words list), I modified the TensorRT-LLM/examples/apps/fastapi_server.py file and the tensorrt_llm/executor.py file. [screenshots not preserved]

If I don't pass stop_words_list, it works well. [screenshots not preserved]

It fails when I use the stop words list. [screenshot not preserved]

Superjomn commented 6 months ago

stop_words_list is not well supported in 0.9.0. You could try the latest main branch: we have refactored the GenerationExecutor, and stop_words are supported there.

AGI-player commented 6 months ago

I updated to version 0.11.0.dev2024052100, but it still doesn't work.

fan-niu commented 6 months ago

@Superjomn @byshiue Same question. Can you give an example of successful use of stop_words or stop_words_list? Thank you. I am currently using the service started by tensorrtllm_backend; the commit is 75b0964, and the corresponding tensorrtllm version is f430a4b.

github-actions[bot] commented 5 months ago

This issue is stale because it has been open 30 days with no activity. Remove the stale label or comment, or this will be closed in 15 days.

nv-guomingz commented 2 weeks ago

Hi @fan-niu, do you still have any further issues or questions? If not, we'll close this soon.
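
For readers landing here with the same question: the thread never shows a working example, so the following is only a sketch of one plausible cause, not a confirmed fix. It assumes that the lower-level request expects stop words as token IDs rather than raw strings, laid out as a flat list of IDs plus cumulative end offsets padded with -1 (the layout that, as far as I understand, the to_word_list_format helper in tensorrt_llm.runtime produces). The toy_encode tokenizer and the build_stop_words name are hypothetical and exist only for illustration; a real service would use the model's own tokenizer.

```python
import csv
from typing import Callable, List, Tuple


def toy_encode(word: str) -> List[int]:
    """Hypothetical stand-in tokenizer: one token ID per character."""
    return [ord(ch) for ch in word]


def build_stop_words(
    entries: List[str], encode: Callable[[str], List[int]]
) -> Tuple[List[int], List[int]]:
    """Convert CSV-style stop-word strings into (flat_ids, offsets).

    Each entry may hold several stop words separated by commas,
    e.g. "hello, yes". flat_ids concatenates the token IDs of every
    stop word; offsets holds the cumulative end index of each word
    and is padded with -1 to the same length as flat_ids.
    """
    flat_ids: List[int] = []
    offsets: List[int] = []
    # skipinitialspace drops the blank after each comma (a choice made
    # for this sketch; a real helper may keep it).
    for row in csv.reader(entries, skipinitialspace=True):
        for word in row:
            ids = encode(word)
            if not ids:
                continue  # skip empty words such as "a,,b"
            flat_ids.extend(ids)
            offsets.append(len(flat_ids))
    offsets += [-1] * (len(flat_ids) - len(offsets))
    return flat_ids, offsets


flat_ids, offsets = build_stop_words(["hello, yes"], toy_encode)
print(offsets)  # [5, 8, -1, -1, -1, -1, -1, -1]
```

With this layout, "hello" occupies flat_ids[0:5] and "yes" occupies flat_ids[5:8]. Whether a given TensorRT-LLM version accepts this layout, a plain list of token-ID lists, or strings depends on the API in use, so check the signature of the request class in the installed version before relying on it.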