NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

Why is the input of the model text_input instead of messages? #2193

Open powerpistn opened 1 month ago

powerpistn commented 1 month ago

I start the model with the following three steps: 1) `python3 convert_checkpoint.py`, 2) `trtllm-build`, 3) launch it with tritonserver. When I send a request to the `localhost:8000/v2/models/ensemble/generate` endpoint, the input looks like `{ "text_input": "Introduce yourself", "max_tokens": 2048 }`. Instead, I want to use the OpenAI-style form `{ "messages": [ {"role": "user", "content": "Introduce yourself"} ], "max_tokens": 2048 }`. Is that possible?
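For reference, a minimal sketch of the request the reporter describes, assuming a local Triton server exposing the generate endpoint on port 8000 (the URL and payload shape are taken from the question above; nothing here is a TensorRT-LLM-specific API):

```python
import json
from urllib import request

# Endpoint and payload as described in the issue.
url = "http://localhost:8000/v2/models/ensemble/generate"
payload = {"text_input": "Introduce yourself", "max_tokens": 2048}

req = request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# response = request.urlopen(req)  # uncomment against a running server
```

The default ensemble expects the flat `text_input` field; a `messages` list in the body would not be recognized.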

github-actions[bot] commented 3 weeks ago

This issue is stale because it has been open 30 days with no activity. Remove the stale label or comment, or this will be closed in 15 days.

winstxnhdw commented 2 weeks ago

You’ll have to rewrite the ensemble to accept these parameters. It’s not worth the effort.
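Short of rewriting the ensemble, one client-side workaround is to flatten the `messages` list into a single prompt string before sending it as `text_input`. The helper below is hypothetical (not a TensorRT-LLM API), and the role-tag format is illustrative; in practice it should match the chat template the deployed model was trained with:

```python
def messages_to_text_input(messages):
    """Flatten an OpenAI-style messages list into one prompt string.

    Hypothetical client-side helper, not part of TensorRT-LLM or Triton.
    The "role: content" layout is illustrative only.
    """
    parts = [f"{m['role']}: {m['content']}" for m in messages]
    parts.append("assistant:")  # cue the model to respond
    return "\n".join(parts)


# Build the payload the default ensemble actually accepts.
payload = {
    "text_input": messages_to_text_input(
        [{"role": "user", "content": "Introduce yourself"}]
    ),
    "max_tokens": 2048,
}
```

This keeps the server-side ensemble untouched at the cost of reimplementing the chat template on the client.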