Open · KuntaiDu opened this issue 3 months ago
Hi @KuntaiDu, the official config.pbtxt files for v0.11 live here: https://github.com/triton-inference-server/tensorrtllm_backend/tree/v0.11.0/all_models/inflight_batcher_llm.
Before launching tritonserver, you will need to fill in several parameters in those templates. Please follow the documentation in the TensorRT-LLM backend repo, and feel free to let us know if you have any questions. Thanks.
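To make that concrete, here is a rough sketch of the flow, run from a tensorrtllm_backend v0.11 checkout. The tokenizer/engine paths and the exact placeholder names (triton_max_batch_size, decoupled_mode, batching_strategy, ...) are illustrative; the v0.11 README lists the full set each model needs, so treat this as a sketch rather than the official recipe:

```python
# Sketch: fill the ${...} placeholders in the v0.11 config.pbtxt templates and
# start tritonserver. Run from the tensorrtllm_backend repo root. Paths and
# placeholder names below are illustrative -- check the v0.11 README for the
# exact set your model needs.
import subprocess

MODEL_REPO = "all_models/inflight_batcher_llm"   # the v0.11 tree linked above
TOKENIZER_DIR = "/models/my-model/tokenizer"     # hypothetical path
ENGINE_DIR = "/models/my-model/trt_engines"      # hypothetical path
MAX_BATCH_SIZE = 64                              # pick to match the vLLM run

def fill(model_dir, params):
    """Substitute placeholders in one config.pbtxt via tools/fill_template.py."""
    kv = ",".join(f"{k}:{v}" for k, v in params.items())
    subprocess.run(
        ["python3", "tools/fill_template.py", "-i",
         f"{MODEL_REPO}/{model_dir}/config.pbtxt", kv],
        check=True,
    )

# The batch-size / token-length knobs the benchmark cares about are plain
# ${...} placeholders inside these templates.
fill("preprocessing", {"tokenizer_dir": TOKENIZER_DIR,
                       "triton_max_batch_size": MAX_BATCH_SIZE,
                       "preprocessing_instance_count": 1})
fill("postprocessing", {"tokenizer_dir": TOKENIZER_DIR,
                        "triton_max_batch_size": MAX_BATCH_SIZE,
                        "postprocessing_instance_count": 1})
fill("ensemble", {"triton_max_batch_size": MAX_BATCH_SIZE})
fill("tensorrt_llm", {"engine_dir": ENGINE_DIR,
                      "triton_max_batch_size": MAX_BATCH_SIZE,
                      "decoupled_mode": "True",
                      "batching_strategy": "inflight_fused_batching"})

# Start the server once every template has been filled.
subprocess.run(["python3", "scripts/launch_triton_server.py",
                "--world_size", "1", "--model_repo", MODEL_REPO],
               check=True)
```

The same max-batch-size and token-length values are the ones worth pinning in the CI script so the Triton and vLLM runs stay comparable.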
Hello @KuntaiDu, for the CI benchmark, would it be possible to share the CI script you're using? I found the regression page with results (https://buildkite.com/vllm/performance-benchmark/builds/4068#_) but couldn't find the script itself. I can help fix it. But basically, to get a good run with good settings in the CI script, could you:
Incorporated into the vLLM PR. Please close as "done".
@KuntaiDu If you have no further questions, we will close this issue in a week.
System Info
I am working on the benchmarking suite in the vLLM team and am now trying to run TensorRT-LLM for comparison. I am relying on this GitHub repo (https://github.com/neuralmagic/tensorrt-demo) to serve the LLM; it contains several config.pbtxt files that specify the batch size, maximum token length, etc., and are consumed by the Triton Inference Server. However, that repo is based on version r24.04, and I am not sure how to find the corresponding config.pbtxt files for version r24.07. Are there any references to help me locate these config.pbtxt files so that I can compare against TensorRT-LLM version r24.07?
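One option for lining the two versions up (just a sketch, not an official tool) is to scan every config.pbtxt in both model repos — the r24.04 demo repo and the v0.11 all_models/inflight_batcher_llm tree — for the fields of interest; the key names below are illustrative examples, not an exhaustive list:

```python
# Sketch: print the benchmark-relevant fields from every config.pbtxt under a
# Triton model repo, so the r24.04 demo configs and the v0.11 templates can be
# compared side by side. Key names here are illustrative, not exhaustive.
import sys
from pathlib import Path

KEYS_OF_INTEREST = (
    "max_batch_size",
    "max_beam_width",
    "max_tokens_in_paged_kv_cache",
    "kv_cache_free_gpu_mem_fraction",
    "batch_scheduler_policy",
    "decoupled",
)

def scan(model_repo: str) -> None:
    for cfg in sorted(Path(model_repo).rglob("config.pbtxt")):
        print(f"== {cfg} ==")
        for line in cfg.read_text().splitlines():
            if any(key in line for key in KEYS_OF_INTEREST):
                print("   ", line.strip())

if __name__ == "__main__":
    # e.g. python scan_configs.py all_models/inflight_batcher_llm
    scan(sys.argv[1] if len(sys.argv) > 1 else ".")
```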
Who can help?

@juney-nvidia @byshiue
Information

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
Not a code bug issue
Expected behavior
Not a code bug issue
Actual behavior
Not a code bug issue
Additional notes
Not a code bug issue