It would be great if Triton had official support for an OpenAI-compatible API.
TensorRT-LLM has an OpenAI-compatible API; please refer to this example: https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/apps
Interesting! I hadn't seen that yet. But this one doesn't go through Triton to get in-flight batching and everything else, right?
Correct, it doesn't go through Triton, but it can still use in-flight batching through the Python bindings of the Executor API.
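For illustration, a minimal sketch modeled on the Executor bindings example in the TensorRT-LLM repo; exact class names and signatures vary between TensorRT-LLM versions, and the engine path is a placeholder:

```python
import tensorrt_llm.bindings.executor as trtllm

# Point the executor at a prebuilt engine directory (placeholder path).
executor = trtllm.Executor("/path/to/engine_dir",
                           trtllm.ModelType.DECODER_ONLY,
                           trtllm.ExecutorConfig(max_beam_width=1))

# Every enqueued request is scheduled with in-flight batching automatically,
# so concurrent requests get batched without going through Triton.
request_id = executor.enqueue_request(
    trtllm.Request(input_token_ids=[1, 2, 3, 4], max_new_tokens=32))

for response in executor.await_responses(request_id):
    print(response.result.output_token_ids)
```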
Interesting! Are there other advantages to using Triton, then?
Performance and a C++-compatible API.
If you want to serve a model and get an OpenAI API plus Swagger, with vLLM you only have to run two lines:
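Roughly the following (the model name is a placeholder; vLLM's server is FastAPI-based, so the Swagger UI is served automatically at `/docs`):

```bash
pip install vllm
python -m vllm.entrypoints.openai.api_server --model <your-model-name>
```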
But with TensorRT it is much harder (a rough sketch of the commands follows this list):
1. Download the model.
2. Pull the TensorRT-LLM/Triton Docker image.
3. Run the container with the model path mounted.
4. Convert the model checkpoint.
5. Build the engine.
6. Copy the config files from tensorrt-llm-backend into the Triton model repository.
7. Start Triton.
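Something like the following, where the container tag, paths, and per-model conversion script are placeholders and the exact flow depends on the model and the tensorrtllm_backend version:

```bash
# 2. Pull a Triton image that ships with the TensorRT-LLM backend (tag is illustrative)
docker pull nvcr.io/nvidia/tritonserver:24.02-trtllm-python-py3
# 3. Run it with the downloaded model mounted
docker run --gpus all -it -v /path/to/model:/model \
    nvcr.io/nvidia/tritonserver:24.02-trtllm-python-py3
# 4. Convert the checkpoint (the script lives under tensorrt_llm/examples/<model>)
python3 convert_checkpoint.py --model_dir /model --output_dir /ckpt --dtype float16
# 5. Build the engine
trtllm-build --checkpoint_dir /ckpt --output_dir /engines
# 6. Copy the config templates from tensorrtllm_backend and fill in their parameters
cp -r tensorrtllm_backend/all_models/inflight_batcher_llm /triton_repo
# 7. Start Triton
python3 tensorrtllm_backend/scripts/launch_triton_server.py --model_repo /triton_repo
```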
And even then we still have no OpenAI API or Swagger! We can only use the Triton API, which doesn't support many things, such as chat templates.
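One workaround is to apply the chat template client-side and send the rendered prompt to Triton's generate endpoint. A sketch, assuming the default `ensemble` model from the tensorrtllm_backend templates is running on localhost:8000 (the tokenizer name is a placeholder):

```python
import requests
from transformers import AutoTokenizer

# Apply the chat template on the client, since the Triton API won't do it for us.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")  # placeholder
messages = [{"role": "user", "content": "Hello!"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True)

# "ensemble" is the default model name in the tensorrtllm_backend templates.
resp = requests.post(
    "http://localhost:8000/v2/models/ensemble/generate",
    json={"text_input": prompt, "max_tokens": 128},
)
print(resp.json()["text_output"])
```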