NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

Provide an interface similar to OpenAI API #334

Open Pevernow opened 10 months ago

Pevernow commented 10 months ago

Could you please provide a simple interface similar to OpenAI API?

juney-nvidia commented 10 months ago

@Pevernow Can you elaborate more about your request? Thanks June

Pevernow commented 10 months ago

The most basic thing would be a simple chat/completions endpoint similar to OpenAI's.

The purpose is to make it easy for existing applications that already use the ChatGPT API to connect.

Currently, many open-source LLM projects have implemented this feature, such as the well-known oobabooga/text-generation-webui.
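
For context, the kind of shim being asked for looks roughly like the sketch below: an HTTP server that exposes /v1/chat/completions and forwards the request to whatever runtime actually does the generation. This is only an illustration using FastAPI; generate() is a hypothetical placeholder for a call into the TensorRT-LLM runtime, not an actual TRT-LLM API.

```python
# Illustrative sketch only: not TensorRT-LLM code.
# generate() is a hypothetical stand-in for a call into a TensorRT engine.
import time
import uuid

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


def generate(prompt: str) -> str:
    # Placeholder: call into the TensorRT-LLM runtime here.
    return "Hello from a TensorRT-LLM engine."


class Message(BaseModel):
    role: str
    content: str


class ChatCompletionRequest(BaseModel):
    model: str
    messages: list[Message]
    max_tokens: int | None = None
    temperature: float | None = None


@app.post("/v1/chat/completions")
def chat_completions(req: ChatCompletionRequest):
    # Flatten the chat history into a single prompt for the backend.
    prompt = "\n".join(f"{m.role}: {m.content}" for m in req.messages)
    text = generate(prompt)
    # Answer with the response shape OpenAI clients expect.
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": req.model,
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": text},
                "finish_reason": "stop",
            }
        ],
        "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
    }
```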

merrymercy commented 10 months ago

Users want something like this https://github.com/lm-sys/FastChat/blob/main/docs/openai_api.md, so they can switch their apps from OpenAI models to TRT-LLM models easily without code changes.
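
The "no code changes" point is that an app written against the official openai client only needs its base URL pointed at the local server. A minimal client-side sketch, assuming an OpenAI-compatible server is listening at http://localhost:8000/v1 (the URL and model name below are placeholders):

```python
from openai import OpenAI

# Exactly the same application code as with OpenAI; only the endpoint changes.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="my-trt-llm-model",  # placeholder model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```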

Pevernow commented 10 months ago

@juney-nvidia Please take a look here, thank you

gesanqiu commented 9 months ago

Right now the Python API still has a lot of issues to be fixed. I wrapped an OpenAI-compatible API around it, but ran into #283, so you still need to use the C++ runtime, which means you need Triton. I spent weeks on TRT-LLM; it is difficult to develop against the Python runtime.

@juney-nvidia What is the position of TRT-LLM's Python runtime? I mean, Python is easier than C++, and the batch manager is not open source right now. Most developers may not use Triton since they won't have that much commercial-scale demand.

chrjxj commented 9 months ago

mark

juney-nvidia commented 9 months ago

Sorry for replying late; I was tied up with other things.

Users want something like this https://github.com/lm-sys/FastChat/blob/main/docs/openai_api.md, so they can switch their apps from OpenAI models to TRT-LLM models easily without code changes.

@Pevernow @merrymercy Well received; we will discuss this with the product team. @ncomly-nvidia for visibility.

What is the position of TRT-LLM's Python runtime? I mean, Python is easier than C++, and the batch manager is not open source right now. Most developers may not use Triton since they won't have that much commercial-scale demand.

@gesanqiu We have already released the Python bindings of the C++ runtime, including the batch manager. Does this fulfill your requirement here?

binarycrayon commented 6 months ago

vote up, monitoring

binarycrayon commented 6 months ago

More info about the OpenAI chat completions API spec is here: https://github.com/openai/openai-openapi/tree/master

whk6688 commented 6 months ago

I need it too.

LMarino1 commented 3 months ago

+1 for OpenAI API support

Mary-Sam commented 3 months ago

+1 for OpenAI API support

mynameiskeen commented 2 months ago

+1 for OpenAI API support, it's been 9 months since the feature was requested :(

palindsay commented 1 month ago

+1 for OpenAI API support

nstl-zyb commented 1 month ago

+1

wertyac commented 1 month ago

+1 for openai api support.

Alireza3242 commented 4 days ago

+1 for openai api support

mohankrishna225 commented 2 days ago

+1 for openai api support