NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

Provide an interface similar to OpenAI API #334

Open Pevernow opened 10 months ago

Pevernow commented 10 months ago

Could you please provide a simple interface similar to OpenAI API?

juney-nvidia commented 10 months ago

@Pevernow Can you elaborate more about your request? Thanks June

Pevernow commented 10 months ago

The most basic thing would be a simple chat/completions endpoint similar to OpenAI's.

The purpose is to make it easy for existing applications that already use the ChatGPT API to connect.

Currently, many open-source LLM projects have implemented this feature, such as the well-known oobabooga/text-generation-webui.
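
For context, the kind of shim being asked for looks roughly like the sketch below: an HTTP server that exposes /v1/chat/completions and forwards the request to whatever runtime actually does the generation. This is only an illustration using FastAPI; generate() is a hypothetical placeholder for a call into the TensorRT-LLM runtime, not an actual TRT-LLM API.

```python
# Illustrative sketch only: not TensorRT-LLM code.
# generate() is a hypothetical stand-in for a call into a TensorRT engine.
import time
import uuid

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


def generate(prompt: str) -> str:
    # Placeholder: call into the TensorRT-LLM runtime here.
    return "Hello from a TensorRT-LLM engine."


class Message(BaseModel):
    role: str
    content: str


class ChatCompletionRequest(BaseModel):
    model: str
    messages: list[Message]
    max_tokens: int | None = None
    temperature: float | None = None


@app.post("/v1/chat/completions")
def chat_completions(req: ChatCompletionRequest):
    # Flatten the chat history into a single prompt for the backend.
    prompt = "\n".join(f"{m.role}: {m.content}" for m in req.messages)
    text = generate(prompt)
    # Answer with the response shape OpenAI clients expect.
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": req.model,
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": text},
                "finish_reason": "stop",
            }
        ],
        "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
    }
```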

merrymercy commented 10 months ago

Users want something like this https://github.com/lm-sys/FastChat/blob/main/docs/openai_api.md, so they can switch their apps from OpenAI models to TRT-LLM models easily without code changes.
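
The "no code changes" point is that an app written against the official openai client only needs its base URL pointed at the local server. A minimal client-side sketch, assuming an OpenAI-compatible server is listening at http://localhost:8000/v1 (the URL and model name below are placeholders):

```python
from openai import OpenAI

# Exactly the same application code as with OpenAI; only the endpoint changes.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="my-trt-llm-model",  # placeholder model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```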

Pevernow commented 10 months ago

@juney-nvidia Please take a look here, thank you

gesanqiu commented 9 months ago

Right now the Python API still has a lot of issues to be fixed. I wrapped an OpenAI-compatible API around it, but ran into #283, so you still need to use the C++ runtime, which means you need Triton. I spent weeks on TRT-LLM; it is difficult to develop against the Python runtime.

@juney-nvidia What is the position of TRT-LLM's Python runtime? I mean, Python is easier than C++, and the batch manager is not open source right now. Most developers may not use Triton since they won't have that much commercial-scale demand.

chrjxj commented 9 months ago

mark

juney-nvidia commented 9 months ago

Sorry for replying late; I was tied up with other things.

Users want something like this https://github.com/lm-sys/FastChat/blob/main/docs/openai_api.md, so they can switch their apps from OpenAI models to TRT-LLM models easily without code changes.

@Pevernow @merrymercy Well received; we will discuss this with the product team. @ncomly-nvidia for visibility.

What is the position of TRT-LLM's Python runtime? I mean, Python is easier than C++, and the batch manager is not open source right now. Most developers may not use Triton since they won't have that much commercial-scale demand.

@gesanqiu We have already released the Python bindings of the C++ runtime, including the batch manager. Does this fulfill your requirement here?

binarycrayon commented 6 months ago

vote up, monitoring

binarycrayon commented 6 months ago

More info about the OpenAI chat completions API spec is here: https://github.com/openai/openai-openapi/tree/master

whk6688 commented 6 months ago

I need it too.

LMarino1 commented 3 months ago

+1 for OpenAI API support

Mary-Sam commented 3 months ago

+1 for OpenAI API support

mynameiskeen commented 2 months ago

+1 for OpenAI API support, it's been 9 months since the feature was requested :(

palindsay commented 1 month ago

+1 for OpenAI API support

nstl-zyb commented 1 month ago

+1

wertyac commented 1 month ago

+1 for openai api support.

Alireza3242 commented 4 days ago

+1 for openai api support

mohankrishna225 commented 2 days ago

+1 for openai api support