New Feature: chat/completions style endpoint with structured generation support.
Background
When serving Outlines with vLLM over HTTP, only the /generate endpoint is currently available. However, there's a need for a chat/completions equivalent that supports structured generation and streaming.
Proposed Solution
Implement an OpenAI-compatible chat/completions endpoint with special handling for the metadata object, specifically a key called structure. This approach would allow:
Structuring inputs like a conversation with alternating user messages and assistant responses.
Having the next response use structured generation.
Streaming the output, so users receive tokens incrementally rather than getting the full completion at once and having to reconstruct the chat history manually.
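To make the proposal concrete, here is a sketch of what a request to such an endpoint could look like. All field values are illustrative, and placing a JSON Schema under metadata.structure is exactly the extension being proposed here, not an existing API:

```python
import json

# Hypothetical request body for the proposed chat/completions-style endpoint.
# The metadata["structure"] key is the extension proposed in this issue.
payload = {
    "model": "mistralai/Mistral-7B-Instruct-v0.2",
    "messages": [
        {"role": "user", "content": "Describe a fantasy character."},
        {"role": "assistant", "content": "Sure, any preferences?"},
        {"role": "user", "content": "A wizard, as JSON please."},
    ],
    "metadata": {
        # OpenAI metadata values are strings, so the schema is JSON-encoded.
        "structure": json.dumps({
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "age": {"type": "integer"},
            },
            "required": ["name", "age"],
        })
    },
    "stream": True,
}

print(json.dumps(payload, indent=2))
```

The conversation is expressed as alternating user/assistant messages, and the next assistant turn would be constrained by the schema carried in metadata.structure.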
Implementation Details
Utilize the OpenAI API's metadata object functionality.
Add special handling for a structure key within the metadata object.
Implement streaming support for the structured output.
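The server-side handling described above could be sketched roughly as follows. This is a minimal illustration of the proposed logic, not existing Outlines or vLLM code; the function names and chunking are assumptions made for the example:

```python
import json
from typing import Iterator, Optional


def extract_structure(request_body: dict) -> Optional[dict]:
    """Pull the proposed `structure` key out of the OpenAI-style metadata
    object. Returns a parsed JSON Schema to constrain generation, or None
    to fall back to free-form generation. Illustrative only."""
    metadata = request_body.get("metadata") or {}
    raw = metadata.get("structure")
    if raw is None:
        return None
    # Metadata values are strings in the OpenAI API, so the schema arrives
    # JSON-encoded and must be decoded server-side.
    return json.loads(raw) if isinstance(raw, str) else raw


def stream_chunks(completion: str, chunk_size: int = 8) -> Iterator[str]:
    """Toy stand-in for token streaming: yield the completion in pieces,
    the way a chat/completions SSE stream delivers deltas."""
    for i in range(0, len(completion), chunk_size):
        yield completion[i : i + chunk_size]
```

In a real implementation, extract_structure would feed the schema into Outlines' structured generation, and each yielded piece would be wrapped in an SSE chunk in the chat/completions delta format.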
Benefits
Improved compatibility with chat-based applications.
Enhanced user experience through streaming responses.
Easier integration for developers familiar with OpenAI's chat/completions API.
Resources
Next Steps
Related Discussions
https://discord.com/channels/1182316225284554793/1182592312669372427/1260988449238814802
Please feel free to provide any feedback or suggestions to improve this proposal.