dottxt-ai / outlines

Structured Text Generation
https://dottxt-ai.github.io/outlines/
Apache License 2.0
9.53k stars 485 forks source link

chat/completions endpoint with structured generation support #1041

Open Imagineer99 opened 4 months ago

Imagineer99 commented 4 months ago

New Feature: chat/completions style endpoint with structured generation support.

Background

When serving outlines with vLLM to interact with an HTTP library, currently only the /generate endpoint is available. However, there's a need for a chat/completions equivalent that supports structured generation and streaming.

Proposed Solution

Implement OpenAI compatible endpoint functionality with special handling for the metadata object, specifically using a key called structure. This approach would allow:

  1. Structuring inputs like a conversation with alternating user messages and assistant responses.
  2. Having the next response use structured generation.
  3. Streaming the output, so users don't receive the full completion at once and have to construct the chat history manually.

Implementation Details

Benefits

Resources

Next Steps

Related Discussions

https://discord.com/channels/1182316225284554793/1182592312669372427/1260988449238814802


Please feel free to provide any feedback or suggestions to improve this proposal.

lapp0 commented 2 months ago

Is this resolved by https://github.com/vllm-project/vllm/pull/7654

Imagineer99 commented 2 months ago

Is this resolved by vllm-project/vllm#7654

Looks like it does!