New Feature: chat/completions style endpoint with structured generation support.
Background
When serving Outlines with vLLM over HTTP, only the /generate endpoint is currently available. However, there's a need for a chat/completions equivalent that supports structured generation and streaming.
Proposed Solution
Implement an OpenAI-compatible chat/completions endpoint with special handling for the metadata object, specifically a key called structure. This approach would allow:
Structuring inputs like a conversation with alternating user messages and assistant responses.
Having the next response use structured generation.
Streaming the output, so users receive tokens incrementally rather than getting the full completion at once and having to reconstruct the chat history manually.
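To make the proposal concrete, here is a sketch of what a request to such an endpoint could look like. All field values are illustrative, and placing a JSON Schema under metadata.structure is exactly the extension being proposed here, not an existing API:

```python
import json

# Hypothetical request body for the proposed chat/completions-style endpoint.
# The metadata["structure"] key is the extension proposed in this issue.
payload = {
    "model": "mistralai/Mistral-7B-Instruct-v0.2",
    "messages": [
        {"role": "user", "content": "Describe a fantasy character."},
        {"role": "assistant", "content": "Sure, any preferences?"},
        {"role": "user", "content": "A wizard, as JSON please."},
    ],
    "metadata": {
        # OpenAI metadata values are strings, so the schema is JSON-encoded.
        "structure": json.dumps({
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "age": {"type": "integer"},
            },
            "required": ["name", "age"],
        })
    },
    "stream": True,
}

print(json.dumps(payload, indent=2))
```

The conversation is expressed as alternating user/assistant messages, and the next assistant turn would be constrained by the schema carried in metadata.structure.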
Implementation Details
Utilize the OpenAI API's metadata object functionality.
Add special handling for a structure key within the metadata object.
Implement streaming support for the structured output.
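The server-side handling described above could be sketched roughly as follows. This is a minimal illustration of the proposed logic, not existing Outlines or vLLM code; the function names and chunking are assumptions made for the example:

```python
import json
from typing import Iterator, Optional


def extract_structure(request_body: dict) -> Optional[dict]:
    """Pull the proposed `structure` key out of the OpenAI-style metadata
    object. Returns a parsed JSON Schema to constrain generation, or None
    to fall back to free-form generation. Illustrative only."""
    metadata = request_body.get("metadata") or {}
    raw = metadata.get("structure")
    if raw is None:
        return None
    # Metadata values are strings in the OpenAI API, so the schema arrives
    # JSON-encoded and must be decoded server-side.
    return json.loads(raw) if isinstance(raw, str) else raw


def stream_chunks(completion: str, chunk_size: int = 8) -> Iterator[str]:
    """Toy stand-in for token streaming: yield the completion in pieces,
    the way a chat/completions SSE stream delivers deltas."""
    for i in range(0, len(completion), chunk_size):
        yield completion[i : i + chunk_size]
```

In a real implementation, extract_structure would feed the schema into Outlines' structured generation, and each yielded piece would be wrapped in an SSE chunk in the chat/completions delta format.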
Benefits
Improved compatibility with chat-based applications.
Enhanced user experience through streaming responses.
Easier integration for developers familiar with OpenAI's chat/completions API.
Resources
Next Steps
Related Discussions
https://discord.com/channels/1182316225284554793/1182592312669372427/1260988449238814802
Please feel free to provide any feedback or suggestions to improve this proposal.