A Solution Accelerator for the RAG pattern running in Azure, using Azure AI Search for retrieval and Azure OpenAI large language models to power ChatGPT-style and Q&A experiences. It covers the most common requirements and best practices.
Motivation

At present, there is an env var AZURE_OPENAI_STREAM which controls streaming when using the byod endpoint, but it is not used at all when using the custom endpoint.

Investigate the feasibility of implementing streamed responses on the custom endpoint. This is probably non-trivial because:

- We would not want to stream the initial LLM response if it is a function call, but would instead stream the response of the function call
- A number of post-response steps occur after the response is generated, such as calling content safety and storing the result in the index
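One way to handle both constraints is to wrap the model's delta stream in a generator that inspects the first delta before forwarding anything to the client, and runs the post-response work once the stream is exhausted. A minimal sketch, where the delta shape, the `run_function_call` hook, and the post-response steps are all hypothetical stand-ins rather than the accelerator's actual API:

```python
from itertools import chain
from typing import Callable, Iterable, Iterator

def stream_with_post_steps(
    deltas: Iterable[dict],
    run_function_call: Callable[[dict], Iterable[str]],
    post_response_steps: Iterable[Callable[[str], None]],
) -> Iterator[str]:
    """Yield response chunks to the client, deferring post-response work.

    If the first delta is a function call, the raw call is not streamed;
    the (possibly streamed) function result is forwarded instead.
    """
    it = iter(deltas)
    first = next(it, None)

    if first is None:
        source: Iterable[str] = []
    elif first.get("function_call"):
        # Constraint 1: don't stream the function-call delta itself;
        # stream the output of executing the call instead.
        source = run_function_call(first["function_call"])
    else:
        source = (d.get("content", "") for d in chain([first], it))

    chunks = []
    for chunk in source:
        chunks.append(chunk)
        yield chunk

    # Constraint 2: post-response steps (content safety, storing the
    # result in the index, ...) run only once the full response exists.
    full_response = "".join(chunks)
    for step in post_response_steps:
        step(full_response)
```

One caveat with this shape: the post-response steps only run if the client consumes the generator to completion, so a mid-stream disconnect would skip them unless the web framework guarantees generator cleanup (e.g. via try/finally).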
Requirements
A list of requirements to consider this feature delivered:

- Streamed responses are supported on the /custom endpoint

Tasks
To be filled in by the engineer picking up the issue