pgayvallet opened 2 weeks ago
Pinging @elastic/appex-ai-infra (Team:AI Infra)
I want to make sure I understand with an example.
When the caller has opted into streaming, would this allow the caller to change the shape of our `chatComplete` return type from `ChatCompleteStreamResponse` into something that is instead compatible with the OpenAI format?
If so, is there a benefit to maintaining two output formats (our own + the OpenAI format), or would we want to "just" standardize our output streams to always be in the OpenAI format?
It would just be exposing a utility to convert streams from our format to OpenAI's (e.g. `inferenceStreamToOpenAI`). Changing our output format is out of the question here.
Tasks may want to have their output exposed as a stream, and in some cases exposed as public APIs.
In those scenarios (and probably others), we might want to expose our output in an OpenAI-compatible format, as this is what most integrations are using.
For that reason, we should have a way to convert our inference stream events to OpenAI-compatible ones, similar to what was done for the o11y assistant in https://github.com/elastic/kibana/blob/d4d34da1cef118872f651a626d667ba98307fdac/x-pack/plugins/observability_solution/observability_ai_assistant/server/service/util/observable_into_openai_stream.ts#L30-L46
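As a rough sketch of what such a stateless conversion could look like (the event and chunk shapes below are illustrative assumptions, not the actual types from the inference plugin or the OpenAI SDK):

```typescript
// Assumed minimal shape of one of our inference stream events
// (hypothetical — the real types live in the inference packages).
interface ChatCompletionChunkEvent {
  type: 'chatCompletionChunk';
  content: string;
}

// Assumed minimal shape of an OpenAI streaming chunk.
interface OpenAIChatCompletionChunk {
  object: 'chat.completion.chunk';
  choices: Array<{
    index: number;
    delta: { content?: string };
    finish_reason: string | null;
  }>;
}

// Stateless mapping from one inference chunk event to an
// OpenAI-compatible chunk; a stream version would just map
// each emitted event through this function.
function inferenceEventToOpenAIChunk(
  event: ChatCompletionChunkEvent
): OpenAIChatCompletionChunk {
  return {
    object: 'chat.completion.chunk',
    choices: [
      { index: 0, delta: { content: event.content }, finish_reason: null },
    ],
  };
}
```

The actual utility would also need to handle the other event types (token counts, tool calls) and emit a final chunk with a `finish_reason`, but the per-event mapping itself can stay stateless like this.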
That kind of stateless function should probably be exposed from a package (but probably not from `inference-common`).