elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

[inference] Add non-stream versions of the `chatComplete` and `output` APIs. #198644

Open pgayvallet opened 2 hours ago

pgayvallet commented 2 hours ago

At the moment, the `chatComplete` and `output` APIs always return an observable, to support LLM response streaming.

While that's definitely useful in some scenarios (especially assistant-related calls), in most "task execution" scenarios we only really need the final, full response from the LLM, and the observable-based API can be bothersome, as every call needs to be wrapped in the appropriate observable chaining to retrieve the data of the last event.
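To illustrate the boilerplate, here is a hedged sketch of the current observable-based flow. The real Kibana code uses rxjs; a tiny stand-in observable and a `lastValueFrom`-style helper keep the example self-contained, and the event shape (`type`/`content` fields) is illustrative, not the actual inference plugin types.

```typescript
interface Observer<T> {
  next: (value: T) => void;
  complete: () => void;
}

// Minimal stand-in for an rxjs Observable: runs the producer on subscribe.
class MiniObservable<T> {
  constructor(private readonly producer: (obs: Observer<T>) => void) {}
  subscribe(obs: Observer<T>): void {
    this.producer(obs);
  }
}

// The wrapping every "task execution" caller currently has to write:
// drain the stream and resolve with only the last emitted event.
function lastValueFrom<T>(source: MiniObservable<T>): Promise<T> {
  return new Promise((resolve, reject) => {
    let last: T | undefined;
    source.subscribe({
      next: (value) => (last = value),
      complete: () =>
        last !== undefined ? resolve(last) : reject(new Error('empty stream')),
    });
  });
}

interface ChatEvent {
  type: 'chunk' | 'complete';
  content: string;
}

// Pretend chatComplete call streaming two chunks then the final event.
const events$ = new MiniObservable<ChatEvent>((obs) => {
  obs.next({ type: 'chunk', content: 'par' });
  obs.next({ type: 'chunk', content: 'tial' });
  obs.next({ type: 'complete', content: 'partial' });
  obs.complete();
});

lastValueFrom(events$).then((finalEvent) => {
  console.log(finalEvent.content); // only the last event's data is wanted
});
```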

We should have a way to call those APIs in "non-stream" mode, so that they return a promise of the complete response instead of an observable. One possible option would be to add a `stream` parameter that would switch the shape of the response.
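A minimal sketch of what that `stream` parameter could look like, using TypeScript overloads so the return type follows the flag. All names and types here are illustrative assumptions, not the actual inference plugin signatures, and an `AsyncIterable` stands in for the rxjs observable in the streaming branch.

```typescript
interface ChatCompleteEvent {
  type: 'chunk' | 'complete';
  content: string;
}

interface ChatCompleteResponse {
  content: string;
}

// Overloads: `stream: true` keeps today's event-stream behaviour, while the
// default (non-stream) mode resolves with the complete response directly.
function chatComplete(options: {
  prompt: string;
  stream: true;
}): AsyncIterable<ChatCompleteEvent>;
function chatComplete(options: {
  prompt: string;
  stream?: false;
}): Promise<ChatCompleteResponse>;
function chatComplete(options: {
  prompt: string;
  stream?: boolean;
}): AsyncIterable<ChatCompleteEvent> | Promise<ChatCompleteResponse> {
  if (options.stream) {
    // Fake streaming: emit two chunks, then the completion event.
    return (async function* () {
      yield { type: 'chunk', content: 'Hello, ' } as ChatCompleteEvent;
      yield { type: 'chunk', content: 'world' } as ChatCompleteEvent;
      yield { type: 'complete', content: 'Hello, world' } as ChatCompleteEvent;
    })();
  }
  // Non-stream mode: no observable chaining needed on the caller side.
  return Promise.resolve({ content: 'Hello, world' });
}

// Task-execution style usage: a plain await on the full response.
async function run(): Promise<void> {
  const { content } = await chatComplete({ prompt: 'Say hi' });
  console.log(content);
}
run();
```

The overload approach keeps a single entry point while letting the type checker enforce that callers who pass `stream: true` handle events and everyone else gets a promise.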

elasticmachine commented 2 hours ago

Pinging @elastic/appex-ai-infra (Team:AI Infra)