At the moment, the `chatComplete` and `output` APIs always return an observable in order to support LLM response streaming.
While that's definitely useful in some scenarios (especially assistant-related calls), in most "task execution" scenarios we only really need the final, full response from the LLM, and the observable-based API can be bothersome, as every call needs to be wrapped in the appropriate observable chaining to retrieve the data of the last event.
We should have a way to call those APIs in "non-stream" mode, so that they return a promise of the complete response instead of an observable. One possible option would be to add a `stream` parameter that switches the shape of the response.
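For illustration, here is a minimal sketch of what an overload-based `stream` parameter could look like. All type and function names below are hypothetical stand-ins, not the plugin's actual API:

```ts
import { Observable, of, lastValueFrom } from 'rxjs';

// Hypothetical shapes for illustration; the real plugin types differ.
interface ChatCompleteEvent {
  content: string; // accumulated content so far
}
type ChatCompleteResponse = ChatCompleteEvent;

interface ChatCompleteOptions {
  connectorId: string;
  messages: Array<{ role: string; content: string }>;
  stream?: boolean;
}

// Stand-in for the existing observable-based implementation.
function streamFromLlm(options: ChatCompleteOptions): Observable<ChatCompleteEvent> {
  return of({ content: 'partial' }, { content: 'full response' });
}

// Overloads switch the return shape on the `stream` flag:
// `stream: true` keeps the current observable shape, while omitting it
// (or passing `stream: false`) returns a promise of the complete response.
function chatComplete(options: ChatCompleteOptions & { stream: true }): Observable<ChatCompleteEvent>;
function chatComplete(options: ChatCompleteOptions & { stream?: false }): Promise<ChatCompleteResponse>;
function chatComplete(
  options: ChatCompleteOptions
): Observable<ChatCompleteEvent> | Promise<ChatCompleteResponse> {
  const events$ = streamFromLlm(options);
  // Non-stream mode: resolve once with the last (complete) event,
  // so callers don't have to do the observable chaining themselves.
  return options.stream ? events$ : lastValueFrom(events$);
}

// Usage in a "task execution" scenario, with no observable chaining needed:
// const { content } = await chatComplete({ connectorId: 'my-connector', messages });
```

With overloads like these, the consumer-facing change is a single parameter, and TypeScript narrows the return type accordingly at each call site.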