Closed — Anonyfox closed this issue 9 months ago
Unfortunately it's not possible. Part of the issue is that we don't know when, or even if, the LLM will execute a function, and if streaming is enabled, the function-call request is streamed as well.
The LLM streams back an assistant message saying it wants to execute a function. We can detect very early that it's a function call, though, so you can skip rendering those deltas and wait until the callback returns the fully finished message.
What I want to achieve: function calls should be handled as a single operation (not delta by delta), just like the current implementation, while content messages should remain streamable and receive deltas.
That way I'd get the current convenience combined with streaming elegance for the user at the same time. Is this possible somehow without lots of code or a manual loop?