Closed (brainlid closed this 7 months ago)
The model needs to be compiled with the flag, because `stream: true` may compile a slightly different model. In any case, if you don't want to stream, you could consume the stream immediately. You could handle that yourself after you call the serving, no?
Thanks @josevalim. Didn't realize the flag may cause the model to compile differently. That makes sense though. And yes, I can consume the full stream when that's the desired behavior. Thanks!
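Consuming the stream right after calling the serving, as suggested above, could look roughly like the sketch below. `StreamHelper` and `collect/1` are hypothetical names, not part of Bumblebee or Nx, and a plain Elixir stream stands in for the serving result so the snippet runs without a model. It assumes the stream emits plain text chunks.

```elixir
defmodule StreamHelper do
  @moduledoc """
  Hypothetical helper (not part of Bumblebee or Nx): eagerly consumes a
  stream of text chunks and returns the full text as one binary.
  """

  # Accepts any enumerable of binaries, for example the result of
  # `Nx.Serving.run(serving, prompt)` when the serving was created with
  # `stream: true` (assumed here to emit plain text chunks).
  def collect(chunks), do: Enum.join(chunks)
end

# Stand-in for a streaming serving result: a lazy stream of text chunks.
fake_stream = Stream.map(["Hello", ", ", "world"], & &1)

# Blocks until the stream is exhausted, then returns the whole text.
full_text = StreamHelper.collect(fake_stream)
IO.puts(full_text)
```

With a real streaming serving, the same call would presumably be `serving |> Nx.Serving.run(prompt) |> StreamHelper.collect()`, which gives non-UI callers a single final string while the serving itself stays compiled with `stream: true`.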
Currently, the `stream: true` or `stream: false` option is set when the serving is created. Example:
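A minimal sketch of setting the option at creation time, assuming a text generation serving built with `Bumblebee.Text.generation/4`. The `"gpt2"` checkpoint, the prompt, and the unpinned dependencies are placeholders chosen for illustration, not details from this issue.

```elixir
# Assumed setup: unpinned dependencies, adjust to the project's versions.
Mix.install([:bumblebee, :exla])
Nx.global_default_backend(EXLA.Backend)

# "gpt2" is a placeholder checkpoint, not taken from the issue.
repo = {:hf, "gpt2"}
{:ok, model_info} = Bumblebee.load_model(repo)
{:ok, tokenizer} = Bumblebee.load_tokenizer(repo)
{:ok, generation_config} = Bumblebee.load_generation_config(repo)

# The :stream flag is fixed here, when the serving is built,
# so every later call to this serving streams.
serving =
  Bumblebee.Text.generation(model_info, tokenizer, generation_config,
    stream: true
  )

# With stream: true the call returns a stream of text chunks.
serving
|> Nx.Serving.run("Elixir is")
|> Enum.each(&IO.write/1)
```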
If possible, it would be nice to set this option on a per-call basis. Some calls are displayed to the user, and streaming is preferred for those.
Other calls run behind the scenes with no UI. Examples include data extraction, summarizing some text, or classifying text as belonging to one of several categories. For these cases, we don't want streaming.
Streaming sends data between processes, possibly even across nodes. We can cut unnecessary chatter by not streaming and instead waiting for the final result.