This PR is heavily inspired by #61 and reuses some of its code. However, it still uses streaming to keep the low-latency start. All credit for figuring out the Diff parts goes to @dipamsen; I would never have figured that one out. The only thing I really did was incorporate it into the streaming system, so we can keep using streams for the prompts instead of incurring the latency of waiting for the entire response.