The previous implementation used gin.Stream to copy messages to the proxy client. This could leave outgoing messages buffered on the server side and add latency: the proxy client would then receive a batch of messages at once, giving a worse experience than the official OpenAI chatbot. After this commit, streaming to proxy clients is smoother.
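For illustration, a minimal sketch of the approach: read the upstream response line by line and flush each chunk to the proxy client as soon as it arrives, instead of letting it accumulate in a server-side buffer. Handler and variable names below are hypothetical, not the actual code in this commit.

```go
// proxyStream forwards an upstream SSE response to the proxy client,
// flushing after every line so chunks are delivered immediately.
func proxyStream(c *gin.Context, upstream *http.Response) {
	defer upstream.Body.Close()

	c.Header("Content-Type", "text/event-stream")
	c.Header("Cache-Control", "no-cache")

	scanner := bufio.NewScanner(upstream.Body)
	for scanner.Scan() {
		// Write each upstream line to the client as soon as it is read...
		c.Writer.Write(scanner.Bytes())
		c.Writer.Write([]byte("\n"))
		// ...and flush right away so nothing is held back by buffering.
		c.Writer.Flush()
	}
}
```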