ParisNeo / ollama_proxy_server

A proxy server for multiple ollama instances with Key security
Apache License 2.0

[Bug] Streaming doesn't work #6

Open short-circuit opened 6 months ago

short-circuit commented 6 months ago

Using this proxy with continuedev, responses are not streamed. The server answers correctly, but the response is only output once generation is completely finished.

ParisNeo commented 6 months ago

There must be a problem in your setup. I use it with lollms as the client and it does work in streaming mode. I use the /api/generate endpoint.

Maybe you are using the /api/chat endpoint. I did not test that one.

balamuthu1 commented 6 months ago

Yes, I confirm. With the /api/chat endpoint the stream doesn't seem to work, while without the proxy streaming works fine. Could you please help us with this?
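
For anyone who wants to reproduce this, here is a minimal sketch of a streaming check against the proxy's /api/chat endpoint. It assumes the proxy listens on localhost:8000 and expects a bearer key (host, port, key, and model name are placeholders; adjust them to your setup). If all chunks are printed in one burst at the end, streaming is being buffered somewhere along the way:

    import json
    import time

    import requests

    # Placeholder proxy address and key; adjust to your deployment.
    PROXY_URL = "http://localhost:8000/api/chat"
    HEADERS = {"Authorization": "Bearer my_key"}

    payload = {
        "model": "llama2",  # any model pulled on the backend
        "messages": [{"role": "user", "content": "Count slowly from 1 to 10."}],
        "stream": True,
    }

    start = time.time()
    with requests.post(PROXY_URL, json=payload, headers=HEADERS, stream=True) as r:
        r.raise_for_status()
        # Ollama streams one JSON object per line; print when each chunk arrives.
        for line in r.iter_lines():
            if line:
                chunk = json.loads(line)
                print(f"{time.time() - start:5.2f}s",
                      chunk.get("message", {}).get("content", ""))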

balamuthu1 commented 6 months ago

I made some adjustments to _send_response like below and the streamed response works:

    def _send_response(self, response):
        self.send_response(response.status_code)
        self.send_header('Content-type', response.headers['content-type'])
        self.send_header('Stream', True)
        self.end_headers()

        for line in response.iter_lines():
            if line:
                chunk = line + b'\r\n'
                self.wfile.write(chunk)
                self.wfile.flush()
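
One thing worth noting (a sketch under an assumption about the proxy's internals, not necessarily how the repository does it): this change only helps if the request the proxy makes to the Ollama backend is itself opened in streaming mode. With the requests library, iter_lines() only yields lines as they arrive when the request was made with stream=True; otherwise the whole body is downloaded first and the loop above just replays an already-buffered response.

    import requests

    # Illustrative forwarding helper; the actual function in the proxy may differ.
    def forward_request(server_url, path, data, headers):
        # stream=True is essential: without it, requests buffers the entire
        # response body before iter_lines() ever yields a single line.
        return requests.post(server_url + path, data=data, headers=headers, stream=True)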

ParisNeo commented 6 months ago

hi, if you want you can do a pull request.

ParisNeo commented 5 months ago

> I made some adjustments to _send_response like below and the streamed response works:
>
>     def _send_response(self, response):
>         self.send_response(response.status_code)
>         self.send_header('Content-type', response.headers['content-type'])
>         self.send_header('Stream', True)
>         self.end_headers()
>
>         for line in response.iter_lines():
>             if line:
>                 chunk = line + b'\r\n'
>                 self.wfile.write(chunk)
>                 self.wfile.flush()

Hi, would you like to contribute your code? You can just fork the repo, apply your fix, then open a pull request and I'll accept it. That would add you as a contributor.

balamuthu1 commented 5 months ago

Oh yes, sorry, I was completely taken up by other tasks these days. I'll try to do it properly today :)

balamuthu1 commented 5 months ago

done

petritavd commented 5 months ago

I created a PR for this as well, as I made these fixes some days ago. Thank you for your work. Check it here: https://github.com/ParisNeo/ollama_proxy_server/pull/9

ParisNeo commented 5 months ago

Hi, sorry, I had to accept only one of the two, so I accepted the one from balamuthu1. They are basically the same thing, with only minor differences.

Thanks to both of you. If you have enhancements or suggestions, you are welcome to contribute.

I had no time to add HTTPS integration, but that could be a cool thing to add.
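
On the HTTPS point, here is a minimal sketch of TLS termination with the standard library (the handler below is a placeholder for the proxy's real request handler, and cert.pem/key.pem stand for your own certificate and key):

    import ssl
    from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

    # Placeholder handler; the proxy would plug in its existing handler class here.
    class RequestHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok\n")

    httpd = ThreadingHTTPServer(("0.0.0.0", 8443), RequestHandler)

    # Wrap the listening socket with TLS before serving.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain(certfile="cert.pem", keyfile="key.pem")
    httpd.socket = ctx.wrap_socket(httpd.socket, server_side=True)

    httpd.serve_forever()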

petritavd commented 5 months ago

There are issues with the merged PR: it now makes streaming requests to Ollama every time (which affects response speed) and always responds with a stream. I've updated my PR and fixed the conflicts. It now streams or not based on the stream parameter of the request. Please check it again.
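
For reference, the behaviour described here could look roughly like the sketch below: read the stream field from the request body and only iterate line by line when the client actually asked for a stream. The signature and helper names are illustrative, not the PR's actual code:

    import json

    def _send_response(self, response, post_data):
        # Ollama defaults to streaming when the field is absent.
        try:
            wants_stream = json.loads(post_data or b"{}").get("stream", True)
        except (ValueError, TypeError, AttributeError):
            wants_stream = True

        self.send_response(response.status_code)
        self.send_header('Content-type', response.headers['content-type'])
        self.end_headers()

        if wants_stream:
            # Relay each upstream line to the client as soon as it arrives.
            for line in response.iter_lines():
                if line:
                    self.wfile.write(line + b'\r\n')
                    self.wfile.flush()
        else:
            # Non-streaming: send the complete body in one go.
            self.wfile.write(response.content)
            self.wfile.flush()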

ParisNeo commented 5 months ago

OK then, thank you very much. I'll take a look and then merge it.

ParisNeo commented 5 months ago

Ok, thanks, I have accepted your PR. I hope this now works for everyone.

Don't hesitate to confirm whether it now works for you or whether there are still issues we may want to fix. Thank you all for your help.