ParisNeo / ollama_proxy_server

A proxy server for multiple ollama instances with Key security
Apache License 2.0

qsize and not stream #4

Open bien-phillip opened 7 months ago

bien-phillip commented 7 months ago

Hi ParisNeo.

First of all, thank you for sharing this awesome project. I just tried it for handling parallel requests with Ollama; I found this repo through an Ollama issue.

I ran two Ollama instances behind ollama_proxy_server and found two problems. First, streaming responses don't work: only the final chunk of data is delivered. Second, qsize is not reported correctly. I attached a screenshot.

[screenshot]

This was just a first try. I'll test it more and give further feedback.

Thanks.
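
For anyone wanting to reproduce the streaming symptom, a minimal sketch is below. The proxy address (localhost:8000), the model name, and the absence of an API key header are all assumptions that depend on your deployment; with a correctly streaming proxy, each NDJSON line should print as the model generates it rather than all at once at the end.

```python
# A minimal sketch for reproducing the streaming problem, assuming the proxy
# listens on localhost:8000 and forwards Ollama's /api/chat endpoint. The
# host, port, and model name are assumptions; add whatever auth header your
# proxy configuration requires.
import json
import requests

PROXY_URL = "http://localhost:8000/api/chat"  # assumed proxy address

payload = {
    "model": "llama2",  # any model served by the backing Ollama instances
    "messages": [{"role": "user", "content": "Write a haiku about queues."}],
    "stream": True,  # ask Ollama to stream the answer chunk by chunk
}

# stream=True tells requests not to buffer the body; each non-empty line is
# one NDJSON chunk and should arrive incrementally if streaming works.
with requests.post(PROXY_URL, json=payload, stream=True, timeout=300) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if line:
            print(json.loads(line), flush=True)
```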

ParisNeo commented 7 months ago

Hi. I have a server with two Ollama instances and I use streaming mode from my clients. My client uses the /generate endpoint and it works. I see you are using the /chat endpoint instead. The only endpoint that goes through the dual queues is /generate, which I think is why it is not working for you. I'll look into this and add the /chat endpoint to the queued management.

Thanks for informing me
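
For readers unfamiliar with the dual-queue idea described above, here is an illustrative sketch (not the project's actual code) of dispatching a queued request to the instance whose queue is currently shortest. The server names, ports, and helper names are invented for the example.

```python
# Illustrative only: one queue per backing Ollama instance, and a queued
# endpoint is routed to the least-loaded instance.
import queue

servers = {
    "ollama1": {"url": "http://localhost:11434", "queue": queue.Queue()},
    "ollama2": {"url": "http://localhost:11435", "queue": queue.Queue()},
}

def pick_least_busy():
    # qsize() is approximate, but that is fine for load balancing.
    return min(servers.values(), key=lambda s: s["queue"].qsize())

def handle_queued_request(forward):
    server = pick_least_busy()
    server["queue"].put(1)             # mark one in-flight request
    try:
        return forward(server["url"])  # proxy the request to that instance
    finally:
        server["queue"].get()          # release the slot when done
        server["queue"].task_done()
```

An endpoint that is not in the queued list would simply be forwarded without touching the queues, which is why /chat initially bypassed the load balancing.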

ParisNeo commented 7 months ago

OK, I just went through their docs and added the /chat endpoint to the list of queued endpoints.
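
As a rough illustration of what such a list might look like (the variable name and helper are assumptions, and the paths follow Ollama's documented API rather than this proxy's source):

```python
# Hypothetical sketch: endpoints that should go through the queue-based
# load balancing instead of being forwarded directly.
QUEUED_ENDPOINTS = [
    "/api/generate",  # previously the only queued endpoint
    "/api/chat",      # now also routed through the queues
]

def is_queued(path: str) -> bool:
    return path in QUEUED_ENDPOINTS
```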

bien-phillip commented 7 months ago

Thank you for the response. I will check the /chat endpoint again. Also, I don't think the streaming issue is caused by this codebase; I'm not sure, but it may be a LangChain issue. I'll test that as well.
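
One way to isolate the LangChain question is to stream from an Ollama instance directly, bypassing both the proxy and LangChain, and compare with the same call routed through the proxy. The sketch below assumes a local Ollama on its default port 11434; the helper name is hypothetical.

```python
# Stream tokens from Ollama's /api/generate; each NDJSON chunk carries a
# "response" field with the next piece of text.
import json
import requests

def stream_generate(base_url: str, prompt: str, model: str = "llama2"):
    payload = {"model": model, "prompt": prompt, "stream": True}
    with requests.post(f"{base_url}/api/generate", json=payload,
                       stream=True, timeout=300) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if line:
                yield json.loads(line)["response"]

# If tokens print incrementally here but arrive all at once via the proxy,
# the proxy is buffering; if both behave the same, look at the client side.
for token in stream_generate("http://localhost:11434", "Say hello"):
    print(token, end="", flush=True)
print()
```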

canytam-krystal commented 7 months ago

Even using /generate, the output is broken into many responses, but all of the responses arrive together. Streaming does not work.

ParisNeo commented 6 months ago

> Even using /generate, the output is broken into many responses, but all of the responses arrive together. Streaming does not work.

What client are you using? I just tested using lollms.