Open · Bram-diederik opened this issue 8 months ago
Would also like this. I've got Ollama using a Tesla M60, but I access it from different endpoints (HA automations, LibreChat GUI, Ollama CLI) using different models, so it would be handy to be able to unload models faster!
@dansharpy the latest Ollama version helps you out here: you can set the OLLAMA_KEEP_ALIVE environment variable.
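For anyone else hunting for it: the variable has to be set on the process that actually runs ollama serve. A rough sketch, assuming a plain shell or Docker setup (adapt to however you run Ollama):

```sh
# Keep loaded models in memory indefinitely (use e.g. "24h" or "5m" for a
# timeout instead, or 0 to unload immediately after each request).
export OLLAMA_KEEP_ALIVE=-1
ollama serve

# Or, when running the official Docker image:
docker run -d -e OLLAMA_KEEP_ALIVE=-1 -p 11434:11434 ollama/ollama
```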
For me, I need a single model set to -1, so it is not perfect for my case.
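If it helps as a workaround: the API also accepts keep_alive per request, so you can pin just the one model while everything else follows the global setting. Untested sketch, with llama3 only as an example model name:

```sh
# An empty generate request loads the model without producing output;
# keep_alive: -1 asks Ollama to keep this one model resident indefinitely.
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "keep_alive": -1
}'
```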
Thanks for this, I was looking for an environment variable I could set but couldn't find it. Is there a list in the docs somewhere I've missed? I set it to 0 in Ollama, which seems to work fine in the LibreChat GUI (i.e. it unloads the model as soon as it has completed a request), but when a request is sent from this integration in HA it keeps it loaded. I can only assume this integration sends a keep_alive parameter in its request, which overrides the environment variable.

Edit: Just been looking at the logs, and it seems the LibreChat GUI sends API calls to the /v1/chat endpoint while this integration sends them to /api/chat. I wonder if that has something to do with it not respecting the environment variable?
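Worth knowing when debugging this: a keep_alive sent in the request body takes precedence over OLLAMA_KEEP_ALIVE, so if the integration sets one, the environment variable is ignored for its calls. Something like this reproduces that from the command line (sketch only; the model name is an example):

```sh
# Even with OLLAMA_KEEP_ALIVE=0 on the server, this request keeps the
# model loaded for 10 minutes, because the per-request value wins.
curl http://localhost:11434/api/chat -d '{
  "model": "llama3",
  "messages": [{"role": "user", "content": "hello"}],
  "keep_alive": "10m"
}'
```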
Checklist
Is your feature request related to a problem? Please describe.
The first time I run a prompt it takes long, 30 or 40 seconds. Every subsequent one runs in about 10 seconds.
I tried some curl commands provided in the Ollama FAQ to preload the model (like the sketch below), but with no luck.
Perhaps it has to do with the prompt or the session or something :/
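For reference, the FAQ's preload trick is a generate request with no prompt, roughly like this (model name is just an example):

```sh
# Loads the model into memory without generating a response.
curl http://localhost:11434/api/generate -d '{"model": "mistral"}'
```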
But could you add the keep_alive parameter as an option? I have a CPU-only system but plenty of RAM.
Describe the solution you'd like
An option to set keep_alive as described in the Ollama FAQ.
Describe alternatives you've considered
The curl commands described in the FAQ.
Additional context
I think it is complete.