JHubi1 / ollama-app

A modern and easy-to-use client for Ollama
Apache License 2.0

Not clearing VRAM after call #16

Closed Tycho-S closed 2 weeks ago

Tycho-S commented 2 weeks ago

Thanks for making this great app!

One thing I noticed: the app seems to ask the Ollama server to unload the model from VRAM immediately after each request. This means every follow-up question comes with a lengthy wait while the model is loaded back into memory. Could that be made configurable? I would set keep_alive in the API request to -1 so the model is only removed from VRAM when space is needed for a different model.

PS: I noticed this especially when testing the llava visual model, which works great by the way!
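
For reference, here is a minimal sketch of the kind of request described above, written in Python rather than the app's own language. It assumes Ollama's default local address and the /api/generate endpoint; the helper name build_generate_request is made up for illustration and is not taken from the app. A keep_alive of -1 asks the server to keep the model loaded, while 0 asks it to unload right after the response.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model, prompt, keep_alive=-1):
    """Build (but do not send) a POST request for /api/generate.

    keep_alive=-1 asks the server to keep the model loaded;
    keep_alive=0 asks it to unload the model right after responding.
    """
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "keep_alive": keep_alive,
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_generate_request("llava", "Describe a sunset.")
    print(req.data.decode("utf-8"))
    # To actually send it (requires a running Ollama server):
    # with urllib.request.urlopen(req) as resp:
    #     print(json.loads(resp.read())["response"])
```

Building the request separately from sending it makes it easy to inspect the JSON body before pointing it at a real server.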

JHubi1 commented 2 weeks ago

I'll look into it; it shouldn't be too complicated.

JHubi1 commented 2 weeks ago

Could you try this build: ollama.zip (unpack the zip and install the APK)? It's experimental, so expect to run into issues. Does that satisfy your needs?

Edit: You can find the option under Settings > Interface. Scroll down and click on "Set duration how long to keep loaded" (text not final, sounds quirky, I know). The two toggles above represent -1 and 0.

JHubi1 commented 2 weeks ago

Solved it in v1.1.0. Feel free to open a new issue if you miss any other features.

Tycho-S commented 2 weeks ago

Thanks for the build! I was at a birthday party last night, so I couldn't try it. I'll try it today and let you know.


In reply to JHubi1's message of Saturday, June 8, 2024 9:31:22 PM:

Could you try this build: ollama.zip <https://github.com/user-attachments/files/15749424/ollama.zip> (unpack the zip and install the apk). It's experimental, expect to run into issues. Does that satisfy your needs?

Tycho-S commented 2 weeks ago

The ZIP build worked fine, thank you!! With "keep always active" selected, the model now stays in memory, making repeat queries much faster. I really appreciate it.

PS: I assume this overrides the timing setting below?

JHubi1 commented 2 weeks ago

Yes. All three options control the same value. The first toggle sets it to -1 (always keep loaded), the second sets it to 0 (don't keep loaded), and the duration selector sets a specific time. Feel free to open a new issue in case you have other feature requests.
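
As a rough illustration of that mapping (a Python sketch, not the app's actual code), the three settings can be collapsed into a single keep_alive value. The function name, its parameters, the choice to let the first toggle win if both are on, and the minutes-based duration string are all assumptions made for this example.

```python
from typing import Union


def resolve_keep_alive(keep_always: bool, never_keep: bool, minutes: int) -> Union[int, str]:
    """Collapse the three settings into one keep_alive value (hypothetical helper).

    Assumed precedence: the "keep always" toggle wins, then the
    "don't keep" toggle, and only then the duration selector.
    """
    if keep_always:
        return -1  # keep the model loaded
    if never_keep:
        return 0  # unload right after the response
    if minutes < 0:
        raise ValueError("minutes must not be negative")
    return f"{minutes}m"  # duration string such as "10m"


if __name__ == "__main__":
    print(resolve_keep_alive(True, False, 5))
    print(resolve_keep_alive(False, True, 5))
    print(resolve_keep_alive(False, False, 10))
```

Checking the toggles before the duration matches the answer above that the "keep always" toggle overrides the timing setting.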