Closed: Tycho-S closed this issue 2 weeks ago
I'll look into it; it shouldn't be too complicated.
Could you try this build: ollama.zip (unpack the zip and install the APK) It's experimental, expect to run into issues. Does that satisfy your needs?
Edit: You can find the option under Settings > Interface. Scroll down and click on "Set duration how long to keep loaded" (text not final, sounds quirky, I know). The two toggles above represent -1 and 0.
Solved it in v1.1.0. Feel free to open a new issue if you miss any other features.
Thanks for the build! I was at a birthday yesterday night so I couldn't try it. I'll try today and will let you know.
The ZIP build worked fine, thank you! With "keep always active" selected, it now keeps the model in memory, which makes repeat enquiries much faster. I really appreciate it.
PS: I assume this overrides the timing setting below?
Yes. All three options basically control the same value. The first toggle sets it to -1 (keep always loaded), the second to 0 (don't keep loaded), and the duration selector sets a custom time.
Feel free to open a new issue in case you have other feature requests.
Thanks for making this great app!
One thing I noticed: it seems the app asks the Ollama server to clear the model from VRAM immediately after each request. This means there is a lengthy wait for every follow-up question while the model is loaded back into memory. Could that be made configurable? I would set keep_alive in the API request to -1 so the model is only removed from VRAM when space is needed for a different model.
PS: I noticed this especially when testing the llava visual model, which does work great by the way!
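For anyone landing here: the behavior discussed above corresponds to the `keep_alive` field that Ollama's REST API accepts on `/api/generate`. Below is a minimal Python sketch of such a request (this is not the app's actual code, which is written in Flutter/Dart; the endpoint URL and helper names are illustrative, only the `keep_alive` semantics come from the Ollama API):

```python
import json
import urllib.request

# Default local Ollama endpoint (adjust host/port for your setup)
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(model: str, prompt: str, keep_alive=-1) -> dict:
    """Assemble a /api/generate request body.

    keep_alive: -1 keeps the model loaded indefinitely,
                 0 unloads it right after the response,
                 a duration string like "5m" keeps it for that long.
    """
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "keep_alive": keep_alive,
    }

def generate(model: str, prompt: str, keep_alive=-1) -> dict:
    # POST the JSON payload to the Ollama server and decode the reply
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(model, prompt, keep_alive)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With `keep_alive=-1`, follow-up questions skip the model reload entirely, which is exactly the speedup described above.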