Hello @WilliamKarolDiCioccio,
I've just tested it and it seems to be working fine. You can check whether a model is still loaded using the `ollama ps` command:

- With `keepAlive: 0`, running `ollama ps` after invoking the model gives me an empty list.
- With `keepAlive: 1`, running `ollama ps` after invoking the model lists the model I ran. After 1 min, I get an empty list.
- With `keepAlive: -1`, the model is always listed when running `ollama ps`.
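For context, a minimal sketch of such a test (assuming the same ChatOllama API shown later in this thread; the model name and prompt are just placeholders):

import 'package:langchain/langchain.dart';
import 'package:langchain_ollama/langchain_ollama.dart';

Future<void> main() async {
  final chat = ChatOllama(
    defaultOptions: ChatOllamaOptions(
      model: 'llama3',
      keepAlive: 1, // 0 unloads right away, -1 keeps the model loaded indefinitely
    ),
  );

  // Invoke the model once, then run `ollama ps` in a terminal
  // to check whether it is still loaded.
  final res = await chat.invoke(PromptValue.string('Hello!'));
  print(res.output.content);
}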
Make sure you're using the latest version of Ollama, just in case.
Cheers.
I'll check it using the `ps` command; however, considering the response latency, I think the model is being unloaded, so the only remaining solution is to update Ollama. I'll check tomorrow and let you know.
The famous phrase: "It doesn't work, on my machine"
Always the case 😄
Let me know if the issue is solved after upgrading. In the latest version, they reworked their memory management substantially, as they now support parallel requests and loading different models at the same time.
Ok @davidmigloz, I've downloaded the latest version of Ollama, cleaned the Flutter project, reinstalled all dependencies (just in case), and even hardcoded the keepAlive time as a test to completely rule out any logic error in my app. I'm pretty sure Ollama is straight up ignoring me :(
Here's the code:
_chat = ChatOllama(
  defaultOptions: ChatOllamaOptions(
    model: _modelName,
    keepAlive: -1,
  ),
);
And here's the `ollama ps` command output:
NAME           ID            SIZE    PROCESSOR  UNTIL
llama3:latest  365c0bd3c000  5.4 GB  100% GPU   4 minutes from now
Hmm, could it be that the options are being overridden somewhere else? Let me know which branch of your repo you're working on; I can give it a try 🙂
Don't worry, I don't want to waste your time. I'll check it myself. If the problem is what you're describing, it should be pretty easy to identify. If I can't find it, then you'll be welcome to have a look; in that case, you can look directly at the main branch.
Edit: I'm also going to check the calls to the constructor itself, just to be sure.
Nope. @davidmigloz
As you can see from the local variables in debug mode, the options still show the correct value. Moreover, I took this screenshot at a breakpoint in the `_buildChain()` method, so these are the final values used for generation.
Hey @WilliamKarolDiCioccio,
My bad, it is actually a bug in LangChain.dart. While reviewing #483, I found out that `keepAlive` is only considered if it's passed in the invoke options, while it's ignored if it's set in the default options.
When I tested it, I was passing it in the invoke options; that's why I wasn't able to reproduce the issue.
I'll issue a fix later today.
Hi @davidmigloz
Then I guess it should take the `keepAlive` value from whichever is provided, either the `defaultOptions` or the invoke options.
Yes, it should use the `defaultOptions` value if `options` is null.
So from:
keepAlive: options?.keepAlive
to:
keepAlive: options?.keepAlive ?? defaultOptions.keepAlive,
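Spelled out as a standalone pattern (an illustrative sketch; `resolveKeepAlive` is a made-up helper, not the library's actual internals):

import 'package:langchain_ollama/langchain_ollama.dart';

/// Hypothetical helper showing the null-coalescing fallback from the fix.
int? resolveKeepAlive(
  ChatOllamaOptions? options,
  ChatOllamaOptions defaultOptions,
) {
  // Before the fix, only the invoke options were consulted, so a
  // keepAlive set in defaultOptions was silently dropped:
  //   return options?.keepAlive;
  // After the fix, it falls back to the default options:
  return options?.keepAlive ?? defaultOptions.keepAlive;
}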
I've fixed it in my commit at https://github.com/davidmigloz/langchain_dart/pull/483
System Info
- Platform: Windows 11
- GPU: RTX 4060 8GB
- langchain: ^0.7.3
- langchain_ollama: ^0.2.2+1
Reproduction
As you can see here, I'm simply setting the keepAlive time to the model-specific one if it is specified, otherwise to the global one (yeah, it's the same piece of code from last time; see the sketch below). I double-checked the values in debug mode and they are correct. However, looking at the GPU's memory usage, I can see that the model is being unloaded after the default five minutes (checked with a stopwatch); this is confirmed by the response latency after this short period, as the model has to be loaded again.
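The fallback logic amounts to this sketch (`ModelSettings` and `GlobalSettings` are hypothetical stand-ins for the app's own configuration objects):

import 'package:langchain_ollama/langchain_ollama.dart';

/// Hypothetical stand-ins for the app's configuration objects.
class ModelSettings {
  const ModelSettings({required this.name, this.keepAlive});
  final String name;
  final int? keepAlive; // model-specific keepAlive, if set
}

class GlobalSettings {
  const GlobalSettings({this.keepAlive});
  final int? keepAlive; // global default keepAlive
}

ChatOllama buildChat(ModelSettings model, GlobalSettings global) {
  return ChatOllama(
    defaultOptions: ChatOllamaOptions(
      model: model.name,
      // Model-specific value if specified, otherwise the global one.
      keepAlive: model.keepAlive ?? global.keepAlive,
    ),
  );
}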
Expected behavior
We don't need to add anything for the expected behavior ¯\_(ツ)_/¯