Hello @WilliamKarolDiCioccio,
I've just tested it and it seems to be working fine. You can check whether a model is still loaded using the `ollama ps` command:

- With `keepAlive: 0`, running `ollama ps` after invoking the model gives me an empty list.
- With `keepAlive: 1`, running `ollama ps` after invoking the model lists the model I ran. After 1 min, I get an empty list.
- With `keepAlive: -1`, the model is always listed when running `ollama ps`.
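For context, a minimal sketch of such a test (assuming the same ChatOllama API shown later in this thread; the model name and prompt are just placeholders):

import 'package:langchain/langchain.dart';
import 'package:langchain_ollama/langchain_ollama.dart';

Future<void> main() async {
  final chat = ChatOllama(
    defaultOptions: ChatOllamaOptions(
      model: 'llama3',
      keepAlive: 1, // 0 unloads right away, -1 keeps the model loaded indefinitely
    ),
  );

  // Invoke the model once, then run `ollama ps` in a terminal
  // to check whether it is still loaded.
  final res = await chat.invoke(PromptValue.string('Hello!'));
  print(res.output.content);
}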
Make sure you're using the latest version of Ollama, just in case.
Cheers.
I'll check it using the `ps` command; however, considering the response latency, I think the model is being unloaded, so the only remaining solution is to update Ollama. I'll check tomorrow and let you know.
The famous phrase: "It doesn't work, on my machine"
Always the case 😄
Let me know if the issue is solved after upgrading. In the latest version, they reworked their memory management substantially, as they now support parallel requests and loading different models at the same time.
Ok @davidmigloz, I've downloaded the latest version of Ollama, cleaned the Flutter project, reinstalled all dependencies (just in case), and even hardcoded the keepAlive time as a test to completely rule out any logic error in my app. I'm pretty sure Ollama is straight up ignoring me :(
Here's the code:
_chat = ChatOllama(
  defaultOptions: ChatOllamaOptions(
    model: _modelName,
    keepAlive: -1,
  ),
);
And here's the `ollama ps` command output:
NAME           ID            SIZE    PROCESSOR  UNTIL
llama3:latest  365c0bd3c000  5.4 GB  100% GPU   4 minutes from now
Hmm, could it be that the options are being overridden somewhere else? Let me know which branch of your repo you're working on; I can give it a try 🙂
Don't worry, I don't want to waste your time. I'll check it myself. If the problem is what you're describing, it should be pretty easy to identify. If I can't find it, then you'll be welcome to have a look; in that case, you can look directly at the main branch.
Edit: I'm also going to check the calls to the constructor itself, just to be sure.
Nope. @davidmigloz
As you can see from the local variables in debug mode, the options still show the correct value. Moreover, I took this screenshot at a breakpoint in the `_buildChain()` method, so these are the final values used for generation.
Hey @WilliamKarolDiCioccio,
My bad, it is actually a bug in LangChain.dart. While reviewing #483, I found out that `keepAlive` is only considered if it's passed in the invoke options, while it's ignored if it's set in the default options.
When I tested it, I was passing it in the invoke options; that's why I wasn't able to reproduce the issue.
I'll issue a fix later today.
Hi @davidmigloz
Then I guess it should take the `keepAlive` value from whichever is provided, either the `defaultOptions` or the invoke options.
Yes, it should use the `defaultOptions` value if `options` is null.
So from:
keepAlive: options?.keepAlive
to:
keepAlive: options?.keepAlive ?? defaultOptions.keepAlive,
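Spelled out as a standalone pattern (an illustrative sketch; `resolveKeepAlive` is a made-up helper, not the library's actual internals):

import 'package:langchain_ollama/langchain_ollama.dart';

/// Hypothetical helper showing the null-coalescing fallback from the fix.
int? resolveKeepAlive(
  ChatOllamaOptions? options,
  ChatOllamaOptions defaultOptions,
) {
  // Before the fix, only the invoke options were consulted, so a
  // keepAlive set in defaultOptions was silently dropped:
  //   return options?.keepAlive;
  // After the fix, it falls back to the default options:
  return options?.keepAlive ?? defaultOptions.keepAlive;
}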
I've fixed it in my commit at https://github.com/davidmigloz/langchain_dart/pull/483
System Info
- Platform: Windows 11
- GPU: RTX 4060 8GB
- langchain: ^0.7.3
- langchain_ollama: ^0.2.2+1
Reproduction
As you can see here, I'm simply setting the keepAlive time to the model-specific one if it is specified, otherwise to the global one (yeah, it's the same piece of code from last time; see the sketch below). I double-checked the values in debug mode and they are correct. However, looking at the GPU's memory usage, I can see that the model is being unloaded after the default five minutes (checked with a stopwatch); this is confirmed by the response latency after this short period, as the model has to be loaded again.
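The fallback logic amounts to this sketch (`ModelSettings` and `GlobalSettings` are hypothetical stand-ins for the app's own configuration objects):

import 'package:langchain_ollama/langchain_ollama.dart';

/// Hypothetical stand-ins for the app's configuration objects.
class ModelSettings {
  const ModelSettings({required this.name, this.keepAlive});
  final String name;
  final int? keepAlive; // model-specific keepAlive, if set
}

class GlobalSettings {
  const GlobalSettings({this.keepAlive});
  final int? keepAlive; // global default keepAlive
}

ChatOllama buildChat(ModelSettings model, GlobalSettings global) {
  return ChatOllama(
    defaultOptions: ChatOllamaOptions(
      model: model.name,
      // Model-specific value if specified, otherwise the global one.
      keepAlive: model.keepAlive ?? global.keepAlive,
    ),
  );
}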
Expected behavior
We don't need to add anything for the expected behavior ¯\_(ツ)_/¯