Mozilla-Ocho / llamafile

Distribute and run LLMs with a single file.
https://llamafile.ai

How to set context size? Running dolphin mixtral q4km, using too much of my 64gb of ram. want to lower it. #425

Closed FemBoxbrawl closed 1 month ago

FemBoxbrawl commented 1 month ago

I am running Dolphin Mixtral Q4KM on Windows, and I have 64 GB of RAM. How can I set the context length to reduce the amount of RAM being used? I only need a maximum of about 2048 tokens of context. It's eating up 57 GB of my 64. How can I make it use only 30 GB?

Thanks, an answer would really help.

vlasky commented 1 month ago

Use the following command-line option:

--ctx-size 2048
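For example, a full invocation might look like the following. The filename here is a placeholder, not from this thread; substitute the name of your actual llamafile:

```shell
# Launch the llamafile with a 2048-token context window,
# which shrinks the KV cache and lowers RAM usage.
# "dolphin-mixtral-q4km.llamafile" is a hypothetical filename.
./dolphin-mixtral-q4km.llamafile --ctx-size 2048
```

On Windows, run the same command with the `.exe` extension on the file.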

FemBoxbrawl commented 1 month ago

thanks

jart commented 1 month ago

Thanks for helping @vlasky! You can also say -c 0 as an easy way to set the max context size allowed by the model.
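Sketch of that shortcut, again with a placeholder filename:

```shell
# -c 0 tells llamafile to use the maximum context size
# the model's metadata allows, rather than a fixed number.
# "dolphin-mixtral-q4km.llamafile" is a hypothetical filename.
./dolphin-mixtral-q4km.llamafile -c 0
```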