Closed safeswap closed 2 months ago
Could you try passing the flag -c 2048
to limit the context size?
Also the crash you reported appears to be happening inside the WIN32 API.
Could this be the problem? `llama_kv_cache_init: CPU KV buffer size = 49152.00 MiB` allocates around 50 GB of RAM for the KV cache alone. After I made sure I had at least 59 GB of memory free, it worked.
-c 2048 helped too; it then only used 768.00 MiB for the KV cache.
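The two numbers above are consistent with the KV cache growing linearly with context length (49152 MiB at the model's full 128k context vs. 768 MiB at 2048, a factor of 64 = 131072 / 2048). A minimal sketch of the sizing formula, using hypothetical model dimensions (48 layers, 16 KV heads, head dim 128, f16 cache) chosen only to reproduce the logged values, not taken from the actual model:

```python
def kv_cache_mib(n_ctx, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    # One K and one V tensor per layer, per context position.
    # f16 storage = 2 bytes per element (the llama.cpp default).
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem / (1024 ** 2)

# Hypothetical 128k-context model dimensions (illustrative, not the real config):
print(kv_cache_mib(131072, n_layers=48, n_kv_heads=16, head_dim=128))  # 49152.0 MiB
print(kv_cache_mib(2048, n_layers=48, n_kv_heads=16, head_dim=128))    # 768.0 MiB
```

This is why `-c 2048` shrinks the allocation so dramatically: every other factor in the formula is fixed by the model, so context length is the only knob the user controls.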
Glad context fixed it. Since the OOM crash is in WIN32 code, I don't think this is actionable for us. Thanks for the report. It's a known issue re: making these 128k-context models less surprising w.r.t. memory requirements. We're working on that too.
-c 2048 didn't work for me; I hope you can fix this problem soon.
Contact Details
safeswapio@gmail.com
What happened?
The llamafile is not functioning properly; my system is Windows 11.
Version
llamafile v0.8.13
What operating system are you seeing the problem on?
No response
Relevant log output