Bug: Memory allocation on Windows still fails on some models

DK013 commented 3 months ago

Contact Details

chakraborty.deep013@gmail.com

What happened?

This may be related to #501. I tried loading Phi-3.5-mini-instruct-Q8_0.gguf, and it failed with a memory allocation error.

Here's a log with the --strace flag, just in case: log dump

Relevant system specs:
CPU: Ryzen 3600X
GPU: Radeon RX 5700 XT
RAM: 32GB DDR4 3000MHz

Version

llamafile v0.8.13

What operating system are you seeing the problem on?

No response

Relevant log output

llm_load_tensors: ggml ctx size =    0.12 MiB
llm_load_tensors:        CPU buffer size =  3872.38 MiB
.....................................................................................
llama_new_context_with_model: n_ctx      = 131072
llama_new_context_with_model: n_batch    = 2048
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
ggml_backend_cpu_buffer_type_alloc_buffer: failed to allocate buffer of size 51539607584
llama_kv_cache_init: failed to allocate buffer for kv cache
llama_new_context_with_model: llama_kv_cache_init() failed for self-attention cache
llama_init_from_gpt_params: error: failed to create context with model '/I/LMStudio/lmstudio-community/Phi-3.5-mini-instruct-GGUF/Phi-3.5-mini-instruct-Q8_0.gguf'
{"function":"load_model","level":"ERR","line":452,"model":"/I/LMStudio/lmstudio-community/Phi-3.5-mini-instruct-GGUF/Phi-3.5-mini-instruct-Q8_0.gguf","msg":"unable to load model","tid":"11681088","timestamp":1724502100}

jart commented 2 months ago

Try setting a smaller context size than the default, e.g. -c 4096. A lot of models have a 128k context size now, which needs more memory than most consumers have in their PCs. Let me know if that doesn't fix it and I'll reopen.
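For anyone wondering where the roughly 51.5 GB figure in the log comes from, here is a rough sketch of the KV cache arithmetic. The model dimensions are assumptions that are not stated anywhere in this thread (32 layers, a 3072-wide K/V projection per layer, and a 16-bit cache), and the helper name `kv_cache_bytes` is made up for illustration. With those assumptions the result lands within 32 bytes of the failed allocation size in the log, which would be consistent with a small amount of padding.

```python
def kv_cache_bytes(n_ctx: int, n_layer: int, n_embd_kv: int, bytes_per_elem: int = 2) -> int:
    """Estimate KV cache size: one K and one V tensor per layer,
    each holding n_ctx * n_embd_kv elements."""
    return 2 * n_layer * n_ctx * n_embd_kv * bytes_per_elem


def to_gib(n_bytes: int) -> float:
    """Convert bytes to GiB (2**30 bytes)."""
    return n_bytes / 2**30


if __name__ == "__main__":
    # ASSUMED model shape (not taken from the thread): 32 layers, 3072-wide K/V, f16 cache.
    N_LAYER, N_EMBD_KV = 32, 3072

    default_ctx = kv_cache_bytes(131072, N_LAYER, N_EMBD_KV)  # n_ctx from the log
    small_ctx = kv_cache_bytes(4096, N_LAYER, N_EMBD_KV)      # with -c 4096

    print(f"n_ctx=131072: {default_ctx} bytes ({to_gib(default_ctx):.1f} GiB)")
    print(f"n_ctx=4096:   {small_ctx} bytes ({to_gib(small_ctx):.1f} GiB)")
```

Under these assumptions the default context needs about 48 GiB for the cache alone, which is more than the 32GB of RAM on the reporter's machine, while -c 4096 needs about 1.5 GiB on top of the model weights.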

DK013 commented 2 months ago

@jart hey that works. thanks for the clarification.

Mozilla-Ocho / llamafile