Ben-Epstein opened 1 week ago
When I install with nix, it is able to run. I get a different error, which I don't think should happen, since I am able to run this model using gpt4all without memory issues:
```
ggml_backend_metal_buffer_type_alloc_buffer: error: failed to allocate buffer, size = 8484.02 MiB
llama_new_context_with_model: failed to allocate compute buffers
ggml_metal_free: deallocating
llama_init_from_gpt_params: failed to create context with model 'Phi-3.5-mini-instruct-IQ2_M.gguf'
main: error: unable to load model
```
But that's a different subject.
It looks like there are two different models used, without a source for either, so it'll be hard to reproduce.
If that's the only output from llama.cpp, I'm guessing the issue is that you're running it on an M3 but your llama.cpp is built for Intel (note the "x86_64" in "x86_64-apple-darwin23.4.0").
For a Mac M3, you must build for the ARM target. Otherwise, the x86_64 binary will run under Rosetta, which causes an illegal-instruction error whenever it hits an AVX instruction.
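A quick way to check for this mismatch is to compare the binary's architecture to the host's. This is a sketch; it assumes `llama-cli` is on your PATH:

```shell
# On Apple Silicon, `uname -m` prints "arm64"; if `file` reports
# "x86_64" for the binary, it is an Intel build running under Rosetta.
echo "host architecture: $(uname -m)"
BIN=$(command -v llama-cli || true)
if [ -n "$BIN" ]; then
  file "$BIN"
else
  echo "llama-cli not found on PATH"
fi
```

Building llama.cpp directly on the M3 (rather than installing a prebuilt Intel package) should produce a native arm64 binary by default.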
> llama_new_context_with_model: failed to allocate compute buffers
You need to specify the context length, e.g. `-c 4096`.
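For instance, a sketch of the invocation with the context capped (the model filename is the one from the report above; 4096 is just an illustrative value). The idea, as I understand it, is that without `-c` the buffers can be sized for the model's full trained context, which for Phi-3.5 is large enough to blow past the ~8 GiB Metal allocation in the log:

```shell
# Cap the context window so the compute buffers fit in memory.
llama-cli -m ./Phi-3.5-mini-instruct-IQ2_M.gguf \
  -c 4096 \
  -p "I believe the meaning of life is" -n 128
```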
What happened?
When I run

```
llama-cli -m ./Phi-3.5-mini-instruct-Q6_K_L.gguf -p "I believe the meaning of life is" -n 128
```

I get an error. I did a fresh install with homebrew, but got the same result repeatedly. It seems the same as https://github.com/ggerganov/llama.cpp/issues/8065, but that issue was closed, so I wanted to have an open one to track it.
Name and Version
What operating system are you seeing the problem on?
Mac
Relevant log output
No additional log output is available, even with `-v`.