go-skynet / go-llama.cpp

LLama.cpp golang bindings

Apple Silicon Metal Support not working #91

Open soleblaze opened 1 year ago

soleblaze commented 1 year ago

When I try modifying the example to add llama.SetGPULayers(1), it doesn't appear to take effect. The example still runs on the CPU without offloading to Metal.
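Roughly what I'm doing, as a minimal sketch (assuming the option and helper names in the bindings at the time: SetGPULayers, SetContext, SetTokens, Predict, Free):

```go
package main

import (
	"fmt"
	"os"

	llama "github.com/go-skynet/go-llama.cpp"
)

func main() {
	// SetGPULayers is meant to mirror llama.cpp's -ngl flag; one layer
	// should be enough to trigger the Metal path on Apple Silicon.
	l, err := llama.New(
		"/model/path/here",
		llama.SetContext(128),
		llama.SetGPULayers(1),
	)
	if err != nil {
		fmt.Println("Loading the model failed:", err.Error())
		os.Exit(1)
	}
	defer l.Free()

	out, err := l.Predict("Hello", llama.SetTokens(32))
	if err != nil {
		fmt.Println("Predict failed:", err.Error())
		os.Exit(1)
	}
	fmt.Println(out)
}
```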

When I use it in local-ai, it thinks that my q4_0 model is an f32 model. The same model works fine when running llama.cpp directly.

```
Asserting on type 0
GGML_ASSERT: /Users/soleblaze/git/thirdparty/localai/go-llama/llama.cpp/ggml-metal.m:549: false && "not implemented"
```

You do need to copy the ggml-metal.metal file from the llama.cpp directory into your CWD (`cp llama.cpp/ggml-metal.metal .`) for this to work. Otherwise it errors out with a "can't find file '(null)'" error.

Is there a different load path that go-llama.cpp should be using when loading a model for Metal?

mudler commented 1 year ago

When trying with the bindings, are you following the steps in the README?

```
BUILD_TYPE=metal make libbinding.a
CGO_LDFLAGS="-framework Foundation -framework Metal -framework MetalKit -framework MetalPerformanceShaders" LIBRARY_PATH=$PWD C_INCLUDE_PATH=$PWD go run ./examples -m "/model/path/here" -t 14
```
soleblaze commented 1 year ago

Yes. With that example it still appears to use the CPU. llama.cpp expects `-ngl 1` to be passed to it in order to use Metal.

Following the steps in the README also outputs this warning: `make: Circular llama.cpp/ggml-metal.o <- llama.cpp/ggml-metal.o dependency dropped.`

When Metal is in use, it should output lines prefixed with `ggml_metal_init:`. It doesn't do this. If I clone the repo, cd into llama.cpp, and run `LLAMA_METAL=1 make` before building libbinding.a, then it works correctly.

Adding llama.SetGPULayers(1) to the llama.New call on line 33 (as in the sketch above) allows the example to use Metal, but it then fails to run due to a missing ggml-metal.metal file.

example output, Metal failure:

```
❯ CGO_LDFLAGS="-framework Foundation -framework Metal -framework MetalKit -framework MetalPerformanceShaders" LIBRARY_PATH=$PWD C_INCLUDE_PATH=$PWD go run ./examples -m "$HOME/models/ggml-model-q4_0.bin" -t 6
llama.cpp: loading model from /Users/soleblaze/models/ggml-model-q4_0.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 128
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 0.07 MB
llama_model_load_internal: mem required = 5407.71 MB (+ 1026.00 MB per state)
.
llama_init_from_file: kv self size = 64.00 MB
ggml_metal_init: allocating
ggml_metal_init: using MPS
ggml_metal_init: loading '(null)'
ggml_metal_init: error: Error Domain=NSCocoaErrorDomain Code=258 "The file name is invalid."
exit status 1
```

If I compile examples/main.go and copy llama.cpp/ggml-metal.metal into my CWD, then it works. Running the example via `go run` fails to find the file, presumably because the Metal initializer resolves ggml-metal.metal relative to the executable's location, and `go run` builds the binary into a temporary directory.
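If that's the cause, one workaround sketch is to copy the shader next to the running binary at startup, before the model is loaded. This is a hypothetical helper, not part of go-llama.cpp, and it assumes the shader lookup really is relative to the executable:

```go
package main

import (
	"log"
	"os"
	"path/filepath"
)

// copyMetalShader copies ggml-metal.metal next to the running binary so the
// Metal initializer can find it even under `go run`, where the binary is
// built into a temporary directory. Hypothetical helper, not part of the bindings.
func copyMetalShader(src string) error {
	exe, err := os.Executable()
	if err != nil {
		return err
	}
	data, err := os.ReadFile(src)
	if err != nil {
		return err
	}
	dst := filepath.Join(filepath.Dir(exe), "ggml-metal.metal")
	return os.WriteFile(dst, data, 0o644)
}

func main() {
	if err := copyMetalShader("llama.cpp/ggml-metal.metal"); err != nil {
		log.Fatal(err)
	}
	// ...then load the model with llama.New as usual.
}
```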

tmc commented 1 year ago

FWIW, I'm also not seeing GPU offload, either with `-ngl 1` or after following the current README steps to build and use Metal support.

[screenshot: no GPU usage]

soleblaze commented 1 year ago

That's odd; it's working fine for me. When it loads the model, do you get any of the `ggml_metal_init:` lines? I do get sporadic GPU usage: every so many tokens it freezes and GPU utilization dives.

[screenshot: GPU utilization, 2023-06-12 at 3:58 PM]