ggerganov / llama.cpp

LLM inference in C/C++
MIT License

Cannot offload to GPU on M1 #6426

Closed: NotMash closed this issue 6 months ago

NotMash commented 6 months ago

```
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M1
ggml_metal_init: picking default device: Apple M1
ggml_metal_init: default.metallib not found, loading from source
ggml_metal_init: GGML_METAL_PATH_RESOURCES = /Users/ibrahim/PycharmProjects/IbrahimAIChat/llama.cpp/
ggml_metal_init: loading '/Users/ibrahim/PycharmProjects/IbrahimAIChat/llama.cpp/ggml-metal.metal'
ggml_metal_init: error: Error Domain=MTLLibraryErrorDomain Code=3 "program_source:3:10: fatal error: 'ggml-common.h' file not found
#include "ggml-common.h"
         ^~~~~~~~~~~~~~~
" UserInfo={NSLocalizedDescription=program_source:3:10: fatal error: 'ggml-common.h' file not found
#include "ggml-common.h"
         ^~~~~~~~~~~~~~~
}
llama_new_context_with_model: failed to initialize Metal backend
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/llama_cpp/server/__main__.py", line 88, in <module>
    main()
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/llama_cpp/server/__main__.py", line 74, in main
    app = create_app(
          ^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/llama_cpp/server/app.py", line 138, in create_app
    set_llama_proxy(model_settings=model_settings)
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/llama_cpp/server/app.py", line 75, in set_llama_proxy
    _llama_proxy = LlamaProxy(models=model_settings)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/llama_cpp/server/model.py", line 31, in __init__
    self._current_model = self.load_llama_from_model_settings(
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/llama_cpp/server/model.py", line 138, in load_llama_from_model_settings
    _model = create_fn(
             ^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/llama_cpp/llama.py", line 328, in __init__
    self._ctx = _LlamaContext(
                ^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/llama_cpp/_internals.py", line 265, in __init__
    raise ValueError("Failed to create llama_context")
ValueError: Failed to create llama_context
warning: failed to munlock buffer: Cannot allocate memory
```

This happens after running:

```shell
python -m llama_cpp.server --model models/mistral-7b-instruct-v0.1.Q4_0.gguf --n_gpu -1
```
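For what it's worth, here is a minimal sketch of the equivalent programmatic call, which goes through the same Metal initialization path as the server command above (the prompt is just an example I made up; the model path is the one from the command):

```python
# Minimal repro sketch: constructing Llama with n_gpu_layers=-1 triggers
# the same ggml_metal_init path as the server command above, so the Metal
# error can be reproduced without the server in the way.
from llama_cpp import Llama

llm = Llama(
    model_path="models/mistral-7b-instruct-v0.1.Q4_0.gguf",
    n_gpu_layers=-1,  # offload every layer to the Apple M1 GPU via Metal
)
print(llm("Q: Name the planets in the solar system. A:", max_tokens=32))
```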

[Screenshot 2024-04-01 at 19 28 40]

I do have ggml-common.h.
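To double-check that, here is a quick sketch that reports whether the Metal sources are actually present in my llama.cpp checkout (the path is taken from the log above):

```python
# Sanity check: ggml_metal_init compiles ggml-metal.metal from source, and
# that file includes ggml-common.h, so both should sit in the directory
# that GGML_METAL_PATH_RESOURCES points at.
from pathlib import Path

resources = Path("/Users/ibrahim/PycharmProjects/IbrahimAIChat/llama.cpp")
for name in ("ggml-metal.metal", "ggml-common.h"):
    print(f"{name}: {'found' if (resources / name).is_file() else 'missing'}")
```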

So can anyone help me?

I set the GGML_METAL_PATH_RESOURCES environment variable to try to point it at ggml-common.h:

```
ggml_metal_init: GGML_METAL_PATH_RESOURCES = /Users/ibrahim/PycharmProjects/IbrahimAIChat/llama.cpp/
```

but I still get the same error.
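For anyone trying the same workaround, this is roughly what I did; a sketch assuming the variable has to be set before the Llama object is constructed, since ggml reads it during Metal initialization:

```python
# Sketch of the attempted workaround: export GGML_METAL_PATH_RESOURCES so
# ggml_metal_init loads ggml-metal.metal (and its ggml-common.h include)
# from the llama.cpp checkout instead of the installed package.
import os

os.environ["GGML_METAL_PATH_RESOURCES"] = (
    "/Users/ibrahim/PycharmProjects/IbrahimAIChat/llama.cpp/"
)

# Imported after setting the variable so it is in place well before
# Metal initialization happens inside the Llama constructor.
from llama_cpp import Llama

llm = Llama(
    model_path="models/mistral-7b-instruct-v0.1.Q4_0.gguf",
    n_gpu_layers=-1,
)
```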

NotMash commented 6 months ago

LOL, don't worry, I fixed it.