remixer-dec opened 1 year ago
The same bug appears when trying to run llama_cpp.server, since there is no
import torch in the server code.
HOST=localhost python3.10 -m llama_cpp.server --model ./PATH/TO/MODEL.gguf --n_gpu_layers 1 --n_ctx 2048
But the error looks different:
llama_new_context_with_model: max tensor size = 205.08 MB
ggml_metal_add_buffer: failed to allocate 'data ' buffer, size = 0.00 MB
llama_new_context_with_model: failed to add buffer
ggml_metal_free: deallocating
It also reports ggml_metal_init: hasUnifiedMemory = false,
whereas with PyTorch imported it reports ggml_metal_init: hasUnifiedMemory = true.
This was happening to me too, and adding import torch fixed it. Some extra info I was looking at before adding the import:
1) Initially everything was working; then, after running a notebook several times, I started getting the "failed to allocate" error.
2) Over a few runs the buffer size keeps adding up instead of being overwritten, even though it is the same model.
3) Could there be a missing clear step that importing torch performs, effectively clearing this buffer?
First run:
ggml_metal_add_buffer: allocated 'data ' buffer, size = 7339.34 MB, ( 7339.84 / 49152.00)
ggml_metal_add_buffer: allocated 'kv ' buffer, size = 489.50 MB, ( 7829.34 / 49152.00)
ggml_metal_add_buffer: allocated 'alloc ' buffer, size = 275.38 MB, ( 8104.72 / 49152.00)
Second run:
ggml_metal_add_buffer: allocated 'data ' buffer, size = 7339.34 MB, (15445.94 / 49152.00)
ggml_metal_add_buffer: allocated 'kv ' buffer, size = 489.50 MB, (15935.44 / 49152.00)
ggml_metal_add_buffer: allocated 'alloc ' buffer, size = 275.38 MB, (16210.81 / 49152.00)
ggml_metal_free: deallocating
Third run:
ggml_metal_add_buffer: error: failed to allocate 'data ' buffer, size = 0.00 MB
llama_new_context_with_model: failed to add buffer
ggml_metal_free: deallocating
Possibly related: https://github.com/ggerganov/llama.cpp/discussions/3580
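One way to probe the "missing clear step" hypothesis above is to release the previous context explicitly before reloading in a notebook. This is a speculative sketch, not a confirmed fix; the model path is a placeholder and the load is guarded so the snippet can run without a model file:

```python
import gc
import os

MODEL_PATH = "./PATH/TO/MODEL.gguf"  # placeholder path

# Re-running a load cell in a notebook keeps the previous Llama object
# alive, so its Metal buffers are never freed and the totals grow as in
# the logs above. Dropping the reference and collecting first is one way
# to test whether an explicit "clear step" avoids the allocation failure.
llm = None    # release any instance left over from a previous run
gc.collect()  # run finalizers so the old Metal buffers can be freed

if os.path.exists(MODEL_PATH):  # guard so the sketch runs without a model
    from llama_cpp import Llama

    llm = Llama(model_path=MODEL_PATH, n_gpu_layers=1, n_ctx=2048)
```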
Got a bit more info about this (or a similar) issue:
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M1 Max
ggml_metal_init: picking default device: Apple M1 Max
ggml_metal_init: default.metallib not found, loading from source
ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil
ggml_metal_init: loading '/opt/homebrew/lib/python3.10/site-packages/llama_cpp/ggml-metal.metal'
ggml_metal_init: error: Error Domain=MTLLibraryErrorDomain Code=3 "Compiler encountered XPC_ERROR_CONNECTION_INVALID (is the OS shutting down?)" UserInfo={NSLocalizedDescription=Compiler encountered XPC_ERROR_CONNECTION_INVALID (is the OS shutting down?)}
llama_new_context_with_model: failed to initialize Metal backend
Expected Behavior
The model loads with the Metal (MPS) backend successfully initialized.
Current Behavior
Metal initialization fails with an 'MPS not supported' error.
Environment and Context
Darwin 21.4.0 Darwin Kernel Version 21.4.0: Fri Mar 18 00:46:32 PDT 2022; root:xnu-8020.101.4~15/RELEASE_ARM64_T6000 arm64
Python 3.10.12
GNU Make 4.3
Apple clang version 13.0.0 (clang-1300.0.29.30)
llama-cpp-python==0.1.67
Failure Information (for bugs)
When I run the minimal code with n_gpu_layers > 0 on an M1 without importing PyTorch, Python crashes with an 'MPS not supported' error.
Steps to Reproduce
Run this code:
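(The exact snippet was not preserved; the following is a hypothetical reconstruction of such a minimal script. The model path is a placeholder, and the load is guarded so the sketch can run without a model file.)

```python
import os

# import torch  # <-- uncommenting this line is the workaround

MODEL_PATH = "./PATH/TO/MODEL.gguf"  # placeholder path

if os.path.exists(MODEL_PATH):  # guard so the sketch runs without a model
    from llama_cpp import Llama

    # Without torch imported first, Metal initialization here crashes
    # with an 'MPS not supported' error when n_gpu_layers > 0.
    llm = Llama(model_path=MODEL_PATH, n_gpu_layers=1, n_ctx=2048)
    print(llm("Q: What is 2+2? A:", max_tokens=8))
```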
Now uncomment import torch and the bug is gone! I was making a simple API server and spent a few hours trying to understand why llama_cpp does not work in a plain script.
Failure Logs