BodhiSearch / BodhiApp

Run Open Source/Open Weight LLMs locally with OpenAI compatible APIs

ggml_gallocr_reserve_n: failed to allocate Metal buffer of size 8891928576 #4

Open radityagumay opened 2 months ago

radityagumay commented 2 months ago

I recently tried the Bodhi CLI to download Llama 3.1 using this command:

bodhi create llama3_1:instruct_q4 \
  --repo bullerwins/Meta-Llama-3.1-8B-Instruct-GGUF \
  --filename Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf \
  --tokenizer-config meta-llama/Meta-Llama-3.1-8B-Instruct

The download succeeded. However, after running bodhi run llama3_1:instruct_q4, I got the error below:

ggml_gallocr_reserve_n: failed to allocate Metal buffer of size 8891928576
⠹ Loading...
llama_init_from_gpt_params: error: failed to create context with model '/Users/username/.cache/huggingface/hub/models--bullerwins--Meta-Llama-3.1-8B-Instruct-GGUF/snapshots/a4ac94cf28701b385c9028d49d314a361e0974a6/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf'
fatal error: bodhi_context: failed to load the model
exiting...

I assumed the failure was caused by the roughly 8 GB allocation, so I freed up enough memory to cover it. However, even with more than 8 GB of memory free, I still get the same error.

I am using an M1 Mac with 16 GB of memory, running macOS Sonoma.
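For scale, the byte count in the log line can be converted to GiB and compared with the machine's total memory. The following is only a rough sketch: it assumes "16GB" means 16 GiB of unified memory, and the helper name `requested_bytes` is made up for illustration. It says nothing about how much of that memory Metal will actually hand to a single buffer, and the buffer is presumably requested on top of the memory already holding the model weights.

```python
import re

# The log line reported above, copied verbatim.
LOG_LINE = "ggml_gallocr_reserve_n: failed to allocate Metal buffer of size 8891928576"

GIB = 1024 ** 3  # bytes per gibibyte


def requested_bytes(line: str) -> int:
    """Extract the byte count that follows 'of size' in the log line.

    Hypothetical helper for illustration; not part of Bodhi or llama.cpp.
    """
    match = re.search(r"of size (\d+)", line)
    if match is None:
        raise ValueError("no buffer size found in log line")
    return int(match.group(1))


size = requested_bytes(LOG_LINE)
total_ram = 16 * GIB  # assumption: "16GB" means 16 GiB of unified memory

print(f"requested buffer: {size / GIB:.2f} GiB")       # 8.28 GiB
print(f"share of total RAM: {size / total_ram:.1%}")   # 51.8%
```

So the single failed allocation is a little over 8 GiB, slightly more than half of the machine's total memory, which may be why freeing "about 8 GB" was not enough.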

Thank you for the help! The Bodhi App is amazing.

radityagumay commented 2 months ago

This looks similar to https://github.com/ggerganov/llama.cpp/pull/1817 and https://github.com/ggerganov/llama.cpp/issues/1815#issuecomment-1587074335

anagri commented 1 month ago

Thanks @radityagumay for reporting the issue.

I will keep an eye on the issues mentioned, and will also work on giving a better error message in anticipation of this failure.