abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License

ValueError: Failed to create llama_context #1304

Open Isaakkamau opened 6 months ago

Isaakkamau commented 6 months ago

(yuna2) (base) adm@Adms-MacBook-Pro yuna-ai % python index.py
ggml_metal_init: error: Error Domain=MTLLibraryErrorDomain Code=3 "program_source:3:10: fatal error: 'ggml-common.h' file not found
#include "ggml-common.h"
         ^~~~~~~~~~~~~~~
" UserInfo={NSLocalizedDescription=program_source:3:10: fatal error: 'ggml-common.h' file not found
#include "ggml-common.h"
         ^~~~~~~~~~~~~~~
}
llama_new_context_with_model: failed to initialize Metal backend
Traceback (most recent call last):
  File "/Users/adm/Desktop/yuna-ai/index.py", line 171, in <module>
    yuna_server = YunaServer()
                  ^^^^^^^^^^^^
  File "/Users/adm/Desktop/yuna-ai/index.py", line 33, in __init__
    self.chat_generator = ChatGenerator(self.config)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/adm/Desktop/yuna-ai/lib/generate.py", line 11, in __init__
    self.model = Llama(
                 ^^^^^^
  File "/usr/local/lib/python3.12/site-packages/llama_cpp/llama.py", line 328, in __init__
    self._ctx = _LlamaContext(
                ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/llama_cpp/_internals.py", line 265, in __init__
    raise ValueError("Failed to create llama_context")
ValueError: Failed to create llama_context

[Screenshot: Screen Shot 2024-03-25 at 23 43 18]
JackyCCK2126 commented 6 months ago

Maybe you need to reinstall llama-cpp-python with the following command:

CMAKE_ARGS="-DLLAMA_METAL_EMBED_LIBRARY=ON -DLLAMA_METAL=on" pip3 install -U --force-reinstall llama-cpp-python --no-cache-dir

Answer from: https://github.com/abetlen/llama-cpp-python/issues/1285#issuecomment-2007778703
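
If the reinstall worked, loading any model with verbose logging should show ggml_metal_init completing without the "'ggml-common.h' file not found" error. A minimal sketch to check this (the model path is a placeholder, not from this thread):

from llama_cpp import Llama

# Placeholder path; any local GGUF model will do.
llm = Llama(
    model_path="./models/your-model.gguf",
    n_gpu_layers=-1,  # offload layers to Metal on Apple Silicon
    verbose=True,     # prints the ggml_metal_init log lines on startup
)
# With a fixed build, the startup log should no longer contain the
# "'ggml-common.h' file not found" error.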

yukiarimo commented 6 months ago

Hey, @JackyCCK2126! I had the same issue, and now it works! Thanks, but what was the problem?

Also, is there any workaround to speed up the generation on the M1?

JackyCCK2126 commented 6 months ago

@yukiarimo I don't know much about the M1. But in general, you can offload more layers to the GPU and lower the context size when initializing the Llama class by setting n_gpu_layers and n_ctx. (top_p and top_k may also affect speed a bit.) If it is still too slow, you can choose a smaller model.

However, if your prompt is not too long, you should get around 7 to 12 tokens per second, which is acceptable for me.
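
For reference, a minimal sketch of those settings (the model path and the exact values are placeholders, not from this thread):

from llama_cpp import Llama

# Placeholder model path; use your own GGUF file.
llm = Llama(
    model_path="./models/your-model.gguf",
    n_gpu_layers=-1,   # offload all layers to the GPU
    n_ctx=2048,        # a smaller context window uses less memory
)

# top_k / top_p are sampling parameters passed per request.
out = llm(
    "Q: Name the planets in the solar system. A:",
    max_tokens=64,
    top_k=40,
    top_p=0.95,
)
print(out["choices"][0]["text"])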

JackyCCK2126 commented 6 months ago

@yukiarimo If you find a speed-up solution, please let me know. XD

Spider-netizen commented 5 months ago

> Maybe you need to reinstall llama-cpp-python with the following command:
>
> CMAKE_ARGS="-DLLAMA_METAL_EMBED_LIBRARY=ON -DLLAMA_METAL=on" pip3 install -U --force-reinstall llama-cpp-python --no-cache-dir
>
> Answer from: #1285 (comment)

Doesn't seem to solve it for me... Do you happen to know if I'm missing something?

That's the end of the traceback:

[Screenshot: Screen Shot 2024-04-17 at 23 45 45]
ScofieldYeh commented 4 months ago

> > Maybe you need to reinstall llama-cpp-python with the following command:
> >
> > CMAKE_ARGS="-DLLAMA_METAL_EMBED_LIBRARY=ON -DLLAMA_METAL=on" pip3 install -U --force-reinstall llama-cpp-python --no-cache-dir
> >
> > Answer from: #1285 (comment)
>
> Doesn't seem to solve it for me... Do you happen to know if I'm missing something?
>
> That's the end of the traceback:
>
> [Screenshot: Screen Shot 2024-04-17 at 23 45 45]

I have the same result too: it failed to create llama_context. I was wondering why I need to set -DLLAMA_METAL=on? I think Metal is for MacBooks, but I am running llama.cpp on a Windows PC.

JackyCCK2126 commented 4 months ago

> > > Maybe you need to reinstall llama-cpp-python with the following command:
> > >
> > > CMAKE_ARGS="-DLLAMA_METAL_EMBED_LIBRARY=ON -DLLAMA_METAL=on" pip3 install -U --force-reinstall llama-cpp-python --no-cache-dir
> > >
> > > Answer from: #1285 (comment)
> >
> > Doesn't seem to solve it for me... Do you happen to know if I'm missing something? That's the end of the traceback:
> >
> > [Screenshot: Screen Shot 2024-04-17 at 23 45 45]
>
> I have the same result too: it failed to create llama_context. I was wondering why I need to set -DLLAMA_METAL=on? I think Metal is for MacBooks, but I am running llama.cpp on a Windows PC.

Yes, Metal is only for Apple devices, so the -DLLAMA_METAL flags are not needed on a Windows PC.