Same here. The only workaround for me is to not use Metal:
ggml_metal_init: allocating
ggml_metal_init: found device: Intel(R) HD Graphics 530
ggml_metal_init: found device: AMD Radeon Pro 455
ggml_metal_init: picking default device: AMD Radeon Pro 455
Same... My solution so far is to use -ngl 0:
ggml_metal_init: allocating
ggml_metal_init: found device: AMD Radeon Pro 575
ggml_metal_init: picking default device: AMD Radeon Pro 575
# ...
ggml_metal_init: loaded kernel_mul_mat_q6_K_f32 0x7feed820ff50 | th_max = 1024 | th_width = 64
ggml_metal_init: loaded kernel_mul_mm_f16_f32 0x0 | th_max = 0 | th_width = 0
ggml_metal_init: load pipeline error: Error Domain=CompilerError Code=2 "SC compilation failure
There is a call to an undefined label" UserInfo={NSLocalizedDescription=SC compilation failure
There is a call to an undefined label}
llama_new_context_with_model: ggml_metal_init() failed
llama_init_from_gpt_params: error: failed to create context with model '/Volumes/ext1tb/Models/llama-2-7b-chat-codeCherryPop.Q5_K_M.gguf'
{"timestamp":1694485384,"level":"ERROR","function":"loadModel","line":265,"message":"unable to load model","model":"/Volumes/ext1tb/Models/llama-2-7b-chat-codeCherryPop.Q5_K_M.gguf"}
Seeing same thing, my env:
ggml_metal_init: allocating
ggml_metal_init: found device: Intel(R) UHD Graphics 630
ggml_metal_init: found device: AMD Radeon Pro 5500M
ggml_metal_init: picking default device: AMD Radeon Pro 5500M
ggml_metal_init: loading '/Users/ssainz/Projects/ggerganov/llama.cpp/ggml-metal.metal'
Same here. I think it's a bug with MacBooks that have an AMD Radeon GPU.
@mounta11n can you show an example of what you mean by -ngl 0?
For example ./server -ngl 0 -t 3 --host 0.0.0.0 -c 4096 -b 2048 --mlock -m /Volumes/ext1tb/Models/13B/Synthia-13B-q4M.gguf
If you set -ngl to zero, you're saying that no layers should be offloaded to the GPU. So -ngl 0 means you don't use the GPU at all.
And yes, I think it's an issue with Macs and AMD in general (not only MacBooks, since I have an iMac 5K 2017).
I found that even when forcing ggml-metal.m to use the integrated graphics, the issue persisted.
I'm having a similar issue using an Intel MacMini and AMD Radeon RX Vega EGPU.
It's persistent across several Llama models.
I'm about to try the same process on an M2.
UserInfo={NSLocalizedDescription=SC compilation failure
There is a call to an undefined label
llama_new_context_with_model: ggml_metal_init() failed
llama_init_from_gpt_params: error: failed to create context with model './models/7B/ggml-model-q4_0.bin'
main: error: unable to load model
@RobinWinters @nchudleigh @ro8inmorgan @ssainz @pkrmf @Bateoriginal
If you guys are still interested, I have found an acceptable workaround that will allow you to utilize your GPU and offload layers to it:
make clean
brew update && brew install clblast
make LLAMA_CLBLAST=1 LLAMA_NO_METAL=1
./main -s 1 -m /Volumes/ext1tb/Models/13B/Samantha.gguf -p "I believe the purpose of life is" --ignore-eos -c 64 -n 128 -t 3 -ngl 10
Some of you should certainly benefit from layer offloading. In my case, offloading layers doesn't really give me any benefit, since my GPU (Radeon Pro 575) is about as fast as my CPU (FYI: I have tried offloading everything between 1 and 22 layers). The other aspect is the extra 3 GB of VRAM, but that isn't relevant for me either, since I have enough CPU RAM. However, the loading time is about 20x faster now thanks to CLBlast:
Without clBlast -t 3
llama_print_timings: load time = 17314,26 ms
llama_print_timings: sample time = 86,12 ms / 128 runs ( 0,67 ms per token, 1486,28 tokens per second)
llama_print_timings: prompt eval time = 956,49 ms / 8 tokens ( 119,56 ms per token, 8,36 tokens per second)
llama_print_timings: eval time = 26066,80 ms / 127 runs ( 205,25 ms per token, 4,87 tokens per second)
It takes about 17 seconds until the first token.
With clBlast -t 3 -ngl 0
llama_print_timings: load time = 920,24 ms
llama_print_timings: sample time = 74,38 ms / 128 runs ( 0,58 ms per token, 1721,01 tokens per second)
llama_print_timings: prompt eval time = 1086,07 ms / 8 tokens ( 135,76 ms per token, 7,37 tokens per second)
llama_print_timings: eval time = 25951,83 ms / 127 runs ( 204,35 ms per token, 4,89 tokens per second)
llama_print_timings: total time = 27143,61 ms
Now it takes under 1 second until the first token, and it's even a little bit faster with mlock:
-t 3 -ngl 0 --mlock
llama_print_timings: load time = 858,57 ms
llama_print_timings: sample time = 74,57 ms / 128 runs ( 0,58 ms per token, 1716,44 tokens per second)
llama_print_timings: prompt eval time = 982,50 ms / 8 tokens ( 122,81 ms per token, 8,14 tokens per second)
llama_print_timings: eval time = 25761,31 ms / 127 runs ( 202,84 ms per token, 4,93 tokens per second)
llama_print_timings: total time = 26850,29 ms
About 860 ms until the first token.
Couldn't get it all to work, but I've been using llama_cpp python.
2.3 GHz 8-Core Intel i9, AMD Radeon Pro 5600M 8 GB, Intel UHD Graphics 630 1536 MB, Memory: 16 GB 2667 MHz DDR4
rewrote comment as I made a boo boo.
brew reinstall --build-from-source
Had to update Xcode command line tools.
CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_METAL=off -DLLAMA_CLBLAST=on -DCLBlast_DIR=/usr/local/Cellar/clblast/1.6.1" FORCE_CMAKE=1 pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir
Fails to install wheel packages...error:
Building wheels for collected packages: llama-cpp-python
Building wheel for llama-cpp-python (pyproject.toml) ... error
error: subprocess-exited-with-error
× Building wheel for llama-cpp-python (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [44 lines of output]
*** scikit-build-core 0.5.1 using CMake 3.27.5 (wheel)
*** Configuring CMake...
2023-09-24 21:31:44,445 - scikit_build_core - WARNING - libdir/ldlibrary: /Users/zacharykolansky/miniforge3/lib/libpython3.10.a is not a real file!
2023-09-24 21:31:44,445 - scikit_build_core - WARNING - Can't find a Python library, got libdir=/Users/zacharykolansky/miniforge3/lib, ldlibrary=libpython3.10.a, multiarch=darwin, masd=None
loading initial cache file /var/folders/69/x1sqq6g550176x2n0p_t85b80000gn/T/tmphi4m_4u4/build/CMakeInit.txt
-- The C compiler identification is AppleClang 14.0.0.14000029
-- The CXX compiler identification is AppleClang 14.0.0.14000029
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /usr/bin/git (found version "2.37.1 (Apple Git-137.1)")
fatal: not a git repository (or any of the parent directories): .git
fatal: not a git repository (or any of the parent directories): .git
CMake Warning at vendor/llama.cpp/CMakeLists.txt:125 (message):
Git repository not found; to enable automatic generation of build info,
make sure Git is installed and the project is a Git repository.
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Accelerate framework found
-- Looking for sgemm_
-- Looking for sgemm_ - found
-- Found BLAS: /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.0.sdk/usr/lib/libblas.tbd
-- BLAS found, Libraries: /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.0.sdk/usr/lib/libblas.tbd
CMake Error at /private/var/folders/69/x1sqq6g550176x2n0p_t85b80000gn/T/pip-build-env-5inno2pz/normal/lib/python3.10/site-packages/cmake/data/share/cmake-3.27/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
Could NOT find PkgConfig (missing: PKG_CONFIG_EXECUTABLE)
Call Stack (most recent call first):
/private/var/folders/69/x1sqq6g550176x2n0p_t85b80000gn/T/pip-build-env-5inno2pz/normal/lib/python3.10/site-packages/cmake/data/share/cmake-3.27/Modules/FindPackageHandleStandardArgs.cmake:600 (_FPHSA_FAILURE_MESSAGE)
/private/var/folders/69/x1sqq6g550176x2n0p_t85b80000gn/T/pip-build-env-5inno2pz/normal/lib/python3.10/site-packages/cmake/data/share/cmake-3.27/Modules/FindPkgConfig.cmake:99 (find_package_handle_standard_args)
vendor/llama.cpp/CMakeLists.txt:212 (find_package)
-- Configuring incomplete, errors occurred!
*** CMake configuration failed
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects
Not sure where to begin to resolve it all...
EDIT
- now you can run a new build, but disable metal and enable clblast
make LLAMA_CLBLAST=1 LLAMA_NO_METAL=1
Yeah, this worked from ./main, but not from the llama_cpp Python bindings, which keep giving errors related to Metal. Maybe I'll submit an issue after more testing, i.e. GPU working from ./main but not from llama_cpp.
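For what it's worth, the CMake output above stops at FindPkgConfig ("Could NOT find PkgConfig"), so a missing pkg-config in the pip build environment is the first thing I would check before digging deeper. A tiny, purely illustrative check:

```python
import shutil

# CMake's FindPkgConfig module needs the pkg-config executable on PATH.
# If this prints None, installing pkg-config (e.g. via Homebrew) and
# re-running the pip build is a reasonable first step.
print(shutil.which("pkg-config"))
```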
I could not get llama_cpp Python to work so far either, but I was able to build and use llama.cpp without Metal and with CLBlast by simply following @mounta11n's instructions.
An even simpler solution: I used -ngl 0 with a llama.cpp build that includes Metal and it worked fine (no need to rebuild unless you want GPU acceleration).
./main -m ./models/llama-2-13b-chat.Q4_0.gguf \
--color \
--ctx_size 2048 \
-n -1 \
-ins -b 256 \
--top_k 10000 \
--temp 0.2 \
--repeat_penalty 1.1 \
-t 8 \
-ngl 0
For me, the GPU actually slows down everything except model load and prompt token eval, but I am still experimenting with various offloading values :-) (10 to 20 so far)
./main -m ./models/llama-2-13b-chat.Q4_0.gguf \
--color \
--ctx_size 2048 \
-n -1 \
-ins -b 256 \
--top_k 10000 \
--temp 0.2 \
--repeat_penalty 1.1 \
-t 8 \
-ngl 10
Hope this helps; looking forward to seeing how you get the Python version working.
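For the Python side, here is a minimal sketch of the same CPU-only setup through the llama-cpp-python bindings (assuming that package's Llama API; the model path and sampling values just mirror the commands above, and n_gpu_layers=0 plays the role of -ngl 0):

```python
from llama_cpp import Llama

# n_gpu_layers=0 keeps every layer on the CPU, so the broken Metal/AMD
# path is never touched. The model path is only an example.
llm = Llama(
    model_path="./models/llama-2-13b-chat.Q4_0.gguf",
    n_gpu_layers=0,
    n_ctx=2048,
    n_threads=7,
)

out = llm(
    "I believe the purpose of life is",
    max_tokens=128,
    temperature=0.2,
    top_k=10,
    repeat_penalty=1.1,
)
print(out["choices"][0]["text"])
```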
@vainceha may I ask what hardware you use, and give you some general advice?
1. I assume that you have an 8-core CPU, right? If so, it is highly recommended to set -t to at most 7; even -t 4 is often much faster.
2. I am not sure whether --top-k 1000 means more computation, but apart from that it is almost never necessary to go beyond 10 or so, especially when you have --temp 0.2.
3. It looks like you are using the old quantization method (q4_0). If you have a good reason for that, fine. But just in case you are not aware of it: there is a newer quantization method (k-quants), which is highly recommended by the llama.cpp devs.
That would be great; here is some data and more info.
The testing I was doing over the weekend was on this system:
Quad-Core Intel Core i7 / 2.6 GHz, 16 GB RAM, Radeon Pro 450M with 2 GB, Metal GPUFamily macOS 2
I will be doing future testing on a relatively newer machine with the specs below:
8-Core Intel i9 / 2.3 GHz, 16 GB RAM, Radeon Pro 550M with 4 GB, with Metal 3 support
Unfortunately, I can only give you personal recommendations based on my own trial-and-error experience. The llama.cpp documentation itself is not easy to keep track of, and I guess that's why there is not much else to find on the internet at the moment. At least I don't know of any other good references right now.
But this is not meant as criticism of the llama.cpp team, because one also has to remember that this is absolute bleeding-edge technology that is developing incredibly fast. If I were as skilled a developer as the llama.cpp folks and understood everything at once as soon as I saw the code, my time would probably be too precious to write simple manuals and documentation as well ^^'. Okay, enough monological small talk, sorry.
These both seem to be MacBooks, so unfortunately you can't upgrade the RAM, too bad. With the quad-core i7 you should not use more than 3 threads, so -t 3. With that you should get the fastest results in most cases, because you always want to keep one core in reserve to orchestrate the rest and handle the system's own work.
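As a small illustration of the "leave one core free" rule (a rough sketch only; os.cpu_count() reports logical cores, so it is halved here for these hyperthreaded Intel CPUs):

```python
import os

logical = os.cpu_count() or 1     # logical cores, i.e. including hyperthreads
physical = max(1, logical // 2)   # rough estimate for hyperthreaded Intel Macs
n_threads = max(1, physical - 1)  # keep one core free for the system
print(f"-t {n_threads}")          # e.g. "-t 3" on a quad-core i7, "-t 7" on an 8-core i9
```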
About top-k: with this value you specify, for each word to be generated next (strictly speaking a token, but let's say word), how big the "pot" of candidate words should be from which the next word is randomly selected. Concretely, --top-k 1000 means that each time, after each word, the next one is picked out of 1000 possible words. But LLMs are similar to us humans and our brains: when we speak, we almost always have a near-certain idea of what the next word should be. Sometimes we briefly consider whether to take wording A or wording B; sometimes we might even use wording C.
E.g. if I want to say "Because of this event I am quite... 1. disappointed... 2. sad... 3. heartsick", then I am already relatively undecided. But I will never be so indecisive that I have to look at 1000 words before I can decide. That's why, in my opinion, a maximum of --top-k 3 is quite sufficient.
Then it's a matter of how "wildly" to decide between those words. If I prefer a conservative way of thinking and speaking, I will almost certainly choose the most common word, in my case "disappointed", and rarely or never venture something exotic like "heartsick" in that sentence. This corresponds roughly to a setting of --temp 0.2. With --temp 0.2 the first word is taken in most cases anyway, sometimes the second, and rarely the third. So 997 words were considered unnecessarily with --top-k 1000.
My personal approach is actually always to use --top-k 1, because that shows me the true core of a particular language model and leaves nothing to chance. I hope this helps with understanding and setting these hyperparameters.
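To make the interplay of top-k and temperature concrete, here is a small illustrative sketch (not llama.cpp's actual sampler, just the idea): the candidate pool is cut down to the k highest-scoring tokens, and a low temperature then concentrates almost all probability on the best of them, which is why a huge k mostly wastes candidates.

```python
import numpy as np

def sample_top_k(logits: np.ndarray, top_k: int = 3, temp: float = 0.2,
                 rng: np.random.Generator = np.random.default_rng()) -> int:
    """Toy top-k + temperature sampler over a vector of token scores."""
    top_ids = np.argsort(logits)[-top_k:]  # keep only the k best-scoring tokens
    z = logits[top_ids] / temp             # low temp sharpens toward the best one
    z -= z.max()                           # subtract max for numerical stability
    probs = np.exp(z)
    probs /= probs.sum()
    return int(rng.choice(top_ids, p=probs))

# "Because of this event I am quite ..." with three plausible continuations:
logits = np.array([2.0, 1.5, 1.0])  # disappointed, sad, heartsick
picks = [sample_top_k(logits, top_k=3, temp=0.2) for _ in range(1000)]
print([picks.count(i) for i in range(3)])  # almost always index 0 ("disappointed")
```

With temp closer to 1.0, the second and third candidates get picked noticeably more often, which matches the conservative-versus-adventurous picture above.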
Yes, it is definitely worth trying the new quants. Quantization is something like compression: Q4 means the model's parameters have been "compressed" to 4 bits. In Q4_K_M, most layers of the model are in 4-bit, but some layers with certain key functions are quantized to 6-bit, giving better and smarter results than their q4_0 siblings.
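As a rough back-of-the-envelope illustration of why the bit width matters for memory (a sketch only; real GGUF files mix bit widths per layer and carry some overhead, so actual file sizes differ somewhat, and the bits-per-weight figures below are approximations):

```python
def approx_size_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight memory: parameters * bits per weight, converted to GiB."""
    return n_params * bits_per_weight / 8 / 2**30

for label, bits in [("f16", 16.0), ("~6-bit (q6_K-like)", 6.6), ("~4-bit (q4_0-like)", 4.5)]:
    print(f"13B at {label}: ~{approx_size_gib(13e9, bits):.1f} GiB")
```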
Your i9 machine is a great device! You probably won't need GPU layer offloading there either. However, make sure to always leave at least one core free here as well, so use at most -t 7.
Try building with latest master - #3524 might fix the issue for some Intel MacBooks.
Tested today with latest master 95bd60a0a69f57e9a2ff1269667ea484a1a9bb40 on an Intel MacBook with AMD: it doesn't crash now, but the performance with the -ngl option enabled is worse than CPU only. The GPU spikes at 100% and token throughput is really slow (even with only 1 layer offloaded).
Yeah, this worked from ./main, but not from the llama_cpp Python bindings, which keep giving errors related to Metal.
Try
CMAKE_ARGS="-DLLAMA_CLBLAST=on -DLLAMA_METAL=off" pip install llama-cpp-python --no-cache-dir --force-reinstall
This issue was closed because it has been inactive for 14 days since being marked as stale.
Issue: Error when loading model on MacBook Pro with Intel Core i7 and Intel Iris Plus
System Information:
Steps to Reproduce:
wget https://huggingface.co/substratusai/Llama-2-13B-chat-GGUF/resolve/main/model.bin -O model.q4_k_s.gguf
./main -t 4 -m ./models/model.q4_k_s.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "### Instruction: Write a story\n### Response:"
Error Message:
I would appreciate any guidance or advice on how to resolve this issue. Thank you!