Open PABannier opened 2 months ago
After it creates the tokens and runs ggml_metal_init, I get this:
ggml_metal_init: GPU name: Apple M1 Pro
ggml_metal_init: GPU family: MTLGPUFamilyApple7 (1007)
ggml_metal_init: hasUnifiedMemory = true
ggml_metal_init: recommendedMaxWorkingSetSize = 21845.34 MB
ggml_metal_init: maxTransferRate = built-in GPU
ggml_metal_add_buffer: allocated 'backend ' buffer, size = 54.36 MB, ( 54.98 / 21845.34)
encodec_load_model_weights: model size = 44.36 MB
encodec_load_model: n_q = 32
ggml_metal_add_buffer: allocated 'backend ' buffer, size = 314.06 MB, ( 369.05 / 21845.34)
encodec_eval: compute buffer size: 314.05 MB
ggml_metal_graph_compute_block_invoke: error: node 0, op = REPEAT not implemented
GGML_ASSERT: /Users/siraben/Git/bark.cpp/encodec.cpp/ggml/src/ggml-metal.m:1428: false
ggml_metal_graph_compute_block_invoke: error: node 4677, op = MAP_CUSTOM2_F32 not implemented
[1] 9701 abort ./examples/main/main -ngl 100 -t 8 -m ./ggml_weights/ggml_weights.bin -em -p
Hello @siraben!
Indeed, it seems that some operations (e.g., repeat, which is used to broadcast computations) do not have a corresponding Metal kernel implemented in ggml. I'll open a PR to implement them.
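For context, here is a rough sketch of what a repeat-style op computes, written in plain C++ with no ggml dependency. The name repeat_2d and its signature are made up for illustration, and this is not ggml's actual kernel. It only shows the tiling behaviour that a Metal kernel would presumably have to reproduce: a small tensor is copied over and over until it fills the larger target shape.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of ggml): tile a row-major
// (src_rows x src_cols) matrix until it fills a (dst_rows x dst_cols) matrix.
std::vector<float> repeat_2d(const std::vector<float> & src,
                             std::size_t src_rows, std::size_t src_cols,
                             std::size_t dst_rows, std::size_t dst_cols) {
    assert(src_rows > 0 && src_cols > 0);
    assert(src.size() == src_rows * src_cols);
    // The target shape must be a whole multiple of the source shape.
    assert(dst_rows % src_rows == 0 && dst_cols % src_cols == 0);

    std::vector<float> dst(dst_rows * dst_cols);
    for (std::size_t r = 0; r < dst_rows; ++r) {
        for (std::size_t c = 0; c < dst_cols; ++c) {
            // Wrap the destination index back into the source tensor.
            dst[r * dst_cols + c] = src[(r % src_rows) * src_cols + (c % src_cols)];
        }
    }
    return dst;
}
```

For instance, repeating a 1x3 bias row into a 2x3 shape yields two copies of that row, which can then be added element-wise to a 2x3 activation. That is the "broadcast" use mentioned above.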
When I try to run cmake -DGGML_CUBLAS=ON ..
I get:
CMake Warning at encodec.cpp/ggml/src/CMakeLists.txt:219 (message):
cuBLAS not found
I also tried CMAKE_ARGS='-DLLAMA_CUBLAS=on' cmake .. and applied all the changes proposed in this pull request, but without success.
This PR allows users to use the Metal (macOS) and cuBLAS backends by:
n_gpu_layers parameter in the CLI