ggerganov / llama.cpp

LLM inference in C/C++
MIT License

ggml_metal_graph_compute_block_invoke: error: unsupported op 'ROPE' on M1 Pro #4482

Closed takis closed 10 months ago

takis commented 10 months ago


Current Behavior

Llama.cpp's 'main' executable has been working perfectly for me on my MacBook, but after I pulled in the latest Git changes, I started getting errors:

ggml_metal_graph_compute_block_invoke: error: unsupported op 'ROPE'

I used git bisect to find the commit that broke it on my system:

cafcd4f89500b8afef722cdb08088eceb8a22572 is the first bad commit
commit cafcd4f89500b8afef722cdb08088eceb8a22572
Author: slaren <slarengh@gmail.com>
Date:   Thu Dec 14 16:52:08 2023 +0100

    ggml : remove n_dims from ggml_tensor (#4469)

    ggml-ci

 common/train.cpp                                   | 18 +++--
 examples/baby-llama/baby-llama.cpp                 | 18 ++---
 .../convert-llama2c-to-ggml.cpp                    |  4 +-
 examples/finetune/finetune.cpp                     |  2 +-
 examples/gguf/gguf.cpp                             |  2 +-
 examples/llava/clip.cpp                            |  6 +-
 ggml.c                                             | 94 +++++++++++-----------
 ggml.h                                             |  8 +-
 llama.cpp                                          |  2 +-
 9 files changed, 81 insertions(+), 73 deletions(-)

I'm getting this error as of commit cafcd4f89500b8afef722cdb08088eceb8a22572:

llm_load_print_meta: model params     = 7.24 B
llm_load_print_meta: model size       = 5.53 GiB (6.56 BPW) 
llm_load_print_meta: general.name     = mistralai_mistral-7b-v0.1
llm_load_print_meta: BOS token        = 1 '<s>'
llm_load_print_meta: EOS token        = 2 '</s>'
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: LF token         = 13 '<0x0A>'
llm_load_tensors: ggml ctx size =    0.11 MiB
llm_load_tensors: mem required  = 5666.20 MiB
...................................................................................................
llama_new_context_with_model: n_ctx      = 512
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: KV self size  =   64.00 MiB, K (f16):   32.00 MiB, V (f16):   32.00 MiB
llama_build_graph: non-view tensors processed: 676/676
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M1 Pro
ggml_metal_init: picking default device: Apple M1 Pro
ggml_metal_init: default.metallib not found, loading from source
ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil
ggml_metal_init: loading '/Users/ic/external/llama.cpp/ggml-metal.metal'
ggml_metal_init: GPU name:   Apple M1 Pro
ggml_metal_init: GPU family: MTLGPUFamilyApple7 (1007)
ggml_metal_init: hasUnifiedMemory              = true
ggml_metal_init: recommendedMaxWorkingSetSize  = 11453.25 MB
ggml_metal_init: maxTransferRate               = built-in GPU
llama_new_context_with_model: compute buffer total size = 76.19 MiB
llama_new_context_with_model: max tensor size =   102.54 MiB
ggml_metal_add_buffer: allocated 'data            ' buffer, size =  5666.80 MiB, ( 5668.42 / 10922.67)
ggml_metal_add_buffer: allocated 'kv              ' buffer, size =    64.03 MiB, ( 5732.45 / 10922.67)
ggml_metal_add_buffer: allocated 'alloc           ' buffer, size =    73.02 MiB, ( 5805.47 / 10922.67)
ggml_metal_graph_compute_block_invoke: error: unsupported op 'ROPE'
GGML_ASSERT: ggml-metal.m:1009: !"unsupported op"
ggml_metal_graph_compute_block_invoke: error: unsupported op 'ROPE'
GGML_ASSERT: ggml-metal.m:1009: !"unsupported op"
ggml_metal_graph_compute_block_invoke: error: unsupported op 'ROPE'
GGML_ASSERT: ggml-metal.m:1009: !"unsupported op"

Environment and Context

I'm using a MacBook Pro with an M1 Pro chip, running macOS 14.1.2.

clang --version
Apple clang version 15.0.0 (clang-1500.0.40.1)
Target: arm64-apple-darwin23.1.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
python3 --version
Python 3.11.7
llama.cpp$ git log | head -1
commit 6744dbe924a317e3e2a5a2a4a2037061b2223449
takis commented 10 months ago

Oh, and I'd like to add that the problem occurs with every model I tried. llama-2-13b-chat.Q4_K_M.gguf:

llama_build_graph: non-view tensors processed: 844/844
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M1 Pro
ggml_metal_init: picking default device: Apple M1 Pro
ggml_metal_init: default.metallib not found, loading from source
ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil
ggml_metal_init: loading '/Users/ic/external/llama.cpp/ggml-metal.metal'
ggml_metal_init: GPU name:   Apple M1 Pro
ggml_metal_init: GPU family: MTLGPUFamilyApple7 (1007)
ggml_metal_init: hasUnifiedMemory              = true
ggml_metal_init: recommendedMaxWorkingSetSize  = 11453.25 MB
ggml_metal_init: maxTransferRate               = built-in GPU
llama_new_context_with_model: compute buffer total size = 78.19 MiB
llama_new_context_with_model: max tensor size =   128.17 MiB
ggml_metal_add_buffer: allocated 'data            ' buffer, size =  7501.56 MiB, ( 7503.19 / 10922.67)
ggml_metal_add_buffer: allocated 'kv              ' buffer, size =   400.03 MiB, ( 7903.22 / 10922.67)
ggml_metal_add_buffer: allocated 'alloc           ' buffer, size =    75.02 MiB, ( 7978.23 / 10922.67)
ggml_metal_graph_compute_block_invoke: error: unsupported op 'ROPE'
GGML_ASSERT: ggml-metal.m:1009: !"unsupported op"
ggml_metal_graph_compute_block_invoke: error: unsupported op 'ROPE'

mistral-7b-v0.1.Q6_K.gguf

ggml_metal_init: recommendedMaxWorkingSetSize  = 11453.25 MB
ggml_metal_init: maxTransferRate               = built-in GPU
llama_new_context_with_model: compute buffer total size = 76.19 MiB
llama_new_context_with_model: max tensor size =   102.54 MiB
ggml_metal_add_buffer: allocated 'data            ' buffer, size =  5666.80 MiB, ( 5668.42 / 10922.67)
ggml_metal_add_buffer: allocated 'kv              ' buffer, size =    64.03 MiB, ( 5732.45 / 10922.67)
ggml_metal_add_buffer: allocated 'alloc           ' buffer, size =    73.02 MiB, ( 5805.47 / 10922.67)
ggml_metal_graph_compute_block_invoke: error: unsupported op 'ROPE'
GGML_ASSERT: ggml-metal.m:1009: !"unsupported op"

neuralhermes-2.5-mistral-7b.Q5_K_M.gguf

ggml_metal_init: maxTransferRate               = built-in GPU
llama_new_context_with_model: compute buffer total size = 76.19 MiB
llama_new_context_with_model: max tensor size =   102.55 MiB
ggml_metal_add_buffer: allocated 'data            ' buffer, size =  4893.72 MiB, ( 4895.34 / 10922.67)
ggml_metal_add_buffer: allocated 'kv              ' buffer, size =    64.03 MiB, ( 4959.38 / 10922.67)
ggml_metal_add_buffer: allocated 'alloc           ' buffer, size =    73.02 MiB, ( 5032.39 / 10922.67)
ggml_metal_graph_compute_block_invoke: error: unsupported op 'ROPE'
GGML_ASSERT: ggml-metal.m:1009: !"unsupported op"

rocket-3b.Q4_K_M.gguf:

llama_new_context_with_model: compute buffer total size = 111.44 MiB
llama_new_context_with_model: max tensor size =   100.74 MiB
ggml_metal_add_buffer: allocated 'data            ' buffer, size =  1629.45 MiB, ( 1631.08 / 10922.67)
ggml_metal_add_buffer: allocated 'kv              ' buffer, size =   160.03 MiB, ( 1791.11 / 10922.67)
ggml_metal_add_buffer: allocated 'alloc           ' buffer, size =   108.27 MiB, ( 1899.38 / 10922.67)
ggml_metal_get_buffer: error: buffer is nil
ggml_metal_get_buffer: error: buffer is nil
GGML_ASSERT: ggml-metal.m:1949: ne00 % 4 == 0
ggml_metal_get_buffer: error: buffer is nil
takis commented 10 months ago

Using the latest git code (6744dbe924a317e3e2a5a2a4a2037061b2223449), but with the patch from commit cafcd4f89500b8afef722cdb08088eceb8a22572 reversed, fixes the crashes for me:

➜  llama.cpp git:(master) ✗ git log | head -1
commit 6744dbe924a317e3e2a5a2a4a2037061b2223449
➜  llama.cpp git:(master) ✗ git show cafcd4f89500b8afef722cdb08088eceb8a22572|patch -p1 -R
➜  llama.cpp git:(master) ✗ make
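The `git show <commit> | patch -p1 -R` trick above un-applies one commit's diff in the working tree without rewriting history. A minimal self-contained illustration (throwaway repo, illustrative file names):

```shell
# Demonstrates reversing a single commit's patch in the working tree,
# as done above with cafcd4f8. Everything here is a stand-in example.
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q
git config user.email demo@example.com
git config user.name demo
echo v1 > f.txt
git add f.txt && git commit -qm "version 1"
echo v2 > f.txt
git commit -qam "version 2"
# Pipe the commit's diff into patch with -R (reverse) and -p1 (strip a/ b/ prefixes):
git show HEAD | patch -sp1 -R
cat f.txt          # back to v1, while HEAD still points at "version 2"
```

This only modifies the checkout; a rebuild afterwards (as in the `make` step above) picks up the reverted state.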
slaren commented 10 months ago

I cannot reproduce this on an M3 Max, and the errors don't really make much sense: ggml_metal_supports_op returns true unconditionally for GGML_OP_ROPE.

slaren commented 10 months ago

Make sure to do a make clean before building.

takis commented 10 months ago

You were right! make clean fixed all crashes...

Apologies!

redblobgames commented 10 months ago

Thank you! I was having the same error with many different model files, and make clean solved it.