ggerganov / llama.cpp

LLM inference in C/C++
MIT License

ggml_metal_graph_compute_block_invoke: error: unsupported op 'ROPE' on M1 Pro #4482

Closed takis closed 10 months ago

takis commented 10 months ago


Current Behavior

Llama.cpp's 'main' executable has been working perfectly for me on my MacBook, but after I pulled in the latest Git changes, I started getting errors:

ggml_metal_graph_compute_block_invoke: error: unsupported op 'ROPE'

I used git bisect to find the commit that broke it on my system:

cafcd4f89500b8afef722cdb08088eceb8a22572 is the first bad commit
commit cafcd4f89500b8afef722cdb08088eceb8a22572
Author: slaren <slarengh@gmail.com>
Date:   Thu Dec 14 16:52:08 2023 +0100

    ggml : remove n_dims from ggml_tensor (#4469)

    ggml-ci

 common/train.cpp                                   | 18 +++--
 examples/baby-llama/baby-llama.cpp                 | 18 ++---
 .../convert-llama2c-to-ggml.cpp                    |  4 +-
 examples/finetune/finetune.cpp                     |  2 +-
 examples/gguf/gguf.cpp                             |  2 +-
 examples/llava/clip.cpp                            |  6 +-
 ggml.c                                             | 94 +++++++++++-----------
 ggml.h                                             |  8 +-
 llama.cpp                                          |  2 +-
 9 files changed, 81 insertions(+), 73 deletions(-)

I'm getting this error as of commit cafcd4f89500b8afef722cdb08088eceb8a22572:

llm_load_print_meta: model params     = 7.24 B
llm_load_print_meta: model size       = 5.53 GiB (6.56 BPW) 
llm_load_print_meta: general.name     = mistralai_mistral-7b-v0.1
llm_load_print_meta: BOS token        = 1 '<s>'
llm_load_print_meta: EOS token        = 2 '</s>'
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: LF token         = 13 '<0x0A>'
llm_load_tensors: ggml ctx size =    0.11 MiB
llm_load_tensors: mem required  = 5666.20 MiB
...................................................................................................
llama_new_context_with_model: n_ctx      = 512
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: KV self size  =   64.00 MiB, K (f16):   32.00 MiB, V (f16):   32.00 MiB
llama_build_graph: non-view tensors processed: 676/676
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M1 Pro
ggml_metal_init: picking default device: Apple M1 Pro
ggml_metal_init: default.metallib not found, loading from source
ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil
ggml_metal_init: loading '/Users/ic/external/llama.cpp/ggml-metal.metal'
ggml_metal_init: GPU name:   Apple M1 Pro
ggml_metal_init: GPU family: MTLGPUFamilyApple7 (1007)
ggml_metal_init: hasUnifiedMemory              = true
ggml_metal_init: recommendedMaxWorkingSetSize  = 11453.25 MB
ggml_metal_init: maxTransferRate               = built-in GPU
llama_new_context_with_model: compute buffer total size = 76.19 MiB
llama_new_context_with_model: max tensor size =   102.54 MiB
ggml_metal_add_buffer: allocated 'data            ' buffer, size =  5666.80 MiB, ( 5668.42 / 10922.67)
ggml_metal_add_buffer: allocated 'kv              ' buffer, size =    64.03 MiB, ( 5732.45 / 10922.67)
ggml_metal_add_buffer: allocated 'alloc           ' buffer, size =    73.02 MiB, ( 5805.47 / 10922.67)
ggml_metal_graph_compute_block_invoke: error: unsupported op 'ROPE'
GGML_ASSERT: ggml-metal.m:1009: !"unsupported op"
ggml_metal_graph_compute_block_invoke: error: unsupported op 'ROPE'
GGML_ASSERT: ggml-metal.m:1009: !"unsupported op"
ggml_metal_graph_compute_block_invoke: error: unsupported op 'ROPE'
GGML_ASSERT: ggml-metal.m:1009: !"unsupported op"

Environment and Context

I'm using a MacBook Pro with an M1 Pro chip, running macOS 14.1.2.

clang --version
Apple clang version 15.0.0 (clang-1500.0.40.1)
Target: arm64-apple-darwin23.1.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
python3 --version
Python 3.11.7
llama.cpp$ git log | head -1
commit 6744dbe924a317e3e2a5a2a4a2037061b2223449
takis commented 10 months ago

Oh, and I'd like to add that the problem occurs with every model I tried. llama-2-13b-chat.Q4_K_M.gguf:

llama_build_graph: non-view tensors processed: 844/844
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M1 Pro
ggml_metal_init: picking default device: Apple M1 Pro
ggml_metal_init: default.metallib not found, loading from source
ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil
ggml_metal_init: loading '/Users/ic/external/llama.cpp/ggml-metal.metal'
ggml_metal_init: GPU name:   Apple M1 Pro
ggml_metal_init: GPU family: MTLGPUFamilyApple7 (1007)
ggml_metal_init: hasUnifiedMemory              = true
ggml_metal_init: recommendedMaxWorkingSetSize  = 11453.25 MB
ggml_metal_init: maxTransferRate               = built-in GPU
llama_new_context_with_model: compute buffer total size = 78.19 MiB
llama_new_context_with_model: max tensor size =   128.17 MiB
ggml_metal_add_buffer: allocated 'data            ' buffer, size =  7501.56 MiB, ( 7503.19 / 10922.67)
ggml_metal_add_buffer: allocated 'kv              ' buffer, size =   400.03 MiB, ( 7903.22 / 10922.67)
ggml_metal_add_buffer: allocated 'alloc           ' buffer, size =    75.02 MiB, ( 7978.23 / 10922.67)
ggml_metal_graph_compute_block_invoke: error: unsupported op 'ROPE'
GGML_ASSERT: ggml-metal.m:1009: !"unsupported op"
ggml_metal_graph_compute_block_invoke: error: unsupported op 'ROPE'

mistral-7b-v0.1.Q6_K.gguf

ggml_metal_init: recommendedMaxWorkingSetSize  = 11453.25 MB
ggml_metal_init: maxTransferRate               = built-in GPU
llama_new_context_with_model: compute buffer total size = 76.19 MiB
llama_new_context_with_model: max tensor size =   102.54 MiB
ggml_metal_add_buffer: allocated 'data            ' buffer, size =  5666.80 MiB, ( 5668.42 / 10922.67)
ggml_metal_add_buffer: allocated 'kv              ' buffer, size =    64.03 MiB, ( 5732.45 / 10922.67)
ggml_metal_add_buffer: allocated 'alloc           ' buffer, size =    73.02 MiB, ( 5805.47 / 10922.67)
ggml_metal_graph_compute_block_invoke: error: unsupported op 'ROPE'
GGML_ASSERT: ggml-metal.m:1009: !"unsupported op"

neuralhermes-2.5-mistral-7b.Q5_K_M.gguf

ggml_metal_init: maxTransferRate               = built-in GPU
llama_new_context_with_model: compute buffer total size = 76.19 MiB
llama_new_context_with_model: max tensor size =   102.55 MiB
ggml_metal_add_buffer: allocated 'data            ' buffer, size =  4893.72 MiB, ( 4895.34 / 10922.67)
ggml_metal_add_buffer: allocated 'kv              ' buffer, size =    64.03 MiB, ( 4959.38 / 10922.67)
ggml_metal_add_buffer: allocated 'alloc           ' buffer, size =    73.02 MiB, ( 5032.39 / 10922.67)
ggml_metal_graph_compute_block_invoke: error: unsupported op 'ROPE'
GGML_ASSERT: ggml-metal.m:1009: !"unsupported op"

rocket-3b.Q4_K_M.gguf:

llama_new_context_with_model: compute buffer total size = 111.44 MiB
llama_new_context_with_model: max tensor size =   100.74 MiB
ggml_metal_add_buffer: allocated 'data            ' buffer, size =  1629.45 MiB, ( 1631.08 / 10922.67)
ggml_metal_add_buffer: allocated 'kv              ' buffer, size =   160.03 MiB, ( 1791.11 / 10922.67)
ggml_metal_add_buffer: allocated 'alloc           ' buffer, size =   108.27 MiB, ( 1899.38 / 10922.67)
ggml_metal_get_buffer: error: buffer is nil
ggml_metal_get_buffer: error: buffer is nil
GGML_ASSERT: ggml-metal.m:1949: ne00 % 4 == 0
ggml_metal_get_buffer: error: buffer is nil
takis commented 10 months ago

Using the latest git code (6744dbe924a317e3e2a5a2a4a2037061b2223449), but with the patch from commit cafcd4f89500b8afef722cdb08088eceb8a22572 reversed, fixes the crashes for me:

➜  llama.cpp git:(master) ✗ git log | head -1
commit 6744dbe924a317e3e2a5a2a4a2037061b2223449
➜  llama.cpp git:(master) ✗ git show cafcd4f89500b8afef722cdb08088eceb8a22572|patch -p1 -R
➜  llama.cpp git:(master) ✗ make
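The `git show <commit> | patch -p1 -R` trick above un-applies one commit's diff in the working tree without rewriting history. A minimal self-contained illustration (throwaway repo, illustrative file names):

```shell
# Demonstrates reversing a single commit's patch in the working tree,
# as done above with cafcd4f8. Everything here is a stand-in example.
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q
git config user.email demo@example.com
git config user.name demo
echo v1 > f.txt
git add f.txt && git commit -qm "version 1"
echo v2 > f.txt
git commit -qam "version 2"
# Pipe the commit's diff into patch with -R (reverse) and -p1 (strip a/ b/ prefixes):
git show HEAD | patch -sp1 -R
cat f.txt          # back to v1, while HEAD still points at "version 2"
```

This only modifies the checkout; a rebuild afterwards (as in the `make` step above) picks up the reverted state.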
slaren commented 10 months ago

I cannot reproduce this on an M3 Max, and the errors don't really make much sense: ggml_metal_supports_op returns true unconditionally for GGML_OP_ROPE.

slaren commented 10 months ago

Make sure to do a make clean before building.

takis commented 10 months ago

You were right! make clean fixed all crashes...

Apologies!

redblobgames commented 10 months ago

Thank you! I was having the same error with many different model files, and make clean solved it.