ggerganov / llama.cpp


Bug: rwkv and mamba models cannot be used with `-ngl 0` after CPU backend refactor #10351

Closed · MollySophia closed this 53 minutes ago

MollySophia commented 9 hours ago

What happened?

    $ ./build/bin/llama-bench -m ~/Downloads/mamba-2.8b-q4_0.gguf -ngl 0
    | model                          |       size |     params | backend    | threads |          test |                  t/s |
    | ------------------------------ | ---------: | ---------: | ---------- | ------: | ------------: | -------------------: |
    /Users/molly/llama.cpp/ggml/src/ggml-backend.cpp:745: pre-allocated tensor in a backend that cannot run the operation
    [1]    13345 abort      ./build/bin/llama-bench -m ~/Downloads/mamba-2.8b-q4_0.gguf -ngl 0

    $ ./build/bin/llama-bench -m /Volumes/grouped/Models/rwkv/v6-Finch-7B-HF/v6-Finch-7B-HF-Q4_0.gguf -ngl 0
    | model                          |       size |     params | backend    | threads |          test |                  t/s |
    | ------------------------------ | ---------: | ---------: | ---------- | ------: | ------------: | -------------------: |
    /Users/molly/llama.cpp/ggml/src/ggml-backend.cpp:745: pre-allocated tensor in a backend that cannot run the operation
    [1]    16003 abort      ./build/bin/llama-bench -m  -ngl 0

Tracing the error with lldb shows that it fails in `ggml_backend_sched_backend_id_from_cur`:

    if (tensor->buffer || (tensor->view_src && tensor->view_src->buffer)) {
        // since the tensor is pre-allocated, it cannot be moved to another backend
        GGML_ABORT("pre-allocated tensor in a backend that cannot run the operation");
    }

The tensor triggering this fault is a view of `cache_k_l0`. This makes sense, as both Mamba and RWKV take a `GGML_VIEW`/`GGML_RESHAPE` of the K cache when building the graph.
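For illustration, here is a minimal standalone sketch of that pattern (a hypothetical repro against the public `ggml.h` API, not code from llama.cpp): the cache tensor is created once and reused, and the graph only ever touches it through a view, so the scheduler sees a pre-allocated tensor through `view_src`.

    #include "ggml.h"
    #include <stdio.h>

    int main(void) {
        struct ggml_init_params params = {
            /*.mem_size   =*/ 16*1024*1024,
            /*.mem_buffer =*/ NULL,
            /*.no_alloc   =*/ false,
        };
        struct ggml_context * ctx = ggml_init(params);

        // stands in for cache_k_l0: created once and reused across graph
        // builds; in llama.cpp it lives in a backend buffer, so the
        // view's view_src->buffer is non-NULL when the scheduler sees it
        struct ggml_tensor * cache_k = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1024);

        // Mamba/RWKV never use the cache tensor directly in the graph;
        // they take a view (or reshape) of it
        struct ggml_tensor * state = ggml_view_1d(ctx, cache_k, 512, 0);

        printf("view_src set: %s\n", state->view_src ? "yes" : "no");

        ggml_free(ctx);
        return 0;
    }

Because `view_src->buffer` is set for such a view, the condition quoted above treats it as pre-allocated, and if the operation lands on a backend that cannot run it, the scheduler cannot move it elsewhere and hits the `GGML_ABORT` instead.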

CC @compilade

Name and Version

Non-working version:

    $ ./build/bin/llama-cli -v
    build: 4098 (772703c8) with Apple clang version 16.0.0 (clang-1600.0.26.4) for arm64-apple-darwin24.1.0

Known working version:

    $ ./build/bin/llama-cli -v
    build: 4079 (4a8ccb37) with Apple clang version 15.0.0 (clang-1500.3.9.4) for arm64-apple-darwin23.6.0

What operating system are you seeing the problem on?

Linux, Mac

Relevant log output

No response

slaren commented 2 hours ago

Building with `GGML_CPU_AARCH64` disabled should fix it for now.
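For reference, assuming a standard out-of-tree CMake build of llama.cpp, that workaround would look something like this (everything beyond the `GGML_CPU_AARCH64` flag is just the usual build invocation):

    $ cmake -B build -DGGML_CPU_AARCH64=OFF
    $ cmake --build build --config Release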