LostRuins / koboldcpp

A simple one-file way to run various GGML and GGUF models with KoboldAI's UI
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0
4.35k stars 312 forks

vulkan: garbage output followed by GPU crash #897

Open llfw opened 4 weeks ago

llfw commented 4 weeks ago

hello,

i'm using:

built with:

LLAMA_OPENBLAS = 1
LLAMA_CLBLAST  = 1
LLAMA_VULKAN   = 1

LDFLAGS = -L/usr/local/lib
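For reference, those settings correspond to a build invocation along these lines (a sketch; assumes GNU make and the repo's Makefile, and that `gmake` is used on FreeBSD — the exact variable names should match the flags above):

```shell
# Build koboldcpp with the OpenBLAS, CLBlast and Vulkan backends enabled,
# pointing the linker at /usr/local/lib (where FreeBSD ports install libraries).
gmake LLAMA_OPENBLAS=1 LLAMA_CLBLAST=1 LLAMA_VULKAN=1 LDFLAGS="-L/usr/local/lib"
```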

the web interface starts and runs fine, but the model immediately produces garbage output (random binary strings), and after a couple of iterations the GPU eventually crashes. i'm assuming the GPU crash is only a symptom of another problem.

this looks a bit like https://github.com/ggerganov/llama.cpp/issues/5179, but from what i can see the fix for that is already in koboldcpp.

using a CPU backend (e.g., OpenBLAS) works fine, aside from being very slow.

LostRuins commented 3 weeks ago

What about the vulkan backend with 0 layers offloaded?
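Testing that would look something like the following (a sketch; the model path is a placeholder, and `--usevulkan`/`--gpulayers` are assumed to match the current launcher flag names):

```shell
# Keep the Vulkan backend selected but offload zero layers to the GPU;
# the weights stay in system RAM, isolating whether offloading itself
# is what triggers the garbage output.
python koboldcpp.py --usevulkan --gpulayers 0 model.gguf
```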

llfw commented 3 weeks ago

so i did a bit more testing: 0 layers works fine, and a small number (around 5-10) also seems to work. increasing it much past 10 eventually triggers the problem. could this be caused by running out of VRAM? the memory-use figures that koboldcpp reported didn't seem very high (at least for a 16GB card), but i'm not sure how to find out how much VRAM is actually in use.
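On the VRAM question: on Linux with the amdgpu driver, the kernel exposes a live usage counter in sysfs (an assumption here — FreeBSD's drm stack may not provide this path, in which case the helper reports it as unavailable). A minimal check:

```shell
# Print VRAM currently in use, in bytes, for a given DRM card (default card0).
# Falls back to "unavailable" when the amdgpu sysfs counter is absent
# (e.g. other drivers, or operating systems without this sysfs node).
vram_used() {
  f="/sys/class/drm/${1:-card0}/device/mem_info_vram_used"
  if [ -r "$f" ]; then
    cat "$f"
  else
    echo "unavailable"
  fi
}

vram_used card0
```

Interactive tools such as `radeontop` (AMD) or `nvtop` also show VRAM usage over time, which is handy for watching whether offloading more layers pushes the card toward its 16GB limit.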

i also tested with the same hardware on Linux (Ubuntu 24.04 using the pre-compiled koboldcpp-nocuda) and i couldn't seem to trigger the problem there, even with 40 layers offloaded - but, interestingly, i could trigger the problem with llama.cpp, even on Linux, when i compiled it myself. i wonder if this has something to do with the compiler optimisations in use? the CPU is a Ryzen 5800X3D (Zen 3 core).

as it's working on FreeBSD with fewer layers, i'm happy with that for now - but if it is a VRAM issue, perhaps there's a way to fail gracefully rather than crashing the GPU.

LostRuins commented 3 weeks ago

Perhaps @0cc4m can take a look, especially since you mention it happens upstream too.

Working with a few layers but failing with more sounds odd, but I doubt it's a compiler issue. Are you running OOM or near OOM?