ggerganov / llama.cpp

LLM inference in C/C++
MIT License

Vulkan backend regression: gibberish output when layers offloaded to GPU #8092

Open Adriankhl opened 5 days ago

Adriankhl commented 5 days ago

What happened?

OS: Windows
Compiler: cl or clang-cl
Build command: cmake .. -GNinja -DCMAKE_C_COMPILER=clang-cl -DCMAKE_CXX_COMPILER=clang-cl -DCMAKE_EXPORT_COMPILE_COMMANDS=1 -DLLAMA_NATIVE=OFF -DLLAMA_VULKAN=ON -DCMAKE_BUILD_TYPE=Debug
APU: AMD Radeon 780M (integrated GPU)
Vulkan Instance Version: 1.3.261
Vulkan SDK version: 1.3.283
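For reference, the end-to-end configure-and-build sequence looks roughly like this (a sketch assuming a clean build directory inside the llama.cpp checkout, an installed Vulkan SDK, and a PowerShell developer prompt; the CMake flags are the ones quoted above):

# from the llama.cpp checkout, in a fresh build directory
mkdir build
cd build
cmake .. -GNinja -DCMAKE_C_COMPILER=clang-cl -DCMAKE_CXX_COMPILER=clang-cl -DCMAKE_EXPORT_COMPILE_COMMANDS=1 -DLLAMA_NATIVE=OFF -DLLAMA_VULKAN=ON -DCMAKE_BUILD_TYPE=Debug
# build everything, or just the CLI target
cmake --build .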

This PR https://github.com/ggerganov/llama.cpp/pull/7947 causes gibberish output when running

.\bin\llama-cli.exe -m "C:\Users\adriankhl\git\models\Meta-Llama-3-8B-Instruct.Q5_K_M.gguf" --prompt "Hello world. " -ngl 33

while setting -ngl 0 produces normal output.
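A minimal side-by-side check of the regression (a sketch based on the command above; --temp 0 and a short -n keep the two runs easy to compare):

# CPU only (-ngl 0): coherent output
.\bin\llama-cli.exe -m "C:\Users\adriankhl\git\models\Meta-Llama-3-8B-Instruct.Q5_K_M.gguf" --prompt "Hello world. " --temp 0 -n 32 -ngl 0

# all 33 layers offloaded to the Vulkan backend: gibberish
.\bin\llama-cli.exe -m "C:\Users\adriankhl\git\models\Meta-Llama-3-8B-Instruct.Q5_K_M.gguf" --prompt "Hello world. " --temp 0 -n 32 -ngl 33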

Name and Version

version: 3213 (52fc8705) built with Clang 18.1.6 for

What operating system are you seeing the problem on?

Windows

Relevant log output

No response

stduhpf commented 4 days ago

Not happening on my end. Maybe you should give more details about the setup you're using (GPU model, driver version...)

Adriankhl commented 4 days ago

> Not happening on my end. Maybe you should give more details about the setup you're using (GPU model, driver version...)

Added that information. It's an AMD APU; I feel like problems are much more frequent on integrated GPUs, probably because @0cc4m is testing on a different setup.
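For completeness, the device and driver details above can be captured with the Vulkan SDK's vulkaninfo tool (assuming the SDK's bin directory is on PATH):

# prints each Vulkan device with its driver and API versions
vulkaninfo --summary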