SteelPh0enix opened 1 month ago
Small update: I've confirmed that this bug does not happen when using Vulkan as the backend.
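In case anyone wants to run the same comparison, this is roughly how I build the Vulkan backend. A minimal sketch, assuming a recent checkout where the cmake option is `GGML_VULKAN` (older trees used `LLAMA_VULKAN`) and the Vulkan SDK is installed:

```powershell
# Configure and build llama.cpp with the Vulkan backend instead of ROCm/HIP.
# GGML_VULKAN is the flag name in recent trees; older ones used LLAMA_VULKAN.
cmake -B build-vulkan -DGGML_VULKAN=ON
cmake --build build-vulkan --config Release
```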
Which version of the driver are you using? I encountered the same issue, but everything worked smoothly after I downgraded to 24.5.1.
Currently on 24.9.1, and yeah, that might be it! I have a pending update to 24.10.1; I'll see if it works, and if not, I'll try downgrading.
EDIT: It's still loading into shared memory on 24.10.1. I'll downgrade the driver soon and verify whether that's the issue.
Does the issue happen with the koboldcpp-rocm fork? https://github.com/YellowRoseCx/koboldcpp-rocm
Same issue here. I got 24.8 to work, but performance degrades over time. 24.5.1 gave ROCm errors when I tried it; 24.7 seems to work so far, though I'll have to keep watching it. 24.9 and 24.10 don't work either.
Yeah, maybe this is why I experienced an extreme slowdown on my 7900 XTX after b3666.
Someone confirmed in the discussions thread that rolling back the runtime fixes the issue: https://github.com/ggerganov/llama.cpp/discussions/9960#discussioncomment-11141805
Discussed in https://github.com/ggerganov/llama.cpp/discussions/9960
After doing more testing, I've noticed two things:
First: I was quantizing models with --leave-output-tensor, which made my models run very slowly under Linux too. That was a side effect of my investigation, and I'm leaving it here in case somebody else runs into the same issue :)
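For reference, the invocation I mean looks roughly like this. A sketch only: the file names are placeholders, and depending on your checkout the binary may still be called `quantize` rather than `llama-quantize`:

```powershell
# Requantize with the output tensor left in its original precision.
# --leave-output-tensor skips (re)quantizing output.weight, which is what
# was slowing my models down; drop the flag to quantize it normally.
.\build\bin\llama-quantize.exe --leave-output-tensor .\model-f16.gguf .\model-q6_k.gguf Q6_K
```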
Second, and closely related to the issue: some models work just fine. In my first test I was checking a Qwen2.5 14B finetune quantized to Q6_K. In terms of memory allocation, this model behaves the same on Windows whether I leave the output tensor as-is or not. Quantizing the output tensor improves performance a little, but it's not very noticeable due to the memory constraints.
HOWEVER, LLaMA 3.2 3B quantized to Q8_0 works just fine and is loaded into dedicated GPU memory! What's going on here?
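If anyone else wants to check where their weights end up, you can watch dedicated vs. shared GPU memory from PowerShell while the model loads, instead of eyeballing Task Manager. A minimal sketch, assuming your Windows build exposes the standard "GPU Adapter Memory" performance counters:

```powershell
# Sample dedicated vs. shared GPU memory every 2 seconds while the model loads.
# Counter paths assume the built-in "GPU Adapter Memory" counter set (Windows 10+);
# CookedValue is in bytes, so divide by 1GB for readability.
Get-Counter -Counter '\GPU Adapter Memory(*)\Dedicated Usage',
                     '\GPU Adapter Memory(*)\Shared Usage' `
            -SampleInterval 2 -MaxSamples 10 |
  ForEach-Object {
    $_.CounterSamples | ForEach-Object {
      '{0}: {1:N2} GiB' -f $_.Path, ($_.CookedValue / 1GB)
    }
  }
```

If the model is going where it should, "Dedicated Usage" should jump by roughly the model size during load, while "Shared Usage" stays flat; on the affected drivers it's the other way around.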