LostRuins / koboldcpp

A simple one-file way to run various GGML and GGUF models with KoboldAI's UI
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0

Crash when loading models such as q_XS and q_XSS onto the GPU (Preset NoAVX2) #904

Open grtsata opened 3 weeks ago

grtsata commented 3 weeks ago

My CPU is old and does not support AVX2, so I am using the NoAVX2 preset. The q4_K_M model loads onto the GPU with no problem. With q4_XS, q3_XSS, etc., it crashes with the following message at the end:

Preset: CLBlast NoAVX2 (Old CPU): GGML_ASSERT: ggml-opencl.cpp:1815: to_fp32_cl != nullptr

Preset: Vulkan NoAVX2 (Old CPU): GGML_ASSERT: ggml-vulkan.cpp:2999: !qx_needs_dequant || to_fp16_vk_0 != nullptr

LostRuins commented 3 weeks ago

Those quants are not supported on those backends. Try using a K quant.

grtsata commented 3 weeks ago

You are correct that the K_M and K_S models work fine. Is it because of NoAVX2 mode that the XS models and others are not supported? It was difficult to identify the problem because the documentation does not mention this. Also, they work fine in text-generation-webui-main, so the hardware should be capable of running them, and I hope you will consider supporting them.

LostRuins commented 3 weeks ago

Are you sure text-gen webui supports Vulkan/CLBlast?

grtsata commented 3 weeks ago

In text-generation-webui, the XS and XSS models load onto the GPU and work fine, even on a CPU with only AVX1.

LostRuins commented 3 weeks ago

Please try this build: https://github.com/LostRuins/koboldcpp/actions/runs/9436096249/artifacts/1582996921 and select CuBLAS.

grtsata commented 2 weeks ago

I tried the build with the CuBLAS preset selected: the K models work, and the XS and XSS models work as well. It is working very well.

LostRuins commented 2 weeks ago

Yup, there you go.