grtsata opened this issue 3 weeks ago
Those quants are not supported on those backends. Try using a K quant
You are correct that the K_M and K_S models work fine. Is it because of NoAVX2 mode that the XS models and others are not supported? It was difficult to identify the problem, as the documentation does not mention this. Also, these models work fine in text-generation-webui-main, so the hardware should be able to run them, and I hope you will consider supporting them.
Are you sure text-gen webui supports vulkan/clblast?
In text-generation-webui, the XS and XXS models load on the GPU and work fine, even on an AVX1-only CPU.
Please try this build: https://github.com/LostRuins/koboldcpp/actions/runs/9436096249/artifacts/1582996921 and select cublas.
I tried the build with the CuBLAS preset selected, and the K, XS, and XXS models all work. It is working very well.
Yup there you go.
My CPU is old and does not support AVX2, so I am using the NoAVX2 preset. The q4_K_M model loads onto the GPU with no problem. With q4_XS, q3_XXS, etc., it crashes with the following message at the end:
Preset: CLBlast NoAVX2 (Old CPU)
```
GGML_ASSERT: ggml-opencl.cpp:1815: to_fp32_cl != nullptr
```
Preset: Vulkan NoAVX2 (Old CPU)
```
GGML_ASSERT: ggml-vulkan.cpp:2999: !qx_needs_dequant || to_fp16_vk_0 != nullptr
```