Model in question: https://huggingface.co/bartowski/35b-beta-long-GGUF
The regular K-quants from the same repo (tested with Q4_K_M) offload into the GPU just fine, tested on a 6900 XT using Vulkan. None of the IQ4 or IQ3 quants would load for me with GPU offloading, but they work fine in CPU-only inference (CLBlast).
EDIT - Here's the last message I see on screen before it crashes:
GGML_ASSERT: ggml-vulkan.cpp:2940: !qx_needs_dequant || to_fp16_vk_0 != nullptr