CUDA kernel vec_dot_q4_K_q8_1_impl_vmmq has no device code compatible with CUDA arch 600

I tried running 3 quantized llamafiles, and every run ends with an error. However, TinyLlama-1.1B-Chat-v1.0.F16.llamafile runs on the GPU without error.

./llava-v1.5-7b-q4.llamafile --server --nobrowser --embedding -ngl 1000 --gpu nvidia
ggml-cuda.cu:1412: ERROR: CUDA kernel vec_dot_q4_K_q8_1_impl_vmmq has no device code compatible with CUDA arch 600. ggml-cuda.cu was compiled for: 500,600,700,800,900

./mistral-7b-instruct-v0.2.Q4_K_M.llamafile --server --nobrowser --embedding -ngl 1000 --gpu nvidia
ggml-cuda.cu:1412: ERROR: CUDA kernel vec_dot_q4_K_q8_1_impl_vmmq has no device code compatible with CUDA arch 600. ggml-cuda.cu was compiled for: 500,600,700,800,900
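For anyone comparing logs, here is a minimal Python sketch that pulls the kernel name, the reported arch, and the compiled arch list out of an error line like the one above. `parse_arch_error` and its regex are hypothetical helpers written for this message shape only; they are not part of llamafile or ggml.

```python
import re

# Error text copied from the report above.
ERROR = (
    "ggml-cuda.cu:1412: ERROR: CUDA kernel vec_dot_q4_K_q8_1_impl_vmmq has no "
    "device code compatible with CUDA arch 600. "
    "ggml-cuda.cu was compiled for: 500,600,700,800,900"
)

# Hypothetical pattern matching this message shape (an assumption, not a llamafile API).
PATTERN = re.compile(
    r"CUDA kernel (?P<kernel>\w+) has no device code compatible with "
    r"CUDA arch (?P<arch>\d+)\. \S+ was compiled for: (?P<built>[\d,]+)"
)


def parse_arch_error(message):
    """Return (kernel, reported arch, compiled arch list), or None if no match."""
    m = PATTERN.search(message)
    if m is None:
        return None
    built = [int(a) for a in m.group("built").split(",") if a]
    return m.group("kernel"), int(m.group("arch")), built


kernel, arch, built = parse_arch_error(ERROR)
# Show whether the reported arch appears in the compiled list.
print(kernel, arch, built, arch in built)
```

For the log above, arch 600 does appear in the compiled list, so the failure does not look like a simple case of the arch being left out of the build.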

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      2671      G   /usr/lib/xorg/Xorg                            210MiB |
|    0   N/A  N/A      2881      G   /usr/bin/gnome-shell                           44MiB |
|    0   N/A  N/A      6153      G   /usr/lib/firefox/firefox-bin                   66MiB |
|    0   N/A  N/A      6672      G   ...91,262144 --variations-seed-version         87MiB |
+-----------------------------------------------------------------------------------------+

@coder-vig ... were you able to resolve this? I'm facing the same issue.

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.85                 Driver Version: 555.85         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Quadro P5000                 WDDM  |   00000000:01:00.0  On |                  N/A |
| N/A   63C    P8             10W /  100W |    2227MiB /  16384MiB |      3%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Apr_17_19:36:51_Pacific_Daylight_Time_2024
Cuda compilation tools, release 12.5, V12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0

Mozilla-Ocho / llamafile

CUDA kernel vec_dot_q4_K_q8_1_impl_vmmq has no device code compatible with CUDA arch 600 #434