Open coder-vig opened 1 month ago
@coder-vig ... were you able to resolve this? I'm facing the same issue.
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.85 Driver Version: 555.85 CUDA Version: 12.5 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Driver-Model | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Quadro P5000 WDDM | 00000000:01:00.0 On | N/A |
| N/A 63C P8 10W / 100W | 2227MiB / 16384MiB | 3% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Apr_17_19:36:51_Pacific_Daylight_Time_2024
Cuda compilation tools, release 12.5, V12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0
I tried running 3 quantized llamafiles, and each one ends with an error. However, TinyLlama-1.1B-Chat-v1.0.F16.llamafile runs on the GPU without error.
./llava-v1.5-7b-q4.llamafile --server --nobrowser --embedding -ngl 1000 --gpu nvidia
ggml-cuda.cu:1412: ERROR: CUDA kernel vec_dot_q4_K_q8_1_impl_vmmq has no device code compatible with CUDA arch 600. ggml-cuda.cu was compiled for: 500,600,700,800,900
./mistral-7b-instruct-v0.2.Q4_K_M.llamafile --server --nobrowser --embedding -ngl 1000 --gpu nvidia
ggml-cuda.cu:1412: ERROR: CUDA kernel vec_dot_q4_K_q8_1_impl_vmmq has no device code compatible with CUDA arch 600. ggml-cuda.cu was compiled for: 500,600,700,800,900
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.67 Driver Version: 550.67 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce GTX 1070 Off | 00000000:01:00.0 On | N/A |
| N/A 49C P8 13W / 125W | 413MiB / 8192MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 2671 G /usr/lib/xorg/Xorg 210MiB |
| 0 N/A N/A 2881 G /usr/bin/gnome-shell 44MiB |
| 0 N/A N/A 6153 G /usr/lib/firefox/firefox-bin 66MiB |
| 0 N/A N/A 6672 G ...91,262144 --variations-seed-version 87MiB |
+-----------------------------------------------------------------------------------------+
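For anyone comparing reports, here is a rough Python sketch that pulls the kernel name, the arch the kernel ran under, and the list of compiled archs out of the error line above. The helper name `parse_arch_error` and the regular expression are my own, not part of llamafile, and the pattern is only matched against the exact message quoted in this thread.

```python
import re

# Error text copied verbatim from the reports in this thread.
ERROR = (
    "ggml-cuda.cu:1412: ERROR: CUDA kernel vec_dot_q4_K_q8_1_impl_vmmq has no device code "
    "compatible with CUDA arch 600. ggml-cuda.cu was compiled for: 500,600,700,800,900"
)

# Hypothetical pattern written for this one message format (an assumption).
PATTERN = re.compile(
    r"CUDA kernel (\w+) has no device code compatible with CUDA arch (\d+)\. "
    r"\S+ was compiled for: ([\d,]+)"
)


def parse_arch_error(message):
    """Return (kernel, running_arch, compiled_archs), or None if the message does not match."""
    match = PATTERN.search(message)
    if match is None:
        return None
    kernel, arch, compiled = match.groups()
    return kernel, int(arch), [int(a) for a in compiled.split(",") if a]


if __name__ == "__main__":
    kernel, arch, compiled = parse_arch_error(ERROR)
    print(f"kernel:        {kernel}")
    print(f"running arch:  {arch}")
    print(f"compiled for:  {compiled}")
    print(f"arch in list:  {arch in compiled}")
```

Note that 600 does appear in the compiled list for both reports, so the message does not look like a plain "arch missing from the build" case. My guess, and it is only a guess, is that this particular quantized kernel is unavailable in the 600 build, which would fit the F16 model running fine.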