Open Nekotekina opened 2 months ago
Oops, I'll retest master branch.
Retested with latest version, the same result.
Potentially related to issue 8760, which also mentions the difference between (IQ1, IQ2, IQ3) and (IQ4 / K).
On NVIDIA (3090), IQ3_M is faster than IQ4_XS (~40 t/s vs. ~35 t/s).
However, on 1x NVIDIA 3090 (DDR4 offload), IQ3_S and IQ3_M are slower than IQ4_XS (about 0.5x the speed). It seems that only NVIDIA GPUs can handle IQ3 at high speed.
What happened?
Model: https://huggingface.co/bartowski/gemma-2-27b-it-GGUF
AMD GPU: RX 7600 XT + RX 7600 (full offload)
With IQ3_M I get about 10 t/s, while IQ4_XS gets nearly 15 t/s. I expected the smaller model to run faster because it needs less memory bandwidth, and both are IQ quants.
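To make the bandwidth expectation above concrete, here is a minimal sketch of the usual back-of-the-envelope estimate: if token generation were limited purely by memory bandwidth, every weight would be read once per token, so tokens/s would be at most bandwidth divided by model size. The function name `bandwidth_bound_tps` and all numbers (240 GB/s, 12 GB, 15 GB) are hypothetical placeholders for illustration, not measurements of these GPUs or the actual file sizes of these quants.

```python
def bandwidth_bound_tps(model_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Upper bound on tokens/s, assuming each generated token reads
    every model weight from memory exactly once (a simplification)."""
    return bandwidth_bytes_per_s / model_bytes


GB = 1e9

# Hypothetical values, chosen only to illustrate the arithmetic.
bandwidth = 240 * GB                 # assumed effective memory bandwidth
sizes = {
    "smaller quant": 12 * GB,        # stand-in for an IQ3-class file
    "larger quant": 15 * GB,         # stand-in for an IQ4-class file
}

for name, size in sizes.items():
    print(f"{name}: at most {bandwidth_bound_tps(size, bandwidth):.1f} t/s")
```

Under this model the smaller quant should always come out ahead. The numbers in this report go the other way, which would be consistent with something other than memory bandwidth being the limit for IQ3_M on this setup, though the sketch alone cannot show what.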
Name and Version
version: 3827 (7691654c) built with Ubuntu clang version 14.0.6-1~kisak1~j for x86_64-pc-linux-gnu
What operating system are you seeing the problem on?
Linux
Relevant log output
No response