ggerganov / llama.cpp

LLM inference in C/C++
MIT License

Feature Request: Improve Gemma v2 model performance on Vulkan backend #8476

Closed lin72h closed 2 months ago

lin72h commented 3 months ago

Prerequisites

Feature Description

Hi team, first of all, thank you for continuing to improve this awesome project. I just discovered that with the Vulkan backend on Linux or FreeBSD, using the Mesa Vulkan driver (RADV), the Gemma-2-9B model runs roughly 3x to 4.5x slower than the Llama-3-8B model (prompt processing: about 612 vs. 135 t/s; token generation: about 56 vs. 17 t/s). Here are the results:

./llama-bench -m ~/Models/llama-3-8b-it_q6_k.gguf -n 64 -p 512 -ngl 99
| model                          |       size |     params | backend    | ngl |          test |              t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | ---------------: |
ggml_vulkan: Found 1 Vulkan devices:
Vulkan0: AMD Radeon RX 7900 XT (RADV NAVI31) (radv) | uma: 0 | fp16: 1 | warp size: 64
| llama 8B Q6_K                  |   6.14 GiB |     8.03 B | Vulkan     |  99 |         pp512 |   612.28 ± 85.52 |
| llama 8B Q6_K                  |   6.14 GiB |     8.03 B | Vulkan     |  99 |          tg64 |     56.36 ± 0.92 |
build: 17eb6aa8 (3386)
me@bdw006:$ ./llama-bench -m ~/Models/gemma-2-9b-it_Q4_K_L.gguf -n 64 -p 512 -ngl 99 
| model                          |       size |     params | backend    | ngl |          test |              t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | ---------------: |
ggml_vulkan: Found 1 Vulkan devices:
Vulkan0: AMD Radeon RX 7900 XT (RADV NAVI31) (radv) | uma: 0 | fp16: 1 | warp size: 64
| gemma2 9B Q4_K - Medium        |   6.47 GiB |    10.16 B | Vulkan     |  99 |         pp512 |    134.82 ± 0.10 |
| gemma2 9B Q4_K - Medium        |   6.47 GiB |    10.16 B | Vulkan     |  99 |          tg64 |     17.47 ± 1.65 |
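
To put a number on the gap, here is a minimal sketch that computes the slowdown factors from the mean t/s values in the two tables above. The helper name `slowdown` is made up for illustration and is not part of llama.cpp. Note that the two runs are not a like-for-like comparison: the models differ in quantization (Q6_K vs. Q4_K) and parameter count (8.03 B vs. 10.16 B), so the ratios are only indicative.

```python
# Mean throughput (tokens/s) copied from the llama-bench tables above.
results = {
    "pp512": {"llama-3-8b": 612.28, "gemma-2-9b": 134.82},
    "tg64": {"llama-3-8b": 56.36, "gemma-2-9b": 17.47},
}


def slowdown(baseline_tps: float, candidate_tps: float) -> float:
    """Return how many times slower the candidate is than the baseline."""
    return baseline_tps / candidate_tps


for test, tps in results.items():
    factor = slowdown(tps["llama-3-8b"], tps["gemma-2-9b"])
    print(f"{test}: {factor:.2f}x slower")
# pp512: 4.54x slower
# tg64: 3.23x slower
```

So prompt processing shows the larger gap (about 4.5x), while token generation is about 3.2x slower.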

Here's my setup:

OS: FreeBSD 15-CURRENT
GPU driver: drm-6.1-lts and Mesa RADV driver
CPU: dual-socket E5-2680v4
GPU: AMD Radeon RX 7900 XT (20 GB)

Motivation

Gemma-2 is a high-quality model for its size, and optimizing the Vulkan backend for it would be a very good addition.

Possible Implementation

No response

github-actions[bot] commented 2 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.