Closed. nicholaiTukanov closed this issue 2 months ago.
Feature Description

Hi all! Amazing work on llama.cpp!

I am an engineer from NVIDIA working on NVPL BLAS (a BLAS library designed for the NVIDIA Grace CPU). I would like to add NVPL BLAS as a build option in the `Makefile` and `ggml-blas.cpp`. In the prompt test from `llama-bench` (version b3322), I have found it to provide better performance than GGML when using fewer than 32 threads.

My changes can be found here: https://github.com/nicholaiTukanov/llama.cpp/tree/ntukanov/add-nvpl. Please let me know if there is anything else I need to do to get this approved. Thank you!
Motivation

This will provide better prompt performance for `aarch64` users. See the table in the issue.

Possible Implementation

- Add a `GGML_NVPL` build option to the `Makefile`.
- Add an `NVPL_ENABLE_CBLAS` code path to `ggml-blas.cpp` that includes `nvpl_blas.h` and sets the number of threads for NVPL BLAS using `nvpl_blas_set_num_threads()`.
Seems OK to add - feel free to open PR
Closing since #8425 has been merged. Thank you all for your help.