LostRuins / koboldcpp

Run GGUF models easily with a KoboldAI UI. One File. Zero Install.
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0

Using BLIS or MKL #322

Open · gemiduck opened this issue 1 year ago

gemiduck commented 1 year ago

Hi, is it possible to use either BLIS or MKL instead of OpenBLAS? I'm using an AMD EPYC 7543, and per-token prompt processing without OpenBLAS is much faster, so I'm wondering if either of the two would help prompt eval time.

python3 koboldcpp.py --threads 4 --smartcontext ../model_13b.bin

Without OpenBLAS:

Processing Prompt (180 / 180 tokens)
Generating (15 / 110 tokens)
Time Taken - Processing:22.3s (124ms/T), Generation:2.7s (180ms/T), Total:25.0s (0.6T/s)

With BLAS:

Processing Prompt [BLAS] (46 / 46 tokens)
Generating (33 / 110 tokens)
Time Taken - Processing:14.5s (315ms/T), Generation:6.2s (187ms/T), Total:20.6s (1.6T/s)

I'm wondering whether this is a multi-threading issue, as the timing I get with a single BLAS thread is comparable, but still slightly higher.

LostRuins commented 1 year ago

I tried getting BLIS to work previously, but it wouldn't compile correctly on my system for some reason. Anecdotally, from what I've read, MKL should be faster than OpenBLAS, but a lot of it appears to be proprietary. You could certainly try swapping the BLAS library for BLIS or MKL if you can; assuming your libraries are compiled and installed correctly, it should only be a few lines changed, since the function signature for cblas_sgemm should be similar.
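
As a rough sketch of why swapping libraries should only touch a few lines: OpenBLAS, BLIS, and MKL all expose the standard CBLAS interface, so a call like the one below should build against any of them unchanged, with only the header and link line differing. The header name and link flags here are assumptions about a typical install, not koboldcpp's actual build setup.

#include <stdio.h>
#include <cblas.h>   /* OpenBLAS/BLIS ship cblas.h; for MKL use <mkl_cblas.h> */

int main(void) {
    /* C = alpha * A * B + beta * C, row-major: (2x3) * (3x2) -> (2x2) */
    float A[2 * 3] = {1, 2, 3,
                      4, 5, 6};
    float B[3 * 2] = {7,  8,
                      9, 10,
                     11, 12};
    float C[2 * 2] = {0};

    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                2, 2, 3,        /* M, N, K */
                1.0f, A, 3,     /* alpha, A, lda */
                B, 2,           /* B, ldb */
                0.0f, C, 2);    /* beta, C, ldc */

    /* expected output: 58 64 / 139 154 */
    printf("%.0f %.0f\n%.0f %.0f\n", C[0], C[1], C[2], C[3]);
    return 0;
}

The build line would then be along the lines of gcc demo.c -lopenblas vs. gcc demo.c -lblis vs. gcc demo.c -lmkl_rt; exact flags depend on how each library is packaged on your system.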

Penguinehis commented 1 year ago

Hi, do you have the lines needed to compile with MKL? I'm testing on my i5-11400F (server) to see if I can get it faster than my R5 3600 server.

Jacoby1218 commented 1 year ago

If you mean Intel MKL, it's open source now under the Apache license (now called oneMKL): https://github.com/oneapi-src/oneMKL. I think this might be a big benefit for Intel Arc GPUs, while also being compatible with other systems (the library supports both CUDA and ROCm). I'm not a dev, so no idea whether that would actually be useful or not.