AndLLA opened 3 months ago
Is it actually faster?
I ran several tests, and the BLAS stage seems to scale with NumCores/2. For example, using 6 cores/threads and a batch size of 512, with everything running on the CPU, a BLAS stage that takes 3 minutes with "openblas+serial" takes 1 minute with "openblas+parallel".
On the other hand, the speedup for the inference stage is less noticeable (about 5%-10%).
Compared to "koboldcpp_default", the BLAS stage using "openblas+parallel" is 10%-20% faster.
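The NumCores/2 observation is plain arithmetic: the speedup is serial time divided by parallel time, and dividing that by the thread count gives the parallel efficiency. A minimal Python sketch using the figures reported above (the helper name `blas_scaling` is hypothetical):

```python
def blas_scaling(serial_minutes: float, parallel_minutes: float, cores: int):
    """Return (speedup, parallel efficiency) for one BLAS-stage measurement."""
    speedup = serial_minutes / parallel_minutes
    # 1.0 would mean perfect linear scaling with the number of threads.
    efficiency = speedup / cores
    return speedup, efficiency


# Figures reported above: 3 min serial, 1 min parallel, 6 cores/threads.
speedup, efficiency = blas_scaling(3.0, 1.0, 6)
print(f"speedup {speedup:.1f}x, efficiency {efficiency:.0%}")
```

With these numbers it prints `speedup 3.0x, efficiency 50%`: a 3x speedup on 6 threads, which is what scaling with NumCores/2 means here.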
P.S. openblas_set_num_threads is completely ignored by the "serial" OpenBLAS; it always uses one thread.
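One way to check which flavour a given .so is, and whether it honours openblas_set_num_threads, is to load it directly and query it. The sketch below uses Python's ctypes with the OpenBLAS entry points openblas_set_num_threads, openblas_get_num_threads, openblas_get_parallel and openblas_get_config. The helper name, the library file names, and the expectation that a serial build reports a single thread are assumptions to verify on your own system.

```python
import ctypes


def probe_openblas(libname: str, requested_threads: int = 6):
    """Load an OpenBLAS shared library and report its threading behaviour.

    Returns None if the library cannot be loaded. The helper is hypothetical.
    """
    try:
        lib = ctypes.CDLL(libname)
    except OSError:
        return None

    # Entry points exported by OpenBLAS (declared in its cblas.h).
    lib.openblas_get_parallel.restype = ctypes.c_int
    lib.openblas_get_num_threads.restype = ctypes.c_int
    lib.openblas_get_config.restype = ctypes.c_char_p
    lib.openblas_set_num_threads.argtypes = [ctypes.c_int]
    lib.openblas_set_num_threads.restype = None

    lib.openblas_set_num_threads(requested_threads)
    return {
        # 0 = sequential, 1 = pthreads, 2 = OpenMP
        "parallel": lib.openblas_get_parallel(),
        "config": lib.openblas_get_config().decode(),
        # Assumption: a serial build reports 1 here regardless of the request.
        "threads": lib.openblas_get_num_threads(),
    }


# Assumed Fedora-style sonames; prints None for any library that is absent.
for name in ("libopenblas.so.0", "libopenblasp.so.0"):
    print(name, probe_openblas(name))
```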
I looked at conda, since we'd be implementing this in the CI, which is based on conda. The latest libopenblas version ships openblasp, and apparently the regular openblas .so is symlinked to it. Are you sure Fedora isn't doing the same thing? If so, our prebuilt binaries probably already use the parallel version.
That would also mean we can probably drop-in replace the Windows .dll.
Hello, part of the patch introduces a log line that reports the OpenBLAS flavour.
For example, if the runtime uses the parallel flavour, the output looks something like this:
ggml_backend_blas_init: openblas_get_parallel 1
ggml_backend_blas_init: openblas_get_config OpenBLAS 0.3.26 DYNAMIC_ARCH NO_AFFINITY Haswell MAX_THREADS=128
If the runtime instead uses the non-parallel flavour, the output looks something like this:
ggml_backend_blas_init: openblas_get_parallel 0
ggml_backend_blas_init: openblas_get_config OpenBLAS 0.3.26 DYNAMIC_ARCH NO_AFFINITY Haswell SINGLE_THREADED
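A CI script could read these lines to check which flavour was actually picked up at runtime. A minimal sketch, assuming the log format shown above; the function name and the dict it returns are hypothetical, and the 0/1/2 mapping follows the return values OpenBLAS documents for openblas_get_parallel():

```python
import re

_PARALLEL_RE = re.compile(r"openblas_get_parallel (\d+)")
_CONFIG_RE = re.compile(r"openblas_get_config (.+)")

# openblas_get_parallel() return values as defined by OpenBLAS.
FLAVOURS = {0: "sequential", 1: "pthreads", 2: "openmp"}


def openblas_flavour(log: str):
    """Extract the OpenBLAS flavour and config string from ggml's init log."""
    parallel = _PARALLEL_RE.search(log)
    if parallel is None:
        return None  # no BLAS init line found
    config = _CONFIG_RE.search(log)
    return {
        "flavour": FLAVOURS.get(int(parallel.group(1)), "unknown"),
        "config": config.group(1).strip() if config else None,
    }
```

Feeding it the two examples above should yield "pthreads" with a MAX_THREADS=128 config, and "sequential" with a SINGLE_THREADED config, respectively.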
On fc40 (Fedora 40) there is no symlink pointing to the parallel OpenBLAS by default. Here is what I see on the file system (I reinstalled the latest RPM to make sure):
-rwxr-xr-x. 1 root root 40779408 Feb 9 2024 libopenblasp-r0.3.26.so
lrwxrwxrwx. 1 root root 23 Feb 9 2024 libopenblasp.so -> libopenblasp-r0.3.26.so
lrwxrwxrwx. 1 root root 23 Feb 9 2024 libopenblasp.so.0 -> libopenblasp-r0.3.26.so
-rwxr-xr-x. 1 root root 39286328 Feb 9 2024 libopenblas-r0.3.26.so
lrwxrwxrwx. 1 root root 22 Feb 9 2024 libopenblas.so -> libopenblas-r0.3.26.so
lrwxrwxrwx. 1 root root 22 Feb 9 2024 libopenblas.so.0 -> libopenblas-r0.3.26.so
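Whether libopenblas.so.0 actually resolves to the parallel build (as with the conda package mentioned earlier) can be checked by following the symlink chain. A minimal sketch, assuming the "libopenblasp" file-name prefix seen in the listing above; the helper name and the /usr/lib64 directory in the usage line are assumptions:

```python
import os


def resolves_to_parallel(lib_path: str) -> bool:
    """True if lib_path, after following all symlinks, is a libopenblasp-* file."""
    real = os.path.basename(os.path.realpath(lib_path))
    return real.startswith("libopenblasp")


# Assumed library directory on Fedora; adjust for your distro.
print(resolves_to_parallel("/usr/lib64/libopenblas.so.0"))
```

With the fc40 layout listed above, libopenblas.so.0 points at libopenblas-r0.3.26.so, so the check would report False.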
I don't know about Windows, but on Linux they are "drop-in" replaceable :)
Thanks
On recent Linux distros (e.g. Fedora 40), the parallel version of OpenBLAS has a "p" suffix ("-lopenblasp"), so linking against "-lopenblas" always uses the serial version.
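A build script could prefer the "p"-suffixed library when the system has one and fall back to "-lopenblas" otherwise. A minimal sketch using Python's ctypes.util.find_library; the helper name is hypothetical, and note that find_library locates the runtime library, while the linker additionally needs the unversioned libopenblasp.so symlink (present in the fc40 listing above):

```python
import ctypes.util


def openblas_link_flag(find_library=ctypes.util.find_library) -> str:
    """Prefer the parallel "p"-suffixed OpenBLAS when the system provides one."""
    if find_library("openblasp"):
        return "-lopenblasp"
    # Fall back to the plain name, which is the serial build on Fedora 40.
    return "-lopenblas"


print(openblas_link_flag())
```

The finder is a parameter so the choice can be exercised without a real OpenBLAS install.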
In addition, we print out at runtime the exact flavour of OpenBLAS in use: