JuliaLinearAlgebra / BLIS.jl

This repo plans to provide a low-level Julia wrapper for the BLIS typed interface.
BSD 3-Clause "New" or "Revised" License

Update to use libblastrampoline #3

Open ViralBShah opened 3 years ago

ViralBShah commented 3 years ago

It would be great to use the same mechanism that MKL.jl uses now, and leverage libblastrampoline.

https://github.com/JuliaLinearAlgebra/MKL.jl/blob/master/src/MKL.jl

xrq-phys commented 3 years ago

Hi.

Thanks for reaching out. I'm not familiar with libblastrampoline, but what I want to highlight is that BLIS provides a more flexible API than standard BLAS (e.g. generic strides and mixed precision), and I want to make use of it.

At this moment simply substituting the backend seems to be insufficient in that sense.

ViralBShah commented 3 years ago

Right, BLIS provides a more flexible API. We should also be able to provide a way for BLIS to replace the underlying Julia BLAS with only one line of code. I will try this out and report findings - but we first need to do some more work on the LAPACK front.

xrq-phys commented 3 years ago

I once mimicked MKL.jl and created this toy.

I can directly put the switcher code inside this repo but trying libblastrampoline out seems more interesting.

ViralBShah commented 3 years ago

Basically, lbt_forward in current Julia master (1.7-dev) allows you to switch the underlying BLAS for all routines to a new one, such as MKL or potentially BLIS, without having to rebuild the system image.

The only thing is that both OpenBLAS and MKL provide the full LAPACK, but when we use BLIS, we probably want to compile our own LAPACK from source and provide it in BinaryBuilder.

cc @staticfloat
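
To see which backends LBT is currently forwarding to, the loaded libraries can be listed from Julia. This is a minimal sketch, assuming Julia 1.7 or later and assuming the field names `loaded_libs`, `libname`, and `interface` of `LinearAlgebra.BLAS.LBTConfig` as they appear in current releases; `loaded_blas_backends` is a hypothetical helper name, not part of any package.

```julia
using LinearAlgebra

# Collect (library path, interface) pairs for every backend LBT has loaded.
# The field names are an assumption based on current LinearAlgebra.BLAS.
function loaded_blas_backends()
    config = BLAS.get_config()
    return [(lib.libname, lib.interface) for lib in config.loaded_libs]
end

for (name, interface) in loaded_blas_backends()
    println(interface, "  ", basename(name))
end
```

After a call to lbt_forward with a new library, running the loop again should show whether that library replaced or was stacked on top of the previous one.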

xrq-phys commented 3 years ago

I see. That would be a little tricky. The slightly painful part is that libFLAME doesn't provide the full LAPACK77 API.

Yet NumPy supports it so maybe someone or myself would be interested in making a wrapper.

ViralBShah commented 3 years ago

In order to use BLIS in Julia, we will just have LAPACK link to BLIS's BLAS API through LBT. Then all packages that need BLAS can use BLIS and we can see how it performs.

Separately this package and FLAME.jl in the future can explore further capabilities as you articulate.

ViralBShah commented 3 years ago

https://github.com/JuliaPackaging/Yggdrasil/issues/2657

carstenbauer commented 2 years ago

Just wanted to drop a +1 here. Getting MKL via a simple using MKL is awesome. Would be great for BLIS too!

carstenbauer commented 2 years ago

FWIW,

using blis_jll
using LinearAlgebra
BLAS.lbt_forward(blis)

seems to work nicely (up to the fact that the remaining LAPACK doesn't use / link against BLIS, as mentioned by @ViralBShah above, see also here). I would really like to have an MKL.jl-like package for BLIS that does this simple BLAS switching via LBT. As I understand it from the comments above, the package here (BLIS.jl) currently has a different goal / approach. Is this correct (@xrq-phys)? Should I therefore create a new package, say, "BLISBLAS.jl"?

Side comment: I realized that for the stacked OpenBLAS + BLIS (example above) the function BLAS.set_num_threads(N) sets the number of OpenBLAS threads. Is there a way to also set the BLIS threads or, more generally, the threads of a specific BLAS/LAPACK in the LBT stack (cc @staticfloat)? For now I use

blis_get_num_threads() = @ccall blis.bli_thread_get_num_threads()::Cint
blis_set_num_threads(nthreads) = @ccall blis.bli_thread_set_num_threads(nthreads::Cint)::Cvoid

staticfloat commented 2 years ago

It looks to me like LBT should already know how to deal with BLIS.

There is a completely generic way in which you can register get/set_num_threads functions for your own BLAS library, but BLIS should already be handled natively.

carstenbauer commented 2 years ago

Thanks for the info, that's good to know. But what if I have multiple BLAS/LAPACK libraries stacked on top of each other? Unless I'm missing something, BLAS.get_num_threads/BLAS.set_num_threads doesn't allow me to specify the library. Do we need to extend the API here or is there another way to access the registered get/set_num_threads functions?

UPDATE: According to the doc strings for lbt_get/set_num_threads I should get/set the num threads of all libraries at the same time. But that doesn't seem to be the case?

julia> using LinearAlgebra

julia> BLAS.get_num_threads()
8

julia> using blis_jll

julia> BLAS.lbt_forward(blis; clear=false)
157

julia> BLAS.get_num_threads()
8

julia> blis_get_num_threads() = @ccall blis.bli_thread_get_num_threads()::Cint;

julia> blis_set_num_threads(nthreads) = @ccall blis.bli_thread_set_num_threads(nthreads::Cint)::Cvoid;

julia> blis_get_num_threads()
-1

julia> blis_set_num_threads(2)

julia> blis_get_num_threads()
2

julia> BLAS.get_num_threads()
8

julia> BLAS.set_num_threads(3)

julia> blis_get_num_threads()
2

carstenbauer commented 2 years ago

FYI: https://github.com/carstenbauer/BLISBLAS.jl

xrq-phys commented 2 years ago

@carstenbauer I think LBT's failure to set the number of threads is due to this line. libblastrampoline appends the 64_ suffix to all library subroutines, not just the BLAS ones, while BLIS is built with the suffix only on the latter.

xrq-phys commented 2 years ago

Sorry not really.

BLIS DOES have the 64_ suffix, but in the form bli_thread_set_num_threads_64_ instead of bli_thread_set_num_threads64_.

I suppose in this case we should amend libblastrampoline, since BLIS in the 32-bit case also yields bli_thread_set_num_threads_.

staticfloat commented 2 years ago

You can teach LBT about your thread function name with the following Julia code:

julia> using Libdl, blis_jll, libblastrampoline_jll
       getter = Libdl.dlsym(blis_jll.blis_handle, "bli_thread_get_num_threads_64_")
       setter = Libdl.dlsym(blis_jll.blis_handle, "bli_thread_set_num_threads_64_")
       @ccall libblastrampoline.lbt_register_thread_interface(getter::Ptr{Cvoid}, setter::Ptr{Cvoid})::Cvoid

Note that the 32-bit version of BLIS calls its thread setter function bli_thread_set_num_threads; no trailing underscore. I think there may be a small naming incongruity here.

EDIT: Whoops, I mis-read my own API, this code chunk is wrong.

xrq-phys commented 2 years ago

Sorry I made a mistake.

In BLIS only the setter has an F77 interface:

  • bli_thread_set_num_threads_64_ for 64-bit.
  • bli_thread_set_num_threads_ for 32-bit.

while bli_thread_set_num_threads is presented as a C interface. So there's no incongruity here.

The problem is that bli_thread_get_num_threads doesn't have an F77-style counterpart, i.e. it is only accessible via a C-style call.

Another issue is that: ~While Julia deploys 64-bit BLAS by default, thread-num setter always passes in 32-bit integers. On the contrary, bli_thread_set_num_threads_ is LP64/ILP64 aware. I fear that the higher 32-bit lbt_set_num_threads() passes in would break the lib down.~ The thread-setting routine used by lbt_set_num_threads is void (int) while bli_thread_set_num_threads_ is an F77 interface void (int *), while the C interface bli_thread_set_num_threads takes 64-bit integers instead of 32-bit ones.
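
The difference between the two calling styles can be reproduced without BLIS. The sketch below uses two hypothetical Julia callbacks as stand-ins for a C-style `void (int)` setter and an F77-style setter that takes a pointer to a 64-bit integer; none of these names are BLIS or LBT functions, and the point is only that the caller has to know which style it is invoking.

```julia
# Hypothetical stand-in for a library's internal thread count.
const RECORDED_THREADS = Ref{Int}(0)

# C-style setter: receives the value itself, like `void f(int)`.
c_style_setter(n::Cint) = (RECORDED_THREADS[] = n; nothing)

# F77-style setter: receives a pointer to the value, like `void f(int64_t *)`.
f77_style_setter(p::Ptr{Int64}) = (RECORDED_THREADS[] = unsafe_load(p); nothing)

function call_c_style(n::Integer)
    fptr = @cfunction(c_style_setter, Cvoid, (Cint,))
    ccall(fptr, Cvoid, (Cint,), n)        # argument passed by value
    return RECORDED_THREADS[]
end

function call_f77_style(n::Integer)
    fptr = @cfunction(f77_style_setter, Cvoid, (Ptr{Int64},))
    ccall(fptr, Cvoid, (Ref{Int64},), n)  # argument passed by reference
    return RECORDED_THREADS[]
end
```

Calling the F77-style function pointer with a by-value integer would make the callee dereference the thread count as if it were an address, which is why a setter registered for a `void (int)` slot cannot simply be the F77 symbol.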

Btw line#14 and line#21 seem to have reversed setter and getter.

xrq-phys commented 2 years ago

Perhaps, at least the generic registration method should support thread-num setter with and without the 64_ extension, while preferring the one with an extension.
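
The lookup order proposed here can be sketched in a few lines. `pick_thread_symbol` is a hypothetical helper, not an LBT function, and `exists` is any predicate that reports whether a library exports a given name; the example set only mirrors the symbol names mentioned in this thread.

```julia
# Try the suffixed name first, then fall back to the bare name.
# Returns the first exported candidate, or `nothing` if neither exists.
function pick_thread_symbol(exists, base::AbstractString, suffix::AbstractString)
    for name in (base * suffix, base)
        exists(name) && return name
    end
    return nothing
end

# Names taken from the discussion above (illustrative, not queried from a library).
exported = Set(["bli_thread_set_num_threads", "bli_thread_set_num_threads_64_"])

pick_thread_symbol(in(exported), "bli_thread_set_num_threads", "64_")
# falls back to "bli_thread_set_num_threads", since no "...threads64_" is exported
```

Against a real library handle, the predicate could be `name -> Libdl.dlsym(handle, name; throw_error=false) !== nothing`.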

staticfloat commented 2 years ago

In BLIS only the setter has F77 interface:

  • bli_thread_set_num_threads_64_ for 64-bit.
  • bli_thread_set_num_threads_ for 32-bit.

while bli_thread_set_num_threads is presented as C interface. So there's no incongruity here.

I'm a little confused here; is bli_thread_set_num_threads supposed to have a trailing underscore or not? Here's what I see from the blis_jll that I can download right now:

julia> using blis_jll
       run(`/bin/bash -c "nm $(blis_jll.blis_path) | grep bli_thread_set_num_threads"`)
0000000000a95520 T bli_thread_set_num_threads
0000000000a703e0 T bli_thread_set_num_threads_64_

So what I see here is that one symbol has no trailing underscore, whereas another does have the trailing underscore. I call this a trailing underscore because the ILP64 symbol suffix that the BLIS library uses (as detected by LBT) is 64_. You can see this with the following:

julia> using LinearAlgebra, blis_jll
       BLAS.lbt_forward(blis_jll.blis_path; verbose=true)
Generating forwards to /home/sabae/.julia/artifacts/b548e034d149feec83ed78f22ab942fea1ac3d12/lib/libblis.so
 -> Autodetected symbol suffix "64_"
 -> Autodetected interface ILP64 (64-bit)
 -> Autodetected gfortran calling convention
Processed 4945 symbols; forwarded 157 symbols with 64-bit interface and mangling to a suffix of "64_"
157

This symbol suffix is detected by probing for a few F77 names with a few suffixes, and if we look at the names for those symbols that are exported from BLIS:

julia> using blis_jll
       run(`/bin/bash -c "nm $(blis_jll.blis_path) | grep isamax"`);
0000000000a61340 T isamax_64_

We see that the canonical name isamax_ has 64_ suffixed to it. Now, for consistency's sake (and to allow for loading of libraries that export BOTH ILP64 and LP64 interfaces in a single .so!) LBT expects all exported names to follow a consistent naming rule, which is that the "canonical" names (whether C or FORTRAN) are suffixed reliably. This means that, for instance, if your LP64 symbol is called bli_thread_set_num_threads, then the ILP64 symbol is named bli_thread_set_num_threads64_. Otherwise, LBT has no hope of automatically finding all the different symbols. This is what I mean when I say that there is a symbol naming inconsistency.
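
The naming rule can be written out as a one-line function to make the mismatch concrete. `lbt_expected_name` is a hypothetical name used only for illustration; it encodes the rule as described in this comment (canonical name followed directly by the detected suffix), not LBT's actual source.

```julia
# Mangling rule as described above: canonical name followed by the suffix.
lbt_expected_name(canonical::AbstractString, suffix::AbstractString) = canonical * suffix

# F77 BLAS routine: the canonical name already ends in "_".
lbt_expected_name("isamax_", "64_")                     # "isamax_64_"

# C-style thread setter: the canonical name has no trailing "_".
lbt_expected_name("bli_thread_set_num_threads", "64_")  # "bli_thread_set_num_threads64_"
```

The first result matches what BLIS exports, while the second differs from the exported bli_thread_set_num_threads_64_ by one underscore, so a purely suffix-based search does not find it.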

The thread-setting routine used by lbt_set_num_threads is void (int) while bli_thread_set_num_threads_ is an F77 interface void (int *), while the C interface bli_thread_set_num_threads takes 64-bit integers instead of 32-bit ones.

Are you using a different version of libblis than I am? I do not have both bli_thread_set_num_threads and bli_thread_set_num_threads_ in my version. I'm using v0.9.0+0 of the JLL. In any case, if there were a C interface that takes 64-bit integers that's fine, as C passes arguments through registers, so when we pass a 32-bit integer it gets zero-extended. The FORTRAN interface would indeed be a problem though.

Btw line#14 and line#21 seem to have reversed setter and getter.

Good catch! Swapped in https://github.com/JuliaLinearAlgebra/libblastrampoline/commit/145bb64256c441d11b0a742e38f9ef3f08921e8e

Perhaps, at least the generic registration method should support thread-num setter with and without the 64_ extension, while preferring the one with an extension.

The generic registration method doesn't pay any attention to names; it relies on you to do the dlsym() manually, then just pass in raw function pointer addresses. So you can do what I mentioned in the code snippet in my previous message and use that directly (with the C interface version of the symbols) and things should "just work".

xrq-phys commented 2 years ago

This line seems to work only on strings?

xrq-phys commented 2 years ago

@staticfloat To your question, the current configuration for BLIS builds bli_thread_set_num_threads_ for 32-bit machines and bli_thread_set_num_threads_64_ for 64-bit machines, while bli_thread_set_num_threads (the one without an underscore) is always built as a BLIS-defined C interface.

Anyway, since libblastrampoline does not pass in pointers, I'd stick to bli_thread_set_num_threads without an underscore and manually create a bli_thread_set_num_threads64_ counterpart.

jd-foster commented 1 year ago

The issue observed above (https://github.com/JuliaLinearAlgebra/BLIS.jl/issues/3#issuecomment-1106619678) should be fixed with the latest update to the Yggdrasil recipe (https://github.com/JuliaPackaging/Yggdrasil/pull/7448). @carstenbauer As verification, it seems to work now in tandem with the direct calls wrapped in BLISBLAS.jl:

julia> import BLISBLAS
[ Info: Precompiling BLISBLAS [6f275bd8-fec0-4d39-945b-7e95a765fa1e]

julia> using LinearAlgebra

julia> BLAS.get_num_threads()
6

julia> BLAS.get_config()
LinearAlgebra.BLAS.LBTConfig
Libraries:
├ [ILP64] libopenblas64_.0.3.21.dylib
└ [ILP64] libblis.4.0.0.dylib

julia> BLAS.set_num_threads(42)

julia> BLISBLAS.get_num_threads()
42
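
For repeated checks of this kind, the set-then-query pattern can be wrapped in a small helper that restores the previous setting afterwards. This is a sketch only: `with_blas_threads` is a hypothetical name, and it relies on nothing beyond `BLAS.get_num_threads` and `BLAS.set_num_threads` from LinearAlgebra.

```julia
using LinearAlgebra

# Run `f` with the BLAS thread count temporarily set to `n`, then restore it.
function with_blas_threads(f, n::Integer)
    old = BLAS.get_num_threads()
    BLAS.set_num_threads(n)
    try
        return f()
    finally
        BLAS.set_num_threads(old)
    end
end

# Query what LBT reports while the temporary setting is active.
with_blas_threads(1) do
    BLAS.get_num_threads()
end
```

With BLISBLAS loaded, the body of the `do` block could also return `BLISBLAS.get_num_threads()` to confirm that the setting reached BLIS as well as the default backend.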