ViralBShah opened 3 years ago
Hi.
Thanks for reaching out. I'm not familiar with libblastrampoline, but the point I want to emphasize is that BLIS provides a more flexible API than standard BLAS (e.g. generic strides and mixed precision), and I want to make use of it.
At the moment, simply substituting the backend seems insufficient for that purpose.
Right, BLIS provides a more flexible API. We should also be able to provide a way for BLIS to replace the underlying Julia BLAS with only one line of code. I will try this out and report my findings, but we first need to do some more work on the LAPACK front.
I once mimicked MKL.jl and created this toy.
I could put the switcher code directly into this repo, but trying out libblastrampoline seems more interesting.
Basically, lbt_forward in current Julia master (1.7-dev) allows you to switch the underlying BLAS for all routines to a new one, such as MKL or potentially BLIS, without having to rebuild the system image.
The only catch is that OpenBLAS and MKL both provide the full LAPACK, whereas with BLIS we will probably want to compile our own LAPACK from source and provide it in BinaryBuilder.
cc @staticfloat
I see. That would be a bit tricky. The painful part is that libFLAME doesn't provide the full LAPACK77 API.
NumPy supports it, though, so perhaps someone (possibly me) would be interested in writing a wrapper.
Xu RuQing (許 ルーキン), Todo Lab, Department of Physics, Graduate School of Science, The University of Tokyo. Lab e-mail: ruqing.xu@phys.s.u-tokyo.ac.jp; University of Tokyo student e-mail: r-xu@g.ecc.u-tokyo.ac.jp
In order to use BLIS in Julia, we will simply have LAPACK link against BLIS's BLAS API through LBT. Then all packages that need BLAS can use BLIS, and we can see how it performs.
Separately, this package and, in the future, FLAME.jl can explore the further capabilities you describe.
Just wanted to drop a +1 here. Getting MKL via a simple using MKL is awesome. Would be great for BLIS too!
FWIW,
using blis_jll
using LinearAlgebra
BLAS.lbt_forward(blis)
seems to work nicely (except that the remaining LAPACK doesn't use or link against BLIS, as @ViralBShah mentioned above; see also here). I would really like to have an MKL.jl-like package for BLIS that does this simple BLAS switching via LBT. As I understand the comments above, this package (BLIS.jl) currently has a different goal and approach. Is that correct (@xrq-phys)? Should I therefore create a new package, say, "BLISBLAS.jl"?
Side comment: I realized that for the stacked OpenBLAS + BLIS setup (example above), the function BLAS.set_num_threads(N) sets the number of OpenBLAS threads. Is there a way to also set the BLIS threads or, more generally, the threads of a specific BLAS/LAPACK in the LBT stack (cc @staticfloat)? For now I use
blis_get_num_threads() = @ccall blis.bli_thread_get_num_threads()::Cint
blis_set_num_threads(nthreads) = @ccall blis.bli_thread_set_num_threads(nthreads::Cint)::Cvoid
It looks to me like LBT should already know how to deal with BLIS.
There is a completely generic way in which you can register get/set_num_threads functions for your own BLAS library, but BLIS should already be handled natively.
Thanks for the info, that's good to know. But what if I have multiple BLAS/LAPACK libraries stacked on top of each other? Unless I'm missing something, BLAS.get_num_threads/BLAS.set_num_threads don't allow me to specify the library. Do we need to extend the API here, or is there another way to access the registered get/set_num_threads functions?
UPDATE: According to the docstrings for lbt_get/set_num_threads, I should be getting/setting the number of threads of all libraries at the same time. But that doesn't seem to be the case:
julia> using LinearAlgebra
julia> BLAS.get_num_threads()
8
julia> using blis_jll
julia> BLAS.lbt_forward(blis; clear=false)
157
julia> BLAS.get_num_threads()
8
julia> blis_get_num_threads() = @ccall blis.bli_thread_get_num_threads()::Cint;
julia> blis_set_num_threads(nthreads) = @ccall blis.bli_thread_set_num_threads(nthreads::Cint)::Cvoid;
julia> blis_get_num_threads()
-1
julia> blis_set_num_threads(2)
julia> blis_get_num_threads()
2
julia> BLAS.get_num_threads()
8
julia> BLAS.set_num_threads(3)
julia> blis_get_num_threads()
2
@carstenbauer I think LBT's failure to set the number of threads is due to this line: libblastrampoline expects the 64_ suffix on all library subroutines, not just the BLAS ones, while BLIS applies it only to the BLAS ones.
Sorry, not quite.
BLIS DOES have the 64_ suffix, but in the form bli_thread_set_num_threads_64_ instead of bli_thread_set_num_threads64_.
I suppose in this case we should amend libblastrampoline, since in the 32-bit case BLIS also yields bli_thread_set_num_threads_.
You can teach LBT about your thread function name with the following Julia code:
julia> using Libdl, blis_jll, libblastrampoline_jll
getter = Libdl.dlsym(blis_jll.blis_handle, "bli_thread_get_num_threads_64_")
setter = Libdl.dlsym(blis_jll.blis_handle, "bli_thread_set_num_threads_64_")
@ccall libblastrampoline.lbt_register_thread_interface(getter::Ptr{Cvoid}, setter::Ptr{Cvoid})::Cvoid
Note that the 32-bit version of BLIS calls its thread setter function bli_thread_set_num_threads; no trailing underscore. I think there may be a small naming incongruity here.
EDIT: Whoops, I mis-read my own API, this code chunk is wrong.
Sorry, I made a mistake.
In BLIS, only the setter has an F77 interface: bli_thread_set_num_threads_64_ for 64-bit and bli_thread_set_num_threads_ for 32-bit, while bli_thread_set_num_threads is provided as a C interface. So there's no incongruity here.
The problem is that bli_thread_get_num_threads doesn't have an F77-style counterpart, i.e. it is only accessible via C-style calling.
Another issue: ~While Julia deploys a 64-bit BLAS by default, the thread-count setter always passes in 32-bit integers, whereas bli_thread_set_num_threads_ is LP64/ILP64 aware. I fear that the upper 32 bits of what lbt_set_num_threads() passes in would break the library.~ The thread-setting routine used by lbt_set_num_threads is void (int), while bli_thread_set_num_threads_ is an F77 interface, void (int *), and the C interface bli_thread_set_num_threads takes 64-bit integers instead of 32-bit ones.
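To make the by-value versus by-reference mismatch concrete, here is a minimal, self-contained Julia sketch that does not touch BLIS or LBT at all. fake_f77_set_num_threads is a hypothetical stand-in for an F77-style setter like bli_thread_set_num_threads_64_ (it receives a pointer to a 64-bit integer), and lbt_style_setter is a hypothetical adapter with the by-value void (int) shape that lbt_set_num_threads calls. Both names are illustrative assumptions; the point is only that such an adapter has to widen the value and pass its address.

```julia
# Hypothetical stand-in for an F77-style setter such as bli_thread_set_num_threads_64_:
# Fortran passes arguments by reference, so it receives a pointer to a 64-bit integer.
const RECORDED_THREADS = Ref{Int64}(-1)

function fake_f77_set_num_threads(n::Ptr{Int64})::Cvoid
    RECORDED_THREADS[] = unsafe_load(n)  # dereference the pointer, as an F77 routine would
    return nothing
end

# Hypothetical adapter with the by-value `void (int)` shape used by lbt_set_num_threads:
# widen the 32-bit argument to 64 bits and pass its address to the F77-style routine.
function lbt_style_setter(n::Cint)::Cvoid
    f77 = @cfunction(fake_f77_set_num_threads, Cvoid, (Ptr{Int64},))
    ccall(f77, Cvoid, (Ref{Int64},), Int64(n))  # Ref{Int64} arguments are passed as pointers
    return nothing
end

# Call the adapter through a raw C function pointer, the way a trampoline would.
const SETTER_PTR = @cfunction(lbt_style_setter, Cvoid, (Cint,))
ccall(SETTER_PTR, Cvoid, (Cint,), 4)
println(RECORDED_THREADS[])  # 4
```

Calling the F77-style function directly with a plain Cint instead would hand it an integer where it expects an address, which is exactly why a void (int) caller cannot use the F77 symbol as-is.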
Btw, line #14 and line #21 seem to have the setter and getter reversed.
Perhaps at least the generic registration method should support a thread-count setter both with and without the 64_ suffix, preferring the suffixed one.
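The proposed "prefer the suffixed name, fall back to the bare one" lookup can be sketched in a few lines of Julia. resolve_thread_symbol is a hypothetical helper, not part of LBT, and the symbol set is just the names described in this thread, hard-coded rather than read from a real library.

```julia
# Hypothetical helper (not part of LBT): choose which exported symbol to use
# for a thread-count setter, preferring the ILP64-suffixed name and falling
# back to the bare name. Returns `nothing` if neither is exported.
function resolve_thread_symbol(canonical::AbstractString, suffix::AbstractString,
                               exported::AbstractSet{<:AbstractString})
    for candidate in (canonical * suffix, canonical)
        candidate in exported && return candidate
    end
    return nothing
end

# Setter symbol names as described in this thread for a 64-bit BLIS build.
blis_syms = Set(["bli_thread_set_num_threads", "bli_thread_set_num_threads_64_"])

# C-style canonical name: "bli_thread_set_num_threads64_" is not exported,
# so the lookup falls back to the bare C symbol.
println(resolve_thread_symbol("bli_thread_set_num_threads", "64_", blis_syms))   # bli_thread_set_num_threads

# F77-style canonical name (trailing underscore): the suffixed form is exported.
println(resolve_thread_symbol("bli_thread_set_num_threads_", "64_", blis_syms))  # bli_thread_set_num_threads_64_
```

Which result you get depends entirely on what you treat as the canonical name, which is the crux of the naming discussion here.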
> In BLIS only the setter has F77 interface: bli_thread_set_num_threads_64_ for 64-bit, bli_thread_set_num_threads_ for 32-bit, while bli_thread_set_num_threads is presented as C interface. So there's no incongruity here.
I'm a little confused here; is bli_thread_set_num_threads supposed to have a trailing underscore or not? Here's what I see from the blis_jll that I can download right now:
julia> using blis_jll
run(`/bin/bash -c "nm $(blis_jll.blis_path) | grep bli_thread_set_num_threads"`)
0000000000a95520 T bli_thread_set_num_threads
0000000000a703e0 T bli_thread_set_num_threads_64_
So what I see here is that one symbol has no trailing underscore, whereas the other does. I call it a trailing underscore because the ILP64 symbol suffix that the BLIS library uses (as detected by LBT) is 64_. You can see this with the following:
julia> using LinearAlgebra, blis_jll
BLAS.lbt_forward(blis_jll.blis_path; verbose=true)
Generating forwards to /home/sabae/.julia/artifacts/b548e034d149feec83ed78f22ab942fea1ac3d12/lib/libblis.so
-> Autodetected symbol suffix "64_"
-> Autodetected interface ILP64 (64-bit)
-> Autodetected gfortran calling convention
Processed 4945 symbols; forwarded 157 symbols with 64-bit interface and mangling to a suffix of "64_"
157
LBT detects this symbol suffix by probing a few F77 names with a few candidate suffixes. If we look at how those symbols are exported from BLIS:
julia> using blis_jll
run(`/bin/bash -c "nm $(blis_jll.blis_path) | grep isamax"`);
0000000000a61340 T isamax_64_
We see that the canonical name isamax_ has 64_ suffixed to it. Now, for consistency's sake (and to allow loading libraries that export BOTH ILP64 and LP64 interfaces in a single .so!), LBT expects all exported names to follow a consistent naming rule: the "canonical" names (whether C or FORTRAN) are suffixed reliably. This means that, for instance, if your LP64 symbol is called bli_thread_set_num_threads, then the ILP64 symbol is expected to be named bli_thread_set_num_threads64_. Otherwise, LBT has no hope of automatically finding all the different symbols. This is what I mean when I say that there is a symbol naming inconsistency.
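The probing and the naming rule can be illustrated with a simplified Julia sketch. This is not LBT's actual detection code (which lives in its C sources and does more checking); detect_suffix, expected_ilp64_name, and the candidate suffix list are hypothetical, and the symbol set is a hard-coded subset of the nm output shown above.

```julia
# Simplified, hypothetical sketch of suffix autodetection (not LBT's actual code):
# probe one canonical F77 name with a few candidate suffixes and keep the first hit.
function detect_suffix(exported::AbstractSet{<:AbstractString};
                       probe::AbstractString = "isamax_",
                       candidates = ("64_", "_64", "64", ""))
    for s in candidates
        (probe * s) in exported && return s
    end
    return nothing
end

# Under the naming rule, every canonical name receives the same suffix.
expected_ilp64_name(canonical::AbstractString, suffix::AbstractString) = canonical * suffix

# A subset of the symbols reported by `nm` for the blis_jll build above.
blis_syms = Set(["isamax_64_", "bli_thread_set_num_threads",
                 "bli_thread_set_num_threads_64_"])

suffix = detect_suffix(blis_syms)                                 # "64_"
name = expected_ilp64_name("bli_thread_set_num_threads", suffix)  # "bli_thread_set_num_threads64_"
println(name in blis_syms)  # false: BLIS exports this setter under a different name
```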
> The thread-setting routine used by lbt_set_num_threads is void (int) while bli_thread_set_num_threads_ is an F77 interface void (int *), while the C interface bli_thread_set_num_threads takes 64-bit integers instead of 32-bit ones.
Are you using a different version of libblis than I am? I do not have both bli_thread_set_num_threads and bli_thread_set_num_threads_ in my version; I'm using v0.9.0+0 of the JLL. In any case, if there were a C interface that takes 64-bit integers, that's fine, as C passes arguments through registers, so when we pass a 32-bit integer it gets zero-extended. The FORTRAN interface would indeed be a problem, though.
> Btw line#14 and line#21 seem to have reversed setter and getter.
Good catch! Swapped in https://github.com/JuliaLinearAlgebra/libblastrampoline/commit/145bb64256c441d11b0a742e38f9ef3f08921e8e
> Perhaps, at least the generic registration method should support thread-num setter with and without the 64_ extension, while preferring the one with an extension.
The generic registration method doesn't pay any attention to names; it relies on you to do the dlsym() manually, then just pass in raw function pointer addresses. So you can do what I mentioned in the code snippet in my previous message and use that directly (with the C interface version of the symbols), and things should "just work".
@staticfloat To answer your question, the current BLIS configuration builds bli_thread_set_num_threads_ for 32-bit machines and bli_thread_set_num_threads_64_ for 64-bit machines, while bli_thread_set_num_threads (the one without an underscore) is always built as a BLIS-defined C interface.
Anyway, since libblastrampoline does not pass in pointers, I'd stick to bli_thread_set_num_threads without an underscore and manually create a bli_thread_set_num_threads64_ counterpart.
The issue observed above (https://github.com/JuliaLinearAlgebra/BLIS.jl/issues/3#issuecomment-1106619678) should be fixed with the latest update to the Yggdrasil recipe (https://github.com/JuliaPackaging/Yggdrasil/pull/7448). @carstenbauer As verification, it seems to work now in tandem with the direct calls wrapped in BLISBLAS.jl:
julia> import BLISBLAS
[ Info: Precompiling BLISBLAS [6f275bd8-fec0-4d39-945b-7e95a765fa1e]
julia> using LinearAlgebra
julia> BLAS.get_num_threads()
6
julia> BLAS.get_config()
LinearAlgebra.BLAS.LBTConfig
Libraries:
├ [ILP64] libopenblas64_.0.3.21.dylib
└ [ILP64] libblis.4.0.0.dylib
julia> BLAS.set_num_threads(42)
julia> BLISBLAS.get_num_threads()
42
It would be great to use the same mechanism that MKL.jl uses now, and leverage libblastrampoline.
https://github.com/JuliaLinearAlgebra/MKL.jl/blob/master/src/MKL.jl