JuliaLinearAlgebra / MKL.jl

Intel MKL linear algebra backend for Julia

How to call MKL setter functions? #103

Closed carstenbauer closed 2 years ago

carstenbauer commented 2 years ago

I want to call mkl_get_dynamic and mkl_set_dynamic. I tried wrapping them as follows.

mkl_get_dynamic() = @ccall MKL.libmkl_rt.mkl_get_dynamic()::Cint
mkl_set_dynamic(flag::Integer) = @ccall MKL.libmkl_rt.mkl_set_dynamic(flag::Cint)::Cvoid

While mkl_get_dynamic() works, I get a segfault for mkl_set_dynamic(0):

julia> using MKL

julia> mkl_set_dynamic(flag::Integer) = @ccall MKL.libmkl_rt.mkl_set_dynamic(flag::Cint)::Cvoid
mkl_set_dynamic (generic function with 1 method)

julia> mkl_get_dynamic() = @ccall MKL.libmkl_rt.mkl_get_dynamic()::Cint
mkl_get_dynamic (generic function with 1 method)

julia> mkl_get_dynamic()
1

julia> mkl_set_dynamic(0)

signal (11): Segmentation fault
in expression starting at REPL[11]:1
MKL_SET_DYNAMIC at /scratch/pc2-mitarbeiter/bauerc/.julia/artifacts/72d4adc3ef9236a92f4fefeb0291cb6e8aaae2d7/lib/libmkl_intel_ilp64.so.1 (unknown line)
mkl_set_dynamic at ./REPL[8]:1
unknown function (ip: 0x15552804ed45)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1788 [inlined]
do_call at /buildworker/worker/package_linux64/build/src/interpreter.c:126
eval_value at /buildworker/worker/package_linux64/build/src/interpreter.c:215
eval_stmt_value at /buildworker/worker/package_linux64/build/src/interpreter.c:166 [inlined]
eval_body at /buildworker/worker/package_linux64/build/src/interpreter.c:587
jl_interpret_toplevel_thunk at /buildworker/worker/package_linux64/build/src/interpreter.c:731
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:885
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:830
jl_toplevel_eval_in at /buildworker/worker/package_linux64/build/src/toplevel.c:944
eval at ./boot.jl:373 [inlined]
eval_user_input at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:150
repl_backend_loop at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:244
start_repl_backend at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:229
#run_repl#47 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:362
run_repl at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:349
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
#930 at ./client.jl:394
jfptr_YY.930_40362.clone_1 at /cm/shared/apps/pc2/EB-SW/software/Julia/1.7.1-linux-x86_64/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1788 [inlined]
jl_f__call_latest at /buildworker/worker/package_linux64/build/src/builtins.c:757
#invokelatest#2 at ./essentials.jl:716 [inlined]
invokelatest at ./essentials.jl:714 [inlined]
run_main_repl at ./client.jl:379
exec_options at ./client.jl:309
_start at ./client.jl:495
jfptr__start_40531.clone_1 at /cm/shared/apps/pc2/EB-SW/software/Julia/1.7.1-linux-x86_64/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1788 [inlined]
true_main at /buildworker/worker/package_linux64/build/src/jlapi.c:559
jl_repl_entrypoint at /buildworker/worker/package_linux64/build/src/jlapi.c:701
main at julia (unknown line)
__libc_start_main at /lib64/libc.so.6 (unknown line)
unknown function (ip: 0x400808)
Allocations: 3190888 (Pool: 3189391; Big: 1497); GC: 4
Segmentation fault (core dumped)

I then noticed that wrapping mkl_set_num_threads in the same way as mkl_set_dynamic above also segfaults, and realized that BLAS.set_num_threads goes through LBT, i.e. lbt_set_num_threads.

My question therefore is how to call a generic MKL "setter" function (one that doesn't have an lbt_* counterpart). Any help would be much appreciated.

carstenbauer commented 2 years ago

I'm not sure if this is the right repo to ask this question / file this issue in. (@staticfloat, is this perhaps a LBT "issue"?)

ViralBShah commented 2 years ago

You should be able to call MKL functions directly - there is no need to go through LBT. I don't know what the issue here is though.

carstenbauer commented 2 years ago

You should be able to call MKL functions directly - there is no need to go through LBT.

Alright, that's what I thought. Thanks for confirming.

Would be great to figure out why this isn't working then. I will investigate this a bit more (different machines / OSs etc.) later today.

carstenbauer commented 2 years ago

I also get segfaults on another Linux cluster and my Macbook with macOS. For the latter, the segfault reads

signal (11): Segmentation fault: 11
in expression starting at REPL[7]:1
MKL_SET_DYNAMIC at /Users/crstnbr/.julia/artifacts/be4d5c9f8ccb92916ceb22a6b15085b57f81d05d/lib/libmkl_intel_ilp64.1.dylib (unknown line)
Allocations: 809455 (Pool: 809051; Big: 404); GC: 1

ViralBShah commented 2 years ago

Maybe @giordano might know some of the bad things MKL does.

giordano commented 2 years ago

Just to confirm this doesn't have anything to do with LBT, I get the same by using MKL_jll only:

julia> using MKL_jll

julia> mkl_get_dynamic() = @ccall MKL_jll.libmkl_rt.mkl_get_dynamic()::Cint
mkl_get_dynamic (generic function with 1 method)

julia> mkl_get_dynamic()
1

julia> mkl_set_dynamic(flag::Integer) = @ccall MKL_jll.libmkl_rt.mkl_set_dynamic(flag::Cint)::Cvoid
mkl_set_dynamic (generic function with 1 method)

julia> mkl_set_dynamic(0)

signal (11): Segmentation fault
in expression starting at REPL[10]:1

giordano commented 2 years ago

I'd suggest trying to produce a minimal C program which reproduces (or maybe not) the segfault. Unfortunately the Anaconda distribution of MKL doesn't seem to ship the header files.

ViralBShah commented 2 years ago

Try with a julia debug build?

ViralBShah commented 2 years ago

What does this function do?

giordano commented 2 years ago

https://www.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-c/top/support-functions/threading-control/mkl-set-dynamic.html

This function indicates whether Intel® oneAPI Math Kernel Library can dynamically change the number of OpenMP threads or should avoid doing this. The setting applies to all Intel® oneAPI Math Kernel Library functions on all execution threads. This function takes precedence over the MKL_DYNAMIC environment variable.

carstenbauer commented 2 years ago

To elaborate a bit more: I need to set MKL_DYNAMIC=false when I'm pinning Julia threads to specific cores because otherwise MKL will spoil the pinning, see https://discourse.julialang.org/t/julia-thread-affinity-not-persistent-when-calling-mkl-function/74560/3.

Note, however, that the issue here isn't specific to mkl_set_dynamic. Directly trying to call mkl_set_num_threads also segfaults.

giordano commented 2 years ago

I just tried on a cluster where I have MKL 2020 available:

#include <stdio.h>
#include "mkl.h"

int main() {
    printf("Number of MKL threads: %d\n", mkl_get_max_threads());
    printf("MKL dynamc: %d\n", mkl_get_dynamic());
    mkl_set_num_threads(16);
    mkl_set_dynamic(0);
    printf("Number of MKL threads: %d\n", mkl_get_max_threads());
    printf("MKL dynamc: %d\n", mkl_get_dynamic());
    return 0;
}

$ cc -o mkl mkl.c -I"${MKLROOT}/include" -L"${MKLROOT}/lib/intel64" -lmkl_rt
$ LD_LIBRARY_PATH="${MKLROOT}/lib/intel64:${LD_LIBRARY_PATH}" ./mkl
Number of MKL threads: 36
MKL dynamc: 1
Number of MKL threads: 16
MKL dynamc: 0
$ echo $?
0

So the C code seems to be working as expected.

I briefly tried running Julia under gdb, breaking mkl_set_dynamic, the segfault seems to happen inside some MKL libs (as also shown by the error above), but I didn't go much further than that. I have no clue of what's wrong with MKL, I can only tell it's a source of problems :smile:

ViralBShah commented 2 years ago

I wonder if Julia needs to be built with COPY_STACKS.

giordano commented 2 years ago

You mean setting the environment variable JULIA_COPY_STACKS=1? That doesn't seem to change much for me

ViralBShah commented 2 years ago

I thought it was a compile time thing.

carstenbauer commented 2 years ago

I briefly tried running Julia under gdb, breaking mkl_set_dynamic

Did the same; here is the backtrace (although I don't think it's too helpful).

Thread 1 "julia" received signal SIGSEGV, Segmentation fault.
0x00001554df7d5020 in mkl_set_dynamic_ ()
   from /scratch/pc2-mitarbeiter/bauerc/.julia/artifacts/72d4adc3ef9236a92f4fefeb0291cb6e8aaae2d7/lib/libmkl_intel_ilp64.so.1
(gdb) bt
#0  0x00001554df7d5020 in mkl_set_dynamic_ ()
   from /scratch/pc2-mitarbeiter/bauerc/.julia/artifacts/72d4adc3ef9236a92f4fefeb0291cb6e8aaae2d7/lib/libmkl_intel_ilp64.so.1
#1  0x00001554f14c0e31 in julia_mkl_set_dynamic_282 (flag=0) at REPL[1]:1
#2  0x00001554f14c0e96 in jfptr_mkl_set_dynamic_283 ()
#3  0x000015555425b513 in _jl_invoke (F=0x155541fd4f10, args=0x7ffffffef2d8, nargs=1, mfunc=0x15553e050920, world=31335) at julia_internal.h:2247
#4  0x000015555425bed6 in jl_apply_generic (F=0x155541fd4f10, args=0x7ffffffef2d8, nargs=1) at julia_internal.h:2429
#5  0x000015555427941f in jl_apply (args=0x7ffffffef2d0, nargs=2) at /scratch/pc2-mitarbeiter/bauerc/building/julia/julia-source/src/interpreter.c:1788
#6  0x00001555542798bd in do_call (args=0x15553f0c8bb8, nargs=2, s=0x7ffffffef6f0) at julia_internal.h:126
[...]

carstenbauer commented 2 years ago

I just tried on a cluster where I have MKL 2020 available:

FWIW, I just compiled and ran the same C code on our cluster with MKL 2021 and it also runs just fine.

I guess what I could try to do is use an Overrides.toml to reroute MKL_jll to the system MKL and see if the issue goes away.

carstenbauer commented 2 years ago

I guess what I could try to do is use an Overrides.toml to reroute MKL_jll to the system MKL and see if the issue goes away.

Hm, I still see the issue using this approach.

[856f044c-d86e-5d09-b602-aeab76dc8ba7]
MKL = "/upb/departments/pc2/groups/pc2-mitarbeiter/bauerc/software/julia/custom-artifacts/MKL_jll"

[1d5cc7b8-4909-519e-a0f8-d0f5ad9712d0]
IntelOpenMP = "/upb/departments/pc2/groups/pc2-mitarbeiter/bauerc/software/julia/custom-artifacts/IntelOpenMP_jll"

➜  bauerc@ln-0002 custom-artifacts  tree
.
├── IntelOpenMP_jll
│   ├── lib
│   │   └── libiomp5.so -> /cm/shared/apps/pc2/EB-SW/software/iccifort/2020.4.304/lib/intel64/libiomp5.so
│   └── REQUIRES.md
├── MKL_jll
│   ├── lib
│   │   ├── libmkl_core.so -> /cm/shared/apps/pc2/EB-SW/software/imkl/2020.4.304-iimpi-2020b/mkl/lib/intel64/libmkl_core.so
│   │   ├── libmkl_core.so.1 -> libmkl_core.so
│   │   ├── libmkl_rt.so -> /cm/shared/apps/pc2/EB-SW/software/imkl/2020.4.304-iimpi-2020b/mkl/lib/intel64/libmkl_rt.so
│   │   └── libmkl_rt.so.1 -> libmkl_rt.so
│   └── REQUIRES.md
└── README.md

julia> using MKL_jll

julia> libmkl_rt
"/upb/departments/pc2/groups/pc2-mitarbeiter/bauerc/software/julia/custom-artifacts/MKL_jll/lib/libmkl_rt.so"

julia> mkl_get_dynamic() = @ccall libmkl_rt.mkl_get_dynamic()::Cint
mkl_get_dynamic (generic function with 1 method)

julia> mkl_set_dynamic(flag::Integer) = @ccall libmkl_rt.mkl_set_dynamic(flag::Cint)::Cvoid
mkl_set_dynamic (generic function with 1 method)

julia> mkl_get_dynamic()
1

julia> mkl_set_dynamic(0)

signal (11): Segmentation fault
in expression starting at REPL[8]:1
MKL_SET_DYNAMIC at /cm/shared/apps/pc2/EB-SW/software/imkl/2020.4.304-iimpi-2020b/mkl/lib/intel64/libmkl_intel_lp64.so (unknown line)
mkl_set_dynamic at ./REPL[6]:1
unknown function (ip: 0x1554efa6ab25)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1788 [inlined]
do_call at /buildworker/worker/package_linux64/build/src/interpreter.c:126
eval_value at /buildworker/worker/package_linux64/build/src/interpreter.c:215
[...]

What does this leave us with...? The same library works when called from C.

➜  bauerc@ln-0002 julia  LD_LIBRARY_PATH="${MKLROOT}/lib/intel64:${LD_LIBRARY_PATH}" ./mkl
Number of MKL threads: 40
MKL dynamc: 1
Number of MKL threads: 16
MKL dynamc: 0

➜  bauerc@ln-0002 julia  ldd mkl
        linux-vdso.so.1 (0x0000155555551000)
        libmkl_rt.so => /cm/shared/apps/pc2/EB-SW/software/imkl/2020.4.304-iimpi-2020b/mkl/lib/intel64/libmkl_rt.so (0x0000155554bd0000)
        libc.so.6 => /lib64/libc.so.6 (0x000015555480b000)
        libdl.so.2 => /lib64/libdl.so.2 (0x0000155554607000)
        /lib64/ld-linux-x86-64.so.2 (0x0000155555327000)

ViralBShah commented 2 years ago

IMO, the issue has to do with how Julia sets up the process, and how that is incompatible with whatever MKL does.

ViralBShah commented 2 years ago

Perhaps @vtjnash may know what the conflict may be.

carstenbauer commented 2 years ago

IMO, the issue has to do with how Julia sets up the process, and how that is incompatible with whatever MKL does.

In that direction: what I don't understand is why calling mkl_set_num_threads directly segfaults but going through LBT, i.e. calling lbt_set_num_threads, works just fine.

staticfloat commented 2 years ago

MKL occasionally does very confusing things like have a C symbol named like the FORTRAN symbol, and have them point to the same underlying code:

$ nm /home/sabae/.julia/artifacts/72d4adc3ef9236a92f4fefeb0291cb6e8aaae2d7/lib/libmkl_rt.so | grep set_dynamic
0000000000183f70 T mkl_set_dynamic
0000000000183f70 T mkl_set_dynamic_

Which then uses the FORTRAN ABI:

    0x7fff8f6c1670 <mkl_set_dynamic_>       mov    (%rdi),%edi
    0x7fff8f6c1672 <mkl_set_dynamic_+2>     jmpq   0x7fff8f6ae170 <mkl_serv_set_dynamic@plt>

For those that have not been sufficiently scarred by life to understand what the assembly means, this extracts the value pointed to by $rdi (the first argument that gets passed in on x86_64) out into $edi. This means that it's trying to dereference the first argument as if it were a pointer. It then immediately jumps to an internal symbol that does the actual work.

This means that what we're hitting here is a shim that is meant to receive the FORTRAN ABI, which is to pass integers by reference, not by value.
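
The difference between the two calling conventions can be sketched in plain C. This is a minimal illustration with made-up names (`demo_set_dynamic_c`, `demo_set_dynamic_f`, `demo_get_dynamic` are hypothetical and are not MKL symbols); it only mirrors the shape of the shim described above, where the Fortran-style entry point dereferences its argument and forwards to the by-value implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the library's internal state; not MKL code. */
static int dynamic_flag = 1;

/* C-style entry point: receives the integer by value. */
void demo_set_dynamic_c(int flag) { dynamic_flag = flag; }

/* Fortran-style shim: receives a pointer to the integer, loads the value
   (the analogue of `mov (%rdi),%edi`), then forwards to the real setter. */
void demo_set_dynamic_f(const int *flag) {
    /* A caller that passes 0 by value would arrive here as a null pointer. */
    assert(flag != NULL);
    demo_set_dynamic_c(*flag);
}

int demo_get_dynamic(void) { return dynamic_flag; }
```

Calling `demo_set_dynamic_f(&zero)` with `int zero = 0;` works, whereas handing the shim the integer 0 itself means it tries to read from address 0. That matches the crash seen when the lowercase MKL symbol is called with a by-value `Cint`.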

Solution 1: use the FORTRAN ABI

julia> using MKL_jll, Libdl
       mkl_get_dynamic() = @ccall libmkl_rt.mkl_get_dynamic()::Cint
       addr = dlsym(dlopen(libmkl_rt), "mkl_set_dynamic")
       mkl_set_dynamic(flag::Integer) = ccall(addr, Cvoid, (Ptr{Cint},), Ref(Cint(flag)))
       mkl_get_dynamic()
1

julia> mkl_set_dynamic(0)

julia> mkl_get_dynamic()
0

Solution 2: use MKL's native C API:

julia> using MKL_jll
       mkl_get_dynamic() = @ccall libmkl_rt.mkl_get_dynamic()::Cint
       mkl_set_dynamic(flag::Integer) = @ccall libmkl_rt.MKL_Set_Dynamic(flag::Cint)::Cvoid
       mkl_get_dynamic()
1

julia> mkl_set_dynamic(0)

julia> mkl_get_dynamic()
0

That's right; MKL_Set_Dynamic is a completely different symbol from mkl_set_dynamic:

$ nm /home/sabae/.julia/artifacts/72d4adc3ef9236a92f4fefeb0291cb6e8aaae2d7/lib/libmkl_rt.so | grep -i set_dynamic
0000000000183f70 T mkl_set_dynamic
0000000000183f70 T mkl_set_dynamic_
0000000000183fc0 T MKL_Set_Dynamic
0000000000183f70 T MKL_SET_DYNAMIC

I don't know where this is documented; it's just something I discovered while writing LBT: the C-style interfaces are mostly Camel_Snake_Case, while the FORTRAN-style interfaces are the lowercase, lowercase-with-trailing-underscore, and uppercase variants.

This is also why calling mkl_set_num_threads directly segfaults while going through LBT works: LBT actually invokes MKL_Set_Num_Threads.

giordano commented 2 years ago

If you, like me, were wondering how the heck the C code works at all with the same function name:

$ cc mkl.c -I"${MKLROOT}/include" -E | tail

int main() {
    printf("Number of MKL threads: %d\n", MKL_Get_Max_Threads());
    printf("MKL dynamc: %d\n", MKL_Get_Dynamic());
    MKL_Set_Num_Threads(16);
    MKL_Set_Dynamic(0);
    printf("Number of MKL threads: %d\n", MKL_Get_Max_Threads());
    printf("MKL dynamc: %d\n", MKL_Get_Dynamic());
    return 0;
}

The MKL header file remaps lower_case to Camel_Snake_Case: https://github.com/guidop/Repo/blob/bc4159501b75d90167f6b0f4ef453dfdc679a6a5/3rdparty/IntelSWTools/2017.2.187/windows/mkl/include/mkl_service.h#L107. Thanks Elliot for pointing this out, in hindsight it was easy to see the cheat they do.
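
The remapping trick can be reproduced in a few lines of C. The sketch below uses hypothetical names (`Demo_Set_Num_Threads`, `demo_set_num_threads`, and so on) and is not the contents of mkl_service.h; it only assumes the header does the renaming with preprocessor macros, as the preprocessed output above suggests.

```c
/* Hypothetical names throughout; this is not the contents of mkl_service.h. */
static int demo_threads = 4;

/* "C-style" entry points: Camel_Snake_Case, arguments passed by value. */
void Demo_Set_Num_Threads(int n) { demo_threads = n; }
int  Demo_Get_Max_Threads(void)  { return demo_threads; }

/* Header-style remapping: the preprocessor rewrites the lowercase spelling
   before compilation, so the lowercase symbol is never referenced at link time. */
#define demo_set_num_threads Demo_Set_Num_Threads
#define demo_get_max_threads Demo_Get_Max_Threads

int demo_after_set(int n) {
    demo_set_num_threads(n);        /* expands to Demo_Set_Num_Threads(n) */
    return demo_get_max_threads();  /* expands to Demo_Get_Max_Threads()  */
}
```

A foreign-function call from Julia never sees the header, so it resolves whatever symbol name is written literally. That is why the lowercase name reaches the Fortran-style shim from Julia but not from C.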

carstenbauer commented 2 years ago

Thanks Elliot! I wonder, should we mention this in the docs somewhere? Or is having this issue here enough?

giordano commented 2 years ago

Side note, I don't think we need to use dlsym also with the Fortran ABI?

julia> using MKL_jll

julia> mkl_get_dynamic() = @ccall libmkl_rt.MKL_Get_Dynamic()::Cint
mkl_get_dynamic (generic function with 1 method)

julia> mkl_set_dynamic(flag::Integer) = @ccall libmkl_rt.mkl_set_dynamic((Ref(Cint(flag)))::Ptr{Cint})::Cvoid
mkl_set_dynamic (generic function with 1 method)

julia> mkl_get_dynamic()
1

julia> mkl_set_dynamic(0)

julia> mkl_get_dynamic()
0

julia> mkl_set_dynamic(1)

julia> mkl_get_dynamic()
1

Seems to work fine to just @ccall libname+symbol as usual

staticfloat commented 2 years ago

Side note, I don't think we need to use dlsym also with the Fortran ABI?

This is because I don't know how to use @ccall...... ;)