Closed carstenbauer closed 2 years ago
I'm not sure if this is the right repo to ask this question / file this issue in. (@staticfloat, is this perhaps an LBT "issue"?)
You should be able to call MKL functions directly - there is no need to go through LBT. I don't know what the issue here is though.
> You should be able to call MKL functions directly - there is no need to go through LBT.
Alright, that's what I thought. Thanks for confirming.
Would be great to figure out why this isn't working then. I will investigate this a bit more (different machines / OSs etc.) later today.
I also get segfaults on another Linux cluster and my Macbook with macOS. For the latter, the segfault reads
signal (11): Segmentation fault: 11
in expression starting at REPL[7]:1
MKL_SET_DYNAMIC at /Users/crstnbr/.julia/artifacts/be4d5c9f8ccb92916ceb22a6b15085b57f81d05d/lib/libmkl_intel_ilp64.1.dylib (unknown line)
Allocations: 809455 (Pool: 809051; Big: 404); GC: 1
Maybe @giordano might know some of the bad things MKL does.
Just to confirm this doesn't have anything to do with LBT, I get the same by using MKL_jll only:
julia> using MKL_jll
julia> mkl_get_dynamic() = @ccall MKL_jll.libmkl_rt.mkl_get_dynamic()::Cint
mkl_get_dynamic (generic function with 1 method)
julia> mkl_get_dynamic()
1
julia> mkl_set_dynamic(flag::Integer) = @ccall MKL_jll.libmkl_rt.mkl_set_dynamic(flag::Cint)::Cvoid
mkl_set_dynamic (generic function with 1 method)
julia> mkl_set_dynamic(0)
signal (11): Segmentation fault
in expression starting at REPL[10]:1
I'd suggest trying to produce a minimal C code which reproduces (or maybe not) the segfault. Unfortunately the Anaconda distribution of MKL doesn't seem to ship the header files.
Try with a julia debug build?
What does this function do?
This function indicates whether Intel® oneAPI Math Kernel Library can dynamically change the number of OpenMP threads or should avoid doing this. The setting applies to all Intel® oneAPI Math Kernel Library functions on all execution threads. This function takes precedence over the MKL_DYNAMIC environment variable.
To elaborate a bit more: I need to set MKL_DYNAMIC=false when I'm pinning Julia threads to specific cores because otherwise MKL will spoil the pinning, see https://discourse.julialang.org/t/julia-thread-affinity-not-persistent-when-calling-mkl-function/74560/3.
Note, however, that the issue here isn't specific to mkl_set_dynamic. Directly trying to call mkl_set_num_threads also segfaults.
I just tried on a cluster where I have MKL 2020 available:
#include <stdio.h>
#include "mkl.h"
int main() {
    printf("Number of MKL threads: %d\n", mkl_get_max_threads());
    printf("MKL dynamc: %d\n", mkl_get_dynamic());
    mkl_set_num_threads(16);
    mkl_set_dynamic(0);
    printf("Number of MKL threads: %d\n", mkl_get_max_threads());
    printf("MKL dynamc: %d\n", mkl_get_dynamic());
    return 0;
}
$ cc -o mkl mkl.c -I"${MKLROOT}/include" -L"${MKLROOT}/lib/intel64" -lmkl_rt
$ LD_LIBRARY_PATH="${MKLROOT}/lib/intel64:${LD_LIBRARY_PATH}" ./mkl
Number of MKL threads: 36
MKL dynamc: 1
Number of MKL threads: 16
MKL dynamc: 0
$ echo $?
0
So the C code seems to be working as expected.
I briefly tried running Julia under gdb, breaking at mkl_set_dynamic. The segfault seems to happen inside some MKL libs (as also shown by the error above), but I didn't go much further than that. I have no clue what's wrong with MKL; I can only tell it's a source of problems :smile:
I wonder if Julia needs to be built with COPY_STACKS.
You mean setting the environment variable JULIA_COPY_STACKS=1? That doesn't seem to change much for me.
I thought it was a compile time thing.
> I briefly tried running Julia under gdb, breaking mkl_set_dynamic
Did the same, here is the backtrace (although I don't think it's too helpful).
Thread 1 "julia" received signal SIGSEGV, Segmentation fault.
0x00001554df7d5020 in mkl_set_dynamic_ ()
from /scratch/pc2-mitarbeiter/bauerc/.julia/artifacts/72d4adc3ef9236a92f4fefeb0291cb6e8aaae2d7/lib/libmkl_intel_ilp64.so.1
(gdb) bt
#0 0x00001554df7d5020 in mkl_set_dynamic_ ()
from /scratch/pc2-mitarbeiter/bauerc/.julia/artifacts/72d4adc3ef9236a92f4fefeb0291cb6e8aaae2d7/lib/libmkl_intel_ilp64.so.1
#1 0x00001554f14c0e31 in julia_mkl_set_dynamic_282 (flag=0) at REPL[1]:1
#2 0x00001554f14c0e96 in jfptr_mkl_set_dynamic_283 ()
#3 0x000015555425b513 in _jl_invoke (F=0x155541fd4f10, args=0x7ffffffef2d8, nargs=1, mfunc=0x15553e050920, world=31335) at julia_internal.h:2247
#4 0x000015555425bed6 in jl_apply_generic (F=0x155541fd4f10, args=0x7ffffffef2d8, nargs=1) at julia_internal.h:2429
#5 0x000015555427941f in jl_apply (args=0x7ffffffef2d0, nargs=2) at /scratch/pc2-mitarbeiter/bauerc/building/julia/julia-source/src/interpreter.c:1788
#6 0x00001555542798bd in do_call (args=0x15553f0c8bb8, nargs=2, s=0x7ffffffef6f0) at julia_internal.h:126
[...]
> I just tried on a cluster where I have MKL 2020 available:
FWIW, I just compiled and ran the same C code on our cluster with MKL 2021 and it also runs just fine.
I guess what I could try to do is use an Overrides.toml to reroute MKL_jll to the system MKL and see if the issue goes away.
> I guess what I could try to do is use an Overrides.toml to reroute MKL_jll to the system MKL and see if the issue goes away.
Hm, I still see the issue using this approach.
[856f044c-d86e-5d09-b602-aeab76dc8ba7]
MKL = "/upb/departments/pc2/groups/pc2-mitarbeiter/bauerc/software/julia/custom-artifacts/MKL_jll"
[1d5cc7b8-4909-519e-a0f8-d0f5ad9712d0]
IntelOpenMP = "/upb/departments/pc2/groups/pc2-mitarbeiter/bauerc/software/julia/custom-artifacts/IntelOpenMP_jll"
➜ bauerc@ln-0002 custom-artifacts tree
.
├── IntelOpenMP_jll
│ ├── lib
│ │ └── libiomp5.so -> /cm/shared/apps/pc2/EB-SW/software/iccifort/2020.4.304/lib/intel64/libiomp5.so
│ └── REQUIRES.md
├── MKL_jll
│ ├── lib
│ │ ├── libmkl_core.so -> /cm/shared/apps/pc2/EB-SW/software/imkl/2020.4.304-iimpi-2020b/mkl/lib/intel64/libmkl_core.so
│ │ ├── libmkl_core.so.1 -> libmkl_core.so
│ │ ├── libmkl_rt.so -> /cm/shared/apps/pc2/EB-SW/software/imkl/2020.4.304-iimpi-2020b/mkl/lib/intel64/libmkl_rt.so
│ │ └── libmkl_rt.so.1 -> libmkl_rt.so
│ └── REQUIRES.md
└── README.md
julia> using MKL_jll
julia> libmkl_rt
"/upb/departments/pc2/groups/pc2-mitarbeiter/bauerc/software/julia/custom-artifacts/MKL_jll/lib/libmkl_rt.so"
julia> mkl_get_dynamic() = @ccall libmkl_rt.mkl_get_dynamic()::Cint
mkl_get_dynamic (generic function with 1 method)
julia> mkl_set_dynamic(flag::Integer) = @ccall libmkl_rt.mkl_set_dynamic(flag::Cint)::Cvoid
mkl_set_dynamic (generic function with 1 method)
julia> mkl_get_dynamic()
1
julia> mkl_set_dynamic(0)
signal (11): Segmentation fault
in expression starting at REPL[8]:1
MKL_SET_DYNAMIC at /cm/shared/apps/pc2/EB-SW/software/imkl/2020.4.304-iimpi-2020b/mkl/lib/intel64/libmkl_intel_lp64.so (unknown line)
mkl_set_dynamic at ./REPL[6]:1
unknown function (ip: 0x1554efa6ab25)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1788 [inlined]
do_call at /buildworker/worker/package_linux64/build/src/interpreter.c:126
eval_value at /buildworker/worker/package_linux64/build/src/interpreter.c:215
[...]
What does this leave us with...? The same library works when called from C.
➜ bauerc@ln-0002 julia LD_LIBRARY_PATH="${MKLROOT}/lib/intel64:${LD_LIBRARY_PATH}" ./mkl
Number of MKL threads: 40
MKL dynamc: 1
Number of MKL threads: 16
MKL dynamc: 0
➜ bauerc@ln-0002 julia ldd mkl
linux-vdso.so.1 (0x0000155555551000)
libmkl_rt.so => /cm/shared/apps/pc2/EB-SW/software/imkl/2020.4.304-iimpi-2020b/mkl/lib/intel64/libmkl_rt.so (0x0000155554bd0000)
libc.so.6 => /lib64/libc.so.6 (0x000015555480b000)
libdl.so.2 => /lib64/libdl.so.2 (0x0000155554607000)
/lib64/ld-linux-x86-64.so.2 (0x0000155555327000)
IMO, the issue has to do with how Julia sets up the process, and how that is incompatible with whatever MKL does.
Perhaps @vtjnash may know what the conflict may be.
> IMO, the issue has to do with how Julia sets up the process, and how that is incompatible with whatever MKL does.
In that direction: what I don't understand is why calling mkl_set_num_threads directly segfaults but going through LBT, i.e. calling lbt_set_num_threads, works just fine.
MKL occasionally does very confusing things like have a C symbol named like the FORTRAN symbol, and have them point to the same underlying code:
$ nm /home/sabae/.julia/artifacts/72d4adc3ef9236a92f4fefeb0291cb6e8aaae2d7/lib/libmkl_rt.so | grep set_dynamic
0000000000183f70 T mkl_set_dynamic
0000000000183f70 T mkl_set_dynamic_
Which then uses the FORTRAN ABI:
0x7fff8f6c1670 <mkl_set_dynamic_> mov (%rdi),%edi
0x7fff8f6c1672 <mkl_set_dynamic_+2> jmpq 0x7fff8f6ae170 <mkl_serv_set_dynamic@plt>
For those that have not been sufficiently scarred by life to understand what the assembly means, this extracts the value pointed to by $rdi (the first argument that gets passed in on x86_64) out into $edi. This means that it's trying to dereference the first argument as if it were a pointer. It then immediately jumps to an internal symbol that does the actual work.
This means that what we're hitting here is a shim that is meant to receive the FORTRAN ABI, which is to pass integers by reference, not by value.
julia> using MKL_jll, Libdl
mkl_get_dynamic() = @ccall libmkl_rt.mkl_get_dynamic()::Cint
addr = dlsym(dlopen(libmkl_rt), "mkl_set_dynamic")
mkl_set_dynamic(flag::Integer) = ccall(addr, Cvoid, (Ptr{Cint},), Ref(Cint(flag)))
mkl_get_dynamic()
1
julia> mkl_set_dynamic(0)
julia> mkl_get_dynamic()
0
julia> using MKL_jll
mkl_get_dynamic() = @ccall libmkl_rt.mkl_get_dynamic()::Cint
mkl_set_dynamic(flag::Integer) = @ccall libmkl_rt.MKL_Set_Dynamic(flag::Cint)::Cvoid
mkl_get_dynamic()
1
julia> mkl_set_dynamic(0)
julia> mkl_get_dynamic()
0
That's right; MKL_Set_Dynamic is a completely different symbol from mkl_set_dynamic:
$ nm /home/sabae/.julia/artifacts/72d4adc3ef9236a92f4fefeb0291cb6e8aaae2d7/lib/libmkl_rt.so | grep -i set_dynamic
0000000000183f70 T mkl_set_dynamic
0000000000183f70 T mkl_set_dynamic_
0000000000183fc0 T MKL_Set_Dynamic
0000000000183f70 T MKL_SET_DYNAMIC
I don't know where this is documented; it's just something I discovered while writing LBT: the C-style interfaces are mostly Camel_Snake_Case, while the FORTRAN-style interfaces are the lowercase, lowercase with a trailing underscore, and uppercase variants.
This is also why calling mkl_set_num_threads directly segfaults while lbt_set_num_threads works: LBT actually invokes MKL_Set_Num_Threads.
If you, like me, were wondering how the heck the C code works at all with the same function name:
$ cc mkl.c -I"${MKLROOT}/include" -E | tail
int main() {
printf("Number of MKL threads: %d\n", MKL_Get_Max_Threads());
printf("MKL dynamc: %d\n", MKL_Get_Dynamic());
MKL_Set_Num_Threads(16);
MKL_Set_Dynamic(0);
printf("Number of MKL threads: %d\n", MKL_Get_Max_Threads());
printf("MKL dynamc: %d\n", MKL_Get_Dynamic());
return 0;
}
The MKL header file remaps lower_case to Camel_Snake_Case: https://github.com/guidop/Repo/blob/bc4159501b75d90167f6b0f4ef453dfdc679a6a5/3rdparty/IntelSWTools/2017.2.187/windows/mkl/include/mkl_service.h#L107. Thanks Elliot for pointing this out; in hindsight it was easy to see the cheat they do.
Thanks Elliot! I wonder, should we mention this in the docs somewhere? Or is having this issue here enough?
Side note, I don't think we need to use dlsym also with the Fortran ABI?
julia> using MKL_jll
julia> mkl_get_dynamic() = @ccall libmkl_rt.MKL_Get_Dynamic()::Cint
mkl_get_dynamic (generic function with 1 method)
julia> mkl_set_dynamic(flag::Integer) = @ccall libmkl_rt.mkl_set_dynamic((Ref(Cint(flag)))::Ptr{Cint})::Cvoid
mkl_set_dynamic (generic function with 1 method)
julia> mkl_get_dynamic()
1
julia> mkl_set_dynamic(0)
julia> mkl_get_dynamic()
0
julia> mkl_set_dynamic(1)
julia> mkl_get_dynamic()
1
Seems to work fine to just @ccall libname+symbol as usual.
> Side note, I don't think we need to use dlsym also with the Fortran ABI?
This is because I don't know how to use @ccall ...... ;)
I want to call mkl_get_dynamic and mkl_set_dynamic. I tried wrapping them as follows.
While mkl_get_dynamic() works, I get a segfault for mkl_set_dynamic(0).
I then noted that one also gets a segfault when trying to wrap mkl_set_num_threads similar to mkl_set_dynamic above and realized that for BLAS.set_num_threads we go through LBT, i.e. lbt_set_num_threads.
My question therefore is how to call a generic MKL "setter" function (which doesn't have an lbt_* counterpart). Any help would be much appreciated.