MagneticResonanceImaging / MRIReco.jl

Julia Package for MRI Reconstruction
https://magneticresonanceimaging.github.io/MRIReco.jl/latest/

ESPIRiT Segfault #46

Closed JakobAsslaender closed 2 years ago

JakobAsslaender commented 2 years ago

I keep getting segfaults in ESPIRiT. They only seem to happen with large 3D datasets on a 40-core HPC machine. Most of the time the error message is unspecific, but once I got a somewhat more specific one:

signal (11): Segmentation fault
in expression starting at ...
cgemm_itcopy_SKYLAKEX at .../julia-1.7.1/bin/../lib/julia/libopenblas64_.so (unknown line)
cgemm_nn at .../julia-1.7.1/bin/../lib/julia/libopenblas64_.so (unknown line)
cgemm_64_ at .../julia-1.7.1/bin/../lib/julia/libopenblas64_.so (unknown line)
cgesdd_64_ at .../julia-1.7.1/bin/../lib/julia/libopenblas64_.so (unknown line)
gesdd! at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/LinearAlgebra/src/lapack.jl:1659
_svd! at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/LinearAlgebra/src/svd.jl:122 [inlined]
#svd!#99 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/LinearAlgebra/src/svd.jl:102
svd! at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/LinearAlgebra/src/svd.jl:98 [inlined]
macro expansion at .../MRIReco/src/Tools/CoilSensitivity.jl:209 [inlined]
#158#threadsfor_fun at ./threadingconstructs.jl:85
#158#threadsfor_fun at ./threadingconstructs.jl:52
unknown function (ip: 0x2aad2409d59f)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1788 [inlined]
start_task at /buildworker/worker/package_linux64/build/src/task.c:877
Allocations: 2145713699 (Pool: 2145669898; Big: 43801); GC: 135

which seems to point at the SVD in this line: https://github.com/MagneticResonanceImaging/MRIReco.jl/blob/62afb3cd455078a5d0ff8164557090e297d77483/src/Tools/CoilSensitivity.jl#L209

The error is not reproducible, and so far I have failed to create an MWE. But it seems to occur with both @batch and @threads multithreading and is not related to the @view macro. I still have to try it single-threaded, but this is going to be slllooooowwww on such large datasets. I just wanted to create a track record in case anyone else is experiencing this problem or has a suggestion on how to fix it.

tknopp commented 2 years ago

I would say this is either a Julia bug or a Julia limitation, so you should probably open a bug report with Julia. This will, however, require that you make a very small example, where you basically extract the code in question into something like a 10-line example.
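
A minimal sketch of what such an extracted example might look like, assuming the relevant pattern is one `svd!` call per iteration of a `Threads.@threads` loop, as the stack trace suggests. The function name `threaded_svd` and the matrix sizes are hypothetical and not taken from MRIReco; the element type `ComplexF32` is chosen because the trace goes through the single-precision complex routines (`cgesdd`, `cgemm`). Whether this triggers the segfault at all is unknown.

```julia
using LinearAlgebra
using Random

# Hypothetical reproducer: many independent SVDs inside a threaded loop.
# nmat, m, n are placeholders; the sizes that trigger the crash are not known.
function threaded_svd(nmat::Int, m::Int, n::Int; seed::Int = 1)
    Random.seed!(seed)
    mats = [randn(ComplexF32, m, n) for _ in 1:nmat]
    maxsv = zeros(Float32, nmat)
    Threads.@threads for i in 1:nmat
        F = svd!(mats[i])   # overwrites mats[i]; default algorithm calls LAPACK gesdd!
        maxsv[i] = F.S[1]   # singular values are sorted in descending order
    end
    return maxsv
end

sv = threaded_svd(64, 32, 8)
println("largest singular value over all matrices: ", maximum(sv))
```

To exercise the threaded path, Julia has to be started with several threads, e.g. `julia -t 40`, and the sizes would need to be scaled up towards those of the failing datasets.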

Other than that you could test https://github.com/JuliaLinearAlgebra/MKL.jl
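
A short sketch for checking which BLAS/LAPACK backend is actually in use before and after trying MKL.jl, assuming Julia 1.7 or newer (where `BLAS.get_config()` is available). The `using MKL` line is commented out because it assumes the MKL.jl package is installed; the variable names are illustrative only.

```julia
using LinearAlgebra
# using MKL   # assumption: MKL.jl is installed; on Julia >= 1.7 loading it swaps the backend

# List the libraries that BLAS/LAPACK calls are currently forwarded to.
backend_libs = [basename(lib.libname) for lib in BLAS.get_config().loaded_libs]
println("BLAS backend(s): ", backend_libs)

# Number of threads the BLAS library itself uses (separate from Julia's threads).
backend_threads = BLAS.get_num_threads()
println("BLAS threads: ", backend_threads, ", Julia threads: ", Threads.nthreads())
```

With the stock build, the list should contain the `libopenblas64_` library that also appears in the stack trace; with MKL.jl loaded, it should show an MKL library instead.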

JakobAsslaender commented 2 years ago

Yeah, that is the problem: I was not able to create a minimal working example. On smaller datasets, e.g. the test suite of MRIReco, it works just fine. And creating a large dataset with random numbers also did not trigger the issue, at least not in the particular way I tried. But maybe I'll submit an issue w/o an MWE and see if anyone has an idea.