Open Socob opened 6 months ago
Sorry for the edits, but right after creating this issue I thought I’d also observed this without using SharedArrays. However, I can’t reproduce that right now…
OK, it’s definitely happening even for a normal array, so this has nothing to do with SharedArrays! That makes it much worse!
I am having a similar error. My code runs fine on a Mac, but gives me errors like this when I run it on a Linux cluster. I’ll try to construct an MFE (minimal failing example).
I am also having similar issues on a Linux cluster (the code runs fine on my old Julia install on the laptop). I put some data in https://discourse.julialang.org/t/segmentation-fault-using-multithreaded-julia-on-new-server/114557. I have not been able to get a clean MFE.
I’m getting segmentation faults when using Distributed while passing --threads to Julia, even when I’m not actually using any of those threads (see the MWE below). Needless to say, this is a huge problem when doing hybrid distributed- and shared-memory parallelization!
Using the commented line instead (without --threads), I’m not getting any segmentation faults.
Triggering the segfault does seem to depend on the number of worker processes, in that with a small number of workers, the issue is not triggered (or at least not consistently). It also doesn’t appear immediately, but after some non-deterministic time. The details may be machine-specific, but I’ve reproduced this on several different machines.
I don’t have any attempts at an explanation, since I don’t see how merely setting the number of Julia threads would affect this code.
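For readers trying this on their own machine, here is a rough sketch of the kind of setup described above: worker processes started with a threads flag while the work itself never uses those threads. This is not the MWE from this report. The worker count, thread count, and the toy workload are all assumptions picked for illustration, and the sketch is not claimed to trigger the segfault.

```julia
using Distributed

# Assumed values for illustration only; not taken from the report.
n_workers = 4
n_threads = 2

# Start local workers, each launched with the --threads flag.
addprocs(n_workers; exeflags = `--threads=$n_threads`)
# Single-threaded workers for comparison:
# addprocs(n_workers)

# A trivial distributed workload that never touches the extra threads.
results = pmap(x -> x^2, 1:100)
println("sum of squares: ", sum(results))

# Report how many threads each worker actually started with.
for w in workers()
    println("worker ", w, " => ", remotecall_fetch(Threads.nthreads, w), " threads")
end
```

Since the report says the crash depends on the number of workers and only shows up after some time, a real reproduction attempt would probably need a larger n_workers and a workload that runs in a loop for a while.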
versioninfo():
A minimal working example (MWE), also known as a minimum reproducible example: