chapel-lang / chapel

gasnet-ibv issues with Spawn calls #13387

Open ronawho opened 5 years ago

ronawho commented 5 years ago

tl;dr -- the Spawn module doesn't work correctly under gasnet-ibv. You can work around this by setting IBV_FORK_SAFE=1, but doing so may have performance implications.

Likely similar to https://github.com/chapel-lang/chapel/issues/7550, Spawn doesn't work correctly under gasnet-ibv. We originally noticed this with CoMD using Spawn to call uname in https://github.com/Cray/chapel-private/issues/311. Here's a simple reproducer from that.

use Spawn;

proc main() {
  for i in 0..#numLocales do writeln(Locales[i]);

  {
    writeln("---- begin spawn ----");
    var sub = spawn(["uname"], stdout=PIPE); sub.wait();
    writeln("---- end spawn ----");
  }

  for i in 0..#numLocales do writeln(Locales[i]);
}

Output:

LOCALE0
LOCALE1
---- begin spawn ----
---- end spawn ----
LOCALE0
LOCALE0 // should be "LOCALE1"

It looks like fork() (and anything that calls fork()) doesn't work by default under ibv. libibverbs provides ibv_fork_init(), which "initializes libibverbs's data structures to handle fork() function calls correctly and avoid data corruption". According to its documentation, this comes at a cost: "Calling ibv_fork_init() will reduce performance due to an extra system call for every memory registration, and the additional memory allocated to track memory regions. The precise performance impact depends on the workload and usually will not be significant."

I'll investigate the performance overhead when I have time. If it seems small, we may want to enable that by default. For users who need a workaround in the meantime, the documentation notes that "Setting the environment variable RDMAV_FORK_SAFE or IBV_FORK_SAFE has the same effect as calling ibv_fork_init()."

jhh67 commented 3 years ago

Looking at the implementation of ibv_fork_init, it uses madvise(MADV_DONTFORK) on the registered memory regions. This is one cause of the additional overhead, along with the cost of keeping track of the memory regions (they appear to implement a red-black tree for this). MADV_DONTFORK prevents the child from accessing the region after the fork. As noted by @gbtitus in Cray/chapel-private#2019, this won't work for Chapel: the thread stacks are in the heap, which is in registered memory regions, so the threads of the child process won't be able to access their own stacks.