chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org

ugni+hugepages is incompatible with some Spawn calls #7550

Open mppf opened 7 years ago

mppf commented 7 years ago

Can't fork when memory is registered with the NIC under ugni. It results in a segfault in the forked process. This is because the Chapel UGNI comm layer uses the GNI_CDM_MODE_FORK_NOCOPY flag.

Note that Spawn calls that forward all output (rather than piping/capturing it) use vfork and so don't have this problem.
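For illustration, here is a minimal C sketch of the vfork-then-exec pattern that note refers to. It is not the Chapel runtime's code: the helper name run_shell_with_vfork and the use of /bin/sh are assumptions made for the example. The relevant property is that a vfork() child borrows the parent's address space until it calls exec or _exit, so nothing is copied or dropped at fork time.

```c
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from the Chapel sources): run `cmd` through
 * /bin/sh using vfork() + execv() and return the child's exit status,
 * or -1 if spawning failed or the child was killed by a signal. */
int run_shell_with_vfork(const char *cmd)
{
    /* Build argv before vfork(): the child must do nothing but exec or _exit. */
    char *const argv[] = { "sh", "-c", (char *)cmd, NULL };

    pid_t pid = vfork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: still running in the parent's address space. */
        execv("/bin/sh", argv);
        _exit(127); /* reached only if execv() failed */
    }

    /* Parent: resumes once the child has exec'd or exited. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_shell_with_vfork("exit 7") should return 7. Output is inherited from the parent rather than piped, which matches the "forward all output" case described above.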

For 1.12, we decided to halt if a user is about to call fork with NIC registered memory. See PR #2539.

Long term we need a better solution. We thought about using GNI_CDM_MODE_FORK_FULLCOPY instead of GNI_CDM_MODE_FORK_NOCOPY. However, this means that the parent will duplicate the registered memory at the time of the fork. That might involve allocating and duplicating quite a lot of memory - we might get a really slow fork or an OOM.

Perhaps recent comm=ugni dynamic registration improvements are another way to solve this issue.

Steps to Reproduce

First, comment out the UGNI error in Spawn.chpl's spawn function. Then:

module load craype-hugepages8M
cd test/modules/standard/Spawn/ugni
chpl spawn-system.chpl && ./spawn-system -nl 2 --pipeStdout=true

mppf commented 7 years ago

When I investigated this issue in detail (in September 2015), I narrowed the problem down to memory in the .data segment no longer being accessible after the fork. In particular, this was causing problems with a variable called __fork_generation_pointer in glibc. I was able to generate similar core dumps if I made a C program that called madvise() with MADV_DONTFORK on the data segment and then ran fork().
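The effect is easy to see in isolation on Linux. The sketch below is not the original test program: to stay self-contained it marks one anonymous mmap'd page with MADV_DONTFORK rather than the data segment, and the function name child_signal_after_fork is an assumption made for the example. It forks, has the child read the page, and reports which signal (if any) killed the child.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: map one page, optionally mark it MADV_DONTFORK,
 * fork, and let the child read the page. Returns the signal that killed
 * the child, 0 if the child exited normally, or -1 on a setup failure. */
int child_signal_after_fork(int use_dontfork)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    volatile char *region = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED)
        return -1;
    region[0] = 42;

    /* With MADV_DONTFORK the mapping is not inherited by fork() children. */
    if (use_dontfork &&
        madvise((void *)region, (size_t)page, MADV_DONTFORK) != 0) {
        munmap((void *)region, (size_t)page);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        munmap((void *)region, (size_t)page);
        return -1;
    }
    if (pid == 0) {
        /* Child: this read faults if the page was excluded from the fork. */
        _exit(region[0] == 42 ? 0 : 1);
    }

    int status = 0;
    pid_t waited = waitpid(pid, &status, 0);
    munmap((void *)region, (size_t)page);
    if (waited < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On a typical Linux system, child_signal_after_fork(1) should return SIGSEGV and child_signal_after_fork(0) should return 0. Applying the same advice to memory that glibc touches right after fork() would make the child crash before user code runs, which is consistent with the core dumps described above.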

If this is the only issue, we might be able to solve the problem if we can avoid registering the data segment.

Also, it's not the arguments to fork. We already allocate those with the system allocator (even in a hugepages configuration).

ben-albrecht commented 6 years ago

The AI workflow application I have been working on relies on the Spawn module, and so we need to unload hugepages in order for it to work with the ugni comm layer. The program is not communication-bound today, so this is not a big deal, but that may not always be true. This will likely become an important issue for this application in the future.

ben-albrecht commented 6 years ago

cc @gbtitus

cassella commented 6 years ago

Have you tried GNI_CDM_MODE_FORK_PARTCOPY? At least for this app, if switching to 4k pages with NOCOPY lets the app work, it seems likely to me that PARTCOPY will work as well.

ben-albrecht commented 6 years ago

Thanks for the suggestion @cassella - I will try that out.

gbtitus commented 6 years ago

Basically, registering hugepage-based regions for everything is incompatible with spawning that doesn't use vfork(). So currently, if you use CHPL_COMM=ugni and do the style of spawning that doesn't use vfork(), your only option is that comm layer's so-called minimal-registered-memory mode, in which you run without a hugepage module loaded (see here).

Another alternative on XC-based systems would be for the ugni comm layer to register memory as it does now, but without using hugepages. That would oversubscribe the NIC's TLB cache but the performance cost of doing so might be less than that of minimal-registered-memory mode. As luck would have it, we are investigating precisely this, in #10262.

gbtitus commented 6 years ago

Unfortunately, registering on 4k pages instead of hugepages doesn't help with the Spawn problem, because the problem has to do with registration rather than page size. See here for the explanation.