@rbuch @kavithachandrasekar Any ideas on what in #2531 could have caused this? I did a little digging and it looks like _Cmi_mynode is not set properly in LrtsInit, causing multiple processes to print the configurations.
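To make the connection explicit: the startup banners are meant to be printed once, behind a node-rank guard along the lines of the sketch below (illustrative, not the verbatim Charm++ source). If _Cmi_mynode is left at zero on every process, every process passes the guard and prints its own copy.
#include <stdio.h>

int _Cmi_mynode = 0;  /* node rank; normally set during LrtsInit */

/* Illustrative sketch, not the actual Charm++ code: only node 0
 * should announce the configuration at startup. */
static void print_startup_banner(const char *commit_id)
{
    if (_Cmi_mynode == 0)
        printf("Converse/Charm++ Commit ID: %s\n", commit_id);
}

int main(void)
{
    print_startup_banner("v6.11.0-devel-355-g3a973d587");
    return 0;
}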
Hmm, no, I'm not quite sure what would be causing that. If I'm reading the code correctly, it looks like _Cmi_mynode is set directly from a value returned by the PAMI API, so I don't know how the LB refactor could get in the way of that. Have you seen this issue anywhere else? I remember there being issues (not this specific one, just general problems) on the SMP version of pamilrts a long time back, but I assume those have been resolved since then, @nitbhat?
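For reference, the node rank in the PAMI layer comes from a client configuration query, roughly like the sketch below. This is paraphrased from memory rather than quoted from the machine layer, so treat the function name and the client argument as placeholders.
#include <pami.h>

extern int _Cmi_mynode, _Cmi_numnodes;

/* Paraphrased sketch, not the verbatim source: ask the already-created
 * PAMI client for this task's id and the total task count. */
static void query_node_rank(pami_client_t client)
{
    pami_configuration_t query[2];
    query[0].name = PAMI_CLIENT_TASK_ID;    /* this process's rank */
    query[1].name = PAMI_CLIENT_NUM_TASKS;  /* total process count */
    PAMI_Client_query(client, query, 2);
    _Cmi_mynode   = query[0].value.intval;
    _Cmi_numnodes = query[1].value.intval;
}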
@philmiller-charmworks was seeing this error (segfault when accessing _Cmi_mynode in startup) when running mpi4py on AMPI last week. @evan-charmworks may know more about it too.
I got the following weird outputs with multicore-linux-x86_64:
Charm++: standalone mode (not using charmrun)
Charm++> Running in Multicore mode: 32 threads (PEs)
Converse/Charm++ Commit ID: v6.11.0-devel-355-g3a973d587
Converse/Charm++ Commit ID: v6.11.0-devel-355-g3a973d587
Converse/Charm++ Commit ID: v6.11.0-devel-355-g3a973d587
Converse/Charm++ Commit ID: v6.11.0-devel-355-g3a973d587
Charm++ built without optimization.
Do not use for perforCharm++ built without optimization.
Do not use for perforCharm++ built without optimization.
Do not use for performance benchmarking (build with --with-production to do so).
Charm++ built with internal error checking enabled.
Do not use for performance benchmarking (build without --enable-eCharm++ built with internal error checking enabled.
Do not use for performance benchmarking (build without --enable-error-checking to do so).
Charm++ built with internal error checking enabled.
Do not use for performance benchmarking (build without --enable-eCharm++ built with internal error checking enabled.
Do not use for performance benchmarking (build without --enable-eCharm++ built with internal error checking enabled.
Do not use for performance benchmarking (build without --enable-eCharm++ built with internal error checking enabled.
Do noCharm++ built with internal error checking enabled.
Do not use for performance benchmarking (build without --enable-error-checking to do so).
Charm++ built with internal error checking enabled.
Do not use for performance benchmarking (build without --enable-error-checking to do so).
Charm++ built with internal error checking enabled.
Do not use for performance benchmarking (build without --enable-error-checking to do so).
CharmLB> Load balancer assumes all CPUs are same.
Charm++> cpu affinity enabled.
Charm++> Running on 1 hosts (1 sockets x 4 cores x 2 PUs = 8-way SMP)
Charm++> cpu topology info is gathered in 0.017 seconds.
WARNING: Multiple PEs assigned to same core, recommend adjusting processor affinity or passing +CmiSleepOnIdle to reduce interference.
[0] TreeLB in LEGACY MODE support
[0] TreeLB: Using PE_Root tree with strategy Greedy
send: completed
zerocopySend: completed
mixedSend: completed
sdagRun: Iteration 2 completed
sdagRun: Iteration 3 completed
sdagRun: Iteration 4 completed
sdagRun: Iteration 5 completed
sdagRun: Iteration 6 completed
sdagRun: Iteration 7 completed
sdagRun: Iteration 8 completed
sdagRun: Iteration 9 completed
sdagRun: Iteration 10 completed
sdagRun: Iteration 11 completed
sdagRun: Iteration 12 completed
sdagRun: Iteration 13 completed
sdagRun: Iteration 14 completed
sdagRun: Iteration 15 completed
sdagRun: Iteration 16 completed
sdagRun: Iteration 17 completed
sdagRun: Iteration 18 completed
sdagRun: Iteration 19 completed
sdagRun: Iteration 20 completed
sdagRun: Iteration 21 completed
sdagRun: Iteration 22 completed
sdagRun: Iteration 23 completed
sdagRun: Iteration 24 completed
sdagRun: Iteration 25 completed
sdagRun: Iteration 26 completed
sdagRun: Iteration 27 completed
sdagRun: Iteration 28 completed
sdagRun: Iteration 29 completed
sdagRun: Iteration 30 completed
sdagRun: Iteration 31 completed
sdagRun: Iteration 32 completed
sdagRun: Iteration 33 completed
sdagRun: Iteration 34 completed
sdagRun: Iteration 35 completed
sdagRun: Iteration 36 completed
sdagRun: Iteration 37 completed
sdagRun: Iteration 38 completed
sdagRun: Iteration 39 completed
sdagRun: Iteration 40 completed
sdagRun: completed
All sending completed and result validated
[Partition 0][Node 0] End of program
Charm++: standalone mode (not using charmrun)
Charm++> Running in Multicore mode: 32 threads (PEs)
Converse/Charm++ Commit ID: v6.11.0-devel-355-g3a973d587
Charm++ built without optimization.
Do not use for perforCharm++ built without optimization.
Do not use for performance benchmarking (build with --with-production to do so).
Charm++ built with internal error checking enabled.
Do noCharm++ built with internal error checking enabled.
Do not use for performance benchmarking (build without --enable-error-checking to do so).
Charm++ built with internal error checking enabled.
Do noCharm++ built with internal error checking enabled.
Do noCharm++ built with internal error checking enabled.
Do noCharm++ built with internal error checking enabled.
Do noCharm++ built with internal error checking enabled.
Do noCharm++ built with internal error checking enabled.
Do noCharm++ built with internal error checking enabled.
Do noCharm++ built with internal error checking enabled.
Do noCharm++ built with internal error checking enabled.
Do noCharm++ built with internal error checking enabled.
Do noCharm++ built with internal error checking enabled.
Do noCharm++ built with internal error checking enabled.
Do noCharm++ built with internal error checking enabled.
Do noCharm++ built with internal error checking enabled.
Do noCharm++ built with internal error checking enabled.
Do noCharm++ built with internal error checking enabled.
Do noCharm++ built with internal error checking enabled.
Do noCharmLB> Load balancer assumes all CPUs are same.
Charm++> cpu affinity enabled.
Charm++> Running on 1 hosts (1 sockets x 4 cores x 2 PUs = 8-way SMP)
Charm++> cpu topology info is gathered in 0.020 seconds.
WARNING: Multiple PEs assigned to same core, recommend adjusting processor affinity or passing +CmiSleepOnIdle to reduce interference.
[0] TreeLB in LEGACY MODE support
[0] TreeLB: Using PE_Root tree with strategy Greedy
Segmentation fault (core dumped)
If the generated core dump is correct, the segfault occurs while printing the "Multiple PEs assigned to same core." abort message.
Something similar happened with GNI SMP autobuild:
http://charm.cs.illinois.edu/autobuild/old.2021_04_29__01_04/gni-crayxc-smp.txt
../../../bin/testrun ./hello 10 +p4 ++ppn 2 +CmiSleepOnIdle
ModuleCmd_Switch.c(179):ERROR:152: Module 'PrgEnv-intel' is currently not loaded
ModuleCmd_Switch.c(179):ERROR:152: Module 'PrgEnv-intel' is currently not loaded
Running as 2 OS processes: ./hello 10 +ppn 2 +CmiSleepOnIdle
srun -n 2 -c 3 ./hello 10 +ppn 2 +CmiSleepOnIdle
Charm++> Running on Gemini (GNI) with 2 processes
Charm++> static SMSG
Charm++> SMSG memory: 9.9KB
Charm++> memory pool init block size: 8MB, total memory pool limit 0MB (0 means no limit)
Charm++> memory pool registered memory limit: 200000MB, send limit: 100000MB
Charm++> only comm thread send/recv messages
Charm++> Cray TLB page size: 8192K
Charm++> Running in SMP mode: 2 processes, 2 worker threads (PEs) + 1 comm threads per process, 4 PEs total
Charm++> The comm. thread both sends and receives messages
Charm++> Running on Gemini (GNI) with 2 processes
Charm++> static SMSG
Charm++> SMSG memory: 9.9KB
Charm++> memory pool init block size: 8MB, total memory pool limit 0MB (0 means no limit)
Charm++> memory pool registered memory limit: 200000MB, send limit: 100000MB
Charm++> only comm thread send/recv messages
Charm++> Cray TLB page size: 8192K
Charm++> Running in SMP mode: 2 processes, 2 worker threads (PEs) + 1 comm threads per process, 4 PEs total
Charm++> The comm. thread both sends and receives messages
Charm++> Running on Gemini (GNI) with 2 processes
Charm++> static SMSG
Charm++> SMSG memory: 9.9KB
Charm++> memory pool init block size: 8MB, total memory pool limit 0MB (0 means no limit)
Charm++> memory pool registered memory limit: 200000MB, send limit: 100000MB
Charm++> only comm thread send/recv messages
Charm++> Cray TLB page size: 8192K
Charm++> Running in SMP mode: 2 processes, 2 worker threads (PEs) + 1 comm threads per process, 4 PEs total
Charm++> The comm. thread both sends and receives messages
Converse/Charm++ Commit ID: 32d3e2b
Charm++ built with internal error checking enabled.
Do not use for performance benchmarking (build without --enable-error-checking to do so).
*** Error in `/global/project/projectdirs/m2609/autobuild/gni-crayxc-smp/charm/gni-crayxc-smp/tests/charm++/simplearrayhello/./hello': double free or corruption (top): 0x00002aaaf40017b0 ***
srun: error: nid02220: task 1: Aborted
srun: Terminating job step 42100204.89
slurmstepd: error: *** STEP 42100204.89 ON nid02220 CANCELLED AT 2021-04-29T06:15:58 ***
srun: error: nid02220: task 0: Terminated
srun: Force Terminated job step 42100204.89
real 0m1.753s
user 0m0.105s
sys 0m0.110s
make[3]: *** [Makefile:28: smptest] Error 143
make[3]: Leaving directory '/global/project/projectdirs/m2609/autobuild/gni-crayxc-smp/charm/gni-crayxc-smp/tests/charm++/simplearrayhello'
make[2]: *** [Makefile:75: smptest-simplearrayhello] Error 2
I am getting a *** Error in `/scratch/e1000/trq/bench/./ChaNGa.smp': double free or corruption (out): 0x00002b0200000ca0 *** that I bisected back to the same commit (#2531). The backtrace points to the following lines in convcore.C:
#if CMK_HAS_IO_FILE_OVERFLOW
// forcibly allocate output buffers now, see issue #2814
_IO_file_overflow(stdout, -1);
_IO_file_overflow(stderr, -1);
#endif
This is on a Cray EX building with mpi-linux-x86_64 smp.
Commenting out the above lines fixes the double free and also gets rid of the anomalous initialization messages reported above.
@evan-charmworks Are these _IO_file_overflow calls needed here? Are they needed to fix #2814?
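If the intent of those calls is only to force glibc to allocate the stdio buffers eagerly, one portable alternative might be to hand the stream a static buffer via setvbuf, sketched below. I haven't checked whether this satisfies whatever #2814 needed, so it's a suggestion rather than a fix.
#include <stdio.h>

/* Sketch of an alternative, not the current fix: give stdout a static
 * buffer so glibc never lazily malloc()s one. Must run before any other
 * I/O on the stream. */
static char cmi_stdout_buf[BUFSIZ];

static void preallocate_stdout_buffer(void)
{
    setvbuf(stdout, cmi_stdout_buf, _IOFBF, sizeof cmi_stdout_buf);
    /* stderr is unbuffered by default, so there is normally no lazy
     * buffer allocation to force there. */
}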
The following anomalous initialization statements show up with 1darray hello when run with multiple PEs per process (e.g. 4 processes with 10 PEs per process) on the PAMILRTS SMP build on LLNL Lassen. I did a git bisect and found the offending commit to be #2531.
Before this commit: