charmplusplus / charm

The Charm++ parallel programming system. Visit https://charmplusplus.org/ for more information.
Apache License 2.0

Segfault when built with --enable-tracing-commthread and isomalloc #3818

Open trquinn opened 1 month ago

trquinn commented 1 month ago

I've been using Projections to track down communication performance issues. When I build charm with:

    ./build ChaNGa verbs-linux-x86_64 smp --enable-tracing-commthread -O2 -j8 --force

and build ChaNGa with:

    ./configure --enable-avx --enable-cooling=H2 --enable-bigkeys --enable-projections; make

running ChaNGa produces:

Charm++> Running on MPI library: MVAPICH2 Version      :        2.3.7
MVAPICH2 Release date : Wed March 02 22:00:00 EST 2022
MVAPICH2 Device       : ch3:mrail
MVAPICH2 configure    : --prefix=/opt/apps/gcc11_2/mvapich2/2.3.7 --with-ch3-rank-bits=32 --enable-cxx --enable-romio --enable-fast=O3 --enable-g=dbg --disable-static --enable-shared --enable-hybrid
MVAPICH2 CC           : gcc   -pipe   -g -O3
MVAPICH2 CXX          : g++   -pipe  -g -O3
MVAPICH2 F77          : gfortran   -pipe -w -fallow-argument-mismatch   -g -O3
MVAPICH2 FC           : gfortran    -g -O3
 (MPI standard: 3.1)
Charm++> Level of thread support used: MPI_THREAD_FUNNELED (desired: MPI_THREAD_FUNNELED)
Charm++> Running in SMP mode: 2 processes, 63 worker threads (PEs) + 1 comm threads per process, 126 PEs total
Charm++> The comm. thread both sends and receives messages
Converse/Charm++ Commit ID: v7.1.0-devel-353-g92fa36ab0
Charm++ built without optimization.
Do not use for performance benchmarking (build with --with-production to do so).
Charm++ built with internal error checking enabled.
Do not use for performance benchmarking (build without --enable-error-checking to do so).
Charm++: Tracemode Projections enabled.
Trace: traceroot: /home1/00333/tg456090/src/changa/ChaNGa.smp.prj
[127] Stack Traceback:
  [127:0] ChaNGa.smp.prj 0x9486d7 
  [127:1] libpthread.so.0 0x1501a2e84ce0 
  [127:2] ChaNGa.smp.prj 0x9466a0 
  [127:3] ChaNGa.smp.prj 0x947509 LrtsAdvanceCommunication(int)
  [127:4] ChaNGa.smp.prj 0x9476fe CommunicationServerThread(int)
  [127:5] ChaNGa.smp.prj 0x90c50a 
  [127:6] ChaNGa.smp.prj 0x9478b4 
  [127:7] ChaNGa.smp.prj 0x947ff1 ConverseInit
  [127:8] ChaNGa.smp.prj 0x8ed72c charm_main
  [127:9] libc.so.6 0x1501a0ec6cf3 __libc_start_main
  [127:10] ChaNGa.smp.prj 0x5d009e _start
------------- Processor 127 Exiting: Caught Signal ------------

Note that adding +noisomalloc to the command line allows ChaNGa to run normally. This is with the 8.0.0 rc1 version of charm: 92fa36ab0a9728298278e713b7e61eb6b2d4af42
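
Several frames in the traceback above have no symbol name. One way to narrow down where the comm thread crashes might be to feed those addresses to binutils `addr2line`. The following is only a sketch, under the assumptions that the binary is not position-independent (so the printed addresses can be looked up directly), that it carries symbol or debug information, and that `addr2line` is installed. The helper names `frame_addresses` and `addr2line_command` are hypothetical and not part of Charm++ or ChaNGa.

```python
import re

# Matches traceback frames such as
#   "  [127:3] ChaNGa.smp.prj 0x947509 LrtsAdvanceCommunication(int)"
# but not the "[127] Stack Traceback:" header, which has no "PE:frame" pair.
FRAME_RE = re.compile(r"\[(\d+):(\d+)\]\s+(\S+)\s+(0x[0-9a-fA-F]+)")


def frame_addresses(trace_text, module):
    """Return the addresses of frames that belong to `module`, in trace order."""
    addrs = []
    for line in trace_text.splitlines():
        m = FRAME_RE.search(line)
        if m and m.group(3) == module:
            addrs.append(m.group(4))
    return addrs


def addr2line_command(binary_path, addrs):
    """Build an addr2line argument list: -f prints function names,
    -C demangles C++ symbols, -e selects the executable to inspect."""
    return ["addr2line", "-f", "-C", "-e", binary_path, *addrs]
```

For example, passing the traceback text and the module name `ChaNGa.smp.prj` to `frame_addresses`, then joining the result of `addr2line_command("./ChaNGa.smp.prj", addrs)` with spaces, gives a command line that can be run by hand on the machine where the binary was built.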

ericjbohm commented 1 month ago

Commthread tracing has always been a little shaky. Is this combo known to have worked on older versions of charm++?

trquinn commented 1 month ago

Older versions of charm didn't have isomalloc on by default, and ChaNGa doesn't need it, so I don't know.