OpenFOAM / OpenFOAM-8

OpenFOAM Foundation repository for OpenFOAM version 8

Communication Deadlock in MPPICFoam Parallel Solver #14

Open BradleyMorgan opened 3 years ago

BradleyMorgan commented 3 years ago

Summary

OpenFOAM version 8 experiences what appears to be a communication deadlock in a scheduled send/receive operation (commsType scheduled, i.e. ordered blocking point-to-point transfers).

The case in question attempts to solve a toy CFD problem evaluating airflow within a rectangular prism using parallel instances of MPPICFoam on decomposed input.

This behavior is reproducible on multiple machines; we have tested on several workstations and HPC clusters.

Our research team has prepared a detailed report (attached).

Multiple OpenMPI versions (e.g. 4.0.3, 2.1.6), compilers (e.g. Intel, gcc) and network transports (e.g. ucx, openib), in conjunction with multiple builds of OpenFOAM 8, have been tried. Blocking vs. non-blocking communication and a number of mpirun command-line tuning parameters (including varied world sizes) have also been attempted with no resolution.

To determine if the file-system was a factor, the case was run on both local and parallel (GPFS) storage. No difference in runtime behavior was observed when running on local vs. parallel storage.

Additionally, a number of case configuration values (e.g. mesh sizing, simulation times) were varied without any effect.

For debugging purposes, the simulation deltaT was increased from 1e-3 to 1.0, which greatly reduces the time to failure.

Based on what we see from our debugging tools, we think we have isolated the issue to the applications/solvers/lagrangian/DPMFoam solver.

Our research team has indicated that the simulation runs successfully when particle injection is disabled.

Some key findings:

1.) The deadlock occurs with send/recv size = 1 (see the sketch after this list).

2.) The deadlock occurs when the number of iterations = 1. This differs significantly from the previous iteration counts of 1000, and seems to indicate that there may be some miscalculation or an unexpected value passed somewhere.
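
As an illustration of finding 1.), the following minimal MPI sketch (hypothetical code written for this report, not OpenFOAM source) mimics a scheduled exchange of single-byte messages between all ranks. Each rank works through a fixed schedule of blocking sends and receives; if any rank drops out of the schedule (simulated here with the made-up SKIP_TURN environment variable), its peers block forever in MPI_Recv, which is the same class of hang we observe:

```cpp
// Minimal sketch (not OpenFOAM code): a "scheduled" exchange of 1-byte
// messages between all ranks, mimicking commsType scheduled. If one rank
// drops out of the fixed schedule (rank 0 when the hypothetical SKIP_TURN
// variable is set), the remaining ranks deadlock in MPI_Recv.
#include <mpi.h>
#include <cstdio>
#include <cstdlib>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const bool skip = (std::getenv("SKIP_TURN") != nullptr) && (rank == 0);

    char sendByte = static_cast<char>(rank);
    for (int peer = 0; peer < size; ++peer)
    {
        if (peer == rank) continue;
        if (skip) continue;  // rank 0 silently drops out of the schedule

        char recvByte = 0;
        if (rank < peer)
        {
            // Lower rank sends first, then receives: a fixed schedule.
            MPI_Send(&sendByte, 1, MPI_CHAR, peer, 1, MPI_COMM_WORLD);
            MPI_Recv(&recvByte, 1, MPI_CHAR, peer, 1, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }
        else
        {
            MPI_Recv(&recvByte, 1, MPI_CHAR, peer, 1, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(&sendByte, 1, MPI_CHAR, peer, 1, MPI_COMM_WORLD);
        }
        std::printf("[%d] exchanged 1 byte with %d\n", rank, peer);
    }

    MPI_Finalize();
    return 0;
}
```

Run with mpirun -np 3, the sketch completes; with SKIP_TURN=1 set, ranks 1 and 2 hang indefinitely in MPI_Recv.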

Compiled_MPPICFoam_Parallel_Troublshooting.pptx

HW Environment

Architecture: x86_64
Processor: 2x Intel(R) Xeon(R) Gold 6248R CPU @ 3.00GHz
Cores / Node: 48

Software Environment

OS: CentOS Linux release 7.9.2009
Application: OpenFOAM 8
MPI: OpenMPI 4.0.3

MPI Build


$ ompi_info 

                 Package: Open MPI ... Distribution
                Open MPI: 4.0.3
  Open MPI repo revision: v4.0.3
   Open MPI release date: Mar 03, 2020
                Open RTE: 4.0.3
  Open RTE repo revision: v4.0.3
   Open RTE release date: Mar 03, 2020
                    OPAL: 4.0.3
      OPAL repo revision: v4.0.3
       OPAL release date: Mar 03, 2020
                 MPI API: 3.1.0
            Ident string: 4.0.3
                  Prefix: /tools/openmpi-4.0.3/gcc/4.8.5/ucx
 Configured architecture: x86_64-unknown-linux-gnu
          Configure host: c20-login01
           Configured by: hpcuser
           Configured on: Wed Apr 14 10:40:09 CDT 2021
          Configure host: c20-login01
  Configure command line: '--prefix=/tools/openmpi-4.0.3/gcc/4.8.5/ucx'
                          '--with-slurm'
                Built by: hpcuser
                Built on: Wed Apr 14 10:50:39 CDT 2021
              Built host: c20-login01
              C bindings: yes
            C++ bindings: no
             Fort mpif.h: yes (all)
            Fort use mpi: yes (limited: overloading)
       Fort use mpi size: deprecated-ompi-info-value
        Fort use mpi_f08: no
 Fort mpi_f08 compliance: The mpi_f08 module was not built
  Fort mpi_f08 subarrays: no
           Java bindings: no
  Wrapper compiler rpath: runpath
              C compiler: /usr/bin/gcc
     C compiler absolute: 
  C compiler family name: GNU
      C compiler version: 4.8.5
            C++ compiler: /usr/bin/g++
   C++ compiler absolute: none
           Fort compiler: /usr/bin/gfortran
       Fort compiler abs: 
         Fort ignore TKR: no
   Fort 08 assumed shape: no
      Fort optional args: no
          Fort INTERFACE: yes
    Fort ISO_FORTRAN_ENV: yes
       Fort STORAGE_SIZE: no
      Fort BIND(C) (all): no
      Fort ISO_C_BINDING: yes
 Fort SUBROUTINE BIND(C): no
       Fort TYPE,BIND(C): no
 Fort T,BIND(C,name="a"): no
            Fort PRIVATE: no
          Fort PROTECTED: no
           Fort ABSTRACT: no
       Fort ASYNCHRONOUS: no
          Fort PROCEDURE: no
         Fort USE...ONLY: no
           Fort C_FUNLOC: no
 Fort f08 using wrappers: no
         Fort MPI_SIZEOF: no
             C profiling: yes
           C++ profiling: no
   Fort mpif.h profiling: yes
  Fort use mpi profiling: yes
   Fort use mpi_f08 prof: no
          C++ exceptions: no
          Thread support: posix (MPI_THREAD_MULTIPLE: yes, OPAL support: yes,
                          OMPI progress: no, ORTE progress: yes, Event lib:
                          yes)
           Sparse Groups: no
  Internal debug support: no
  MPI interface warnings: yes
     MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
              dl support: yes
   Heterogeneous support: no
 mpirun default --prefix: no
       MPI_WTIME support: native
     Symbol vis. support: yes
   Host topology support: yes
            IPv6 support: no
      MPI1 compatibility: no
          MPI extensions: affinity, cuda, pcollreq
   FT Checkpoint support: no (checkpoint thread: no)
   C/R Enabled Debugging: no
  MPI_MAX_PROCESSOR_NAME: 256
    MPI_MAX_ERROR_STRING: 256
     MPI_MAX_OBJECT_NAME: 64
        MPI_MAX_INFO_KEY: 36
        MPI_MAX_INFO_VAL: 256
       MPI_MAX_PORT_NAME: 1024

OpenFOAM Case

/scratch/hpcadmn/djy0006/case5/B-3-22471/system/decomposeParDict

/*--------------------------------*- C++ -*----------------------------------*\
  =========                 |
  \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox
   \\    /   O peration     | Website:  https://openfoam.org
    \\  /    A nd           | Version:  8
     \\/     M anipulation  |
\*---------------------------------------------------------------------------*/
FoamFile
{
    version     2.0;
    format      ascii;
    class       dictionary;
    location    "system";
    object      decomposeParDict;
}
// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //

numberOfSubdomains 3;

method simple;

simpleCoeffs
{
    n       (3 1 1);
    delta   0.001;
}

// ************************************************************************* //

/scratch/hpcadmn/djy0006/case5/B-3-22471/system/controlDict

/*--------------------------------*- C++ -*----------------------------------*\
  =========                 |
  \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox
   \\    /   O peration     | Website:  https://openfoam.org
    \\  /    A nd           | Version:  8
     \\/     M anipulation  |
\*---------------------------------------------------------------------------*/
FoamFile
{
    version     2.0;
    format      ascii;
    class       dictionary;
    location    "system";
    object      controlDict;
}
// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //

application     MPPICFoam;
startFrom       startTime;
startTime       0.0;
stopAt          endTime;
endTime         6.5;
deltaT          1.0;
writeControl    timeStep;
writeInterval   1;
purgeWrite      0;
writeFormat     ascii;
writePrecision  6;
writeCompression off;
timeFormat      general;
timePrecision   6;
runTimeModifiable no;

// ************************************************************************* //

OptimisationSwitches {

fileModificationSkew 60;
fileModificationChecking timeStampMaster;
fileHandler uncollated;
maxThreadFileBufferSize 2e9;
maxMasterFileBufferSize 2e9;
commsType       blocking; // nonBlocking; // scheduled; // blocking;
floatTransfer   0;
nProcsSimpleSum 0;
// Force dumping (at next timestep) upon signal (-1 to disable)
writeNowSignal             -1; // 10;
stopAtWriteNowSignal       -1;
inputSyntax dot;
mpiBufferSize   200000000;
maxCommsSize    0;
trapFpe         1;
setNaN          0;

}

DebugSwitches { UPstream 1; Pstream 1; processor 1; IFstream 1; OFstream 1; }

Summary of Debug Output

The following debug output was generated using the above case configuration with an MPI world size of 3 ...

$ srun -N1 -n3 --pty /bin/bash
...
$ module load openfoam/8-ompi2
$ source /tools/openfoam-8/mpich/OpenFOAM-8/etc/bashrc
$ decomposePar -force
$ mpirun -np $SLURM_NTASKS MPPICFoam -parallel

Following the process tree of ...

$ pstree -ac --show-parents -p -l 54148
systemd,1
  └─slurmstepd,262636
      └─bash,262643
          └─mpirun,54148 -np 3 MPPICFoam -parallel
              ├─MPPICFoam,54152 -parallel
              │   ├─{MPPICFoam},
              │   ├─{MPPICFoam},
              │   └─{MPPICFoam},
              ├─MPPICFoam,541523 -parallel
              │   ├─{MPPICFoam},
              │   ├─{MPPICFoam},
              │   └─{MPPICFoam},
              ├─MPPICFoam,54154 -parallel
              │   ├─{MPPICFoam},
              │   ├─{MPPICFoam},
              │   └─{MPPICFoam},
              ├─{mpirun},
              ├─{mpirun},
              └─{mpirun},

The case output at the time of failure looks like ...

[0] UPstream::waitRequests : starting wait for 0 outstanding requests starting at 0
[0] UPstream::waitRequests : finished wait.
[0] UIPstream::read : starting read from:1 tag:1 comm:0 wanted size:1 commsType:scheduled
[0] UIPstream::read : finished read from:1 tag:1 read size:1 commsType:scheduled
[0] UIPstream::read : starting read from:2 tag:1 comm:0 wanted size:1 commsType:scheduled
[0] UIPstream::read : finished read from:2 tag:1 read size:1 commsType:scheduled
[0] UOPstream::write : starting write to:2 tag:1 comm:0 size:1 commsType:scheduled
[0] UOPstream::write : finished write to:2 tag:1 size:1 commsType:scheduled
[0] UOPstream::write : starting write to:1 tag:1 comm:0 size:1 commsType:scheduled
[0] UOPstream::write : finished write to:1 tag:1 size:1 commsType:scheduled
[2] UPstream::waitRequests : starting wait for 0 outstanding requests starting at 0
[2] UPstream::waitRequests : finished wait.
[2] UOPstream::write : starting write to:0 tag:1 comm:0 size:1 commsType:scheduled
[2] UOPstream::write : finished write to:0 tag:1 size:1 commsType:scheduled
[2] UIPstream::read : starting read from:0 tag:1 comm:0 wanted size:1 commsType:scheduled
[2] UIPstream::read : finished read from:0 tag:1 read size:1 commsType:scheduled
[1] UPstream::waitRequests : starting wait for 0 outstanding requests starting at 0
[1] UPstream::waitRequests : finished wait.
[1] UOPstream::write : starting write to:0 tag:1 comm:0 size:1 commsType:scheduled
[1] UOPstream::write : finished write to:0 tag:1 size:1 commsType:scheduled
[1] UIPstream::read : starting read from:0 tag:1 comm:0 wanted size:1 commsType:scheduled
[1] UIPstream::read : finished read from:0 tag:1 read size:1 commsType:scheduled

<... freeze ...>

Here, the communication schedule appears balanced, with matching sends and receives (based on size and tag). However, the behavior indicates a send or receive call that blocks indefinitely.

The deadlock always seems to occur for size=1 send/recv operations.

The remaining content consists of strace/gdb output from the MPI ranks.

The root mpirun process (54148) looks like it is stuck in a poll loop.

Rank 0 appears to be executing inside Foam::BarycentricTensor::BarycentricTensor().

Ranks 1 and 2 appear to be waiting in a PMPI_Alltoall collective.
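
Taken together, the traces appear consistent with a collective mismatch rather than a point-to-point bug: ranks 1 and 2 have entered the MPI_Alltoall issued from Foam::Pstream::exchangeSizes, while rank 0 is still inside particle tracking and has not reached the collective. The following minimal sketch (hypothetical code for comparison, not solver code) reproduces that state: rank 0 spins forever while the remaining ranks block inside MPI_Alltoall:

```cpp
// Hypothetical reproduction of the observed state (not OpenFOAM code):
// ranks 1..N-1 enter MPI_Alltoall while rank 0 never does, so the
// collective can never complete and the other ranks spin in the MPI
// progress engine (ompi_request_default_wait_all in the backtraces).
#include <mpi.h>
#include <vector>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    std::vector<int> sendSizes(size, 1);  // one "size" entry per rank
    std::vector<int> recvSizes(size, 0);

    if (rank == 0)
    {
        // Stand-in for rank 0 being stuck in particle tracking:
        // it never reaches the collective below.
        volatile bool busy = true;
        while (busy) { /* spin */ }
    }

    // Ranks 1..N-1 block here forever.
    MPI_Alltoall(sendSizes.data(), 1, MPI_INT,
                 recvSizes.data(), 1, MPI_INT, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}
```

Attaching gdb to the non-zero ranks of this sketch should show them inside PMPI_Alltoall / ompi_request_default_wait_all, analogous to the rank 1 and rank 2 traces captured below.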


GDB

Root (MPI) Process

[node040 B-3-22471]$ mpirun -np $SLURM_NTASKS MPPICFoam -parallel > /dev/null 2>&1 &
[1] 54148

[node040 B-3-22471]$ ps -ef | grep hpcuser
hpcuser   47219  47212  0 09:08 pts/0    00:00:00 /bin/bash
hpcuser   54148  47219  1 09:54 pts/0    00:00:00 mpirun -np 3 MPPICFoam -parallel
hpcuser   54152  54148 72 09:54 pts/0    00:00:02 MPPICFoam -parallel
hpcuser   54153  54148 81 09:54 pts/0    00:00:02 MPPICFoam -parallel
hpcuser   54154  54148 81 09:54 pts/0    00:00:02 MPPICFoam -parallel
hpcuser   54166  47219  0 09:54 pts/0    00:00:00 ps -ef
hpcuser   54167  47219  0 09:54 pts/0    00:00:00 grep --color=auto hpcuser

[node040 B-3-22471]$ gdb -p 54148
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-120.el7

(gdb) frame
#0  0x00002aaaac17fccd in poll () from /usr/lib64/libc.so.6

(gdb) where
#0  0x00002aaaac17fccd in poll () from /usr/lib64/libc.so.6
#1  0x00002aaaab096fc6 in poll_dispatch (base=0x659370, tv=0x0) at ../../../../../../../openmpi-4.0.3/opal/mca/event/libevent2022/libevent/poll.c:165
#2  0x00002aaaab08ec80 in opal_libevent2022_event_base_loop (base=0x659370, flags=1) at ../../../../../../../openmpi-4.0.3/opal/mca/event/libevent2022/libevent/event.c:1630
#3  0x0000000000401438 in orterun (argc=5, argv=0x7fffffffaae8) at ../../../../../openmpi-4.0.3/orte/tools/orterun/orterun.c:178
#4  0x0000000000400f6d in main (argc=5, argv=0x7fffffffaae8) at ../../../../../openmpi-4.0.3/orte/tools/orterun/main.c:13

(gdb) n
Single stepping until exit from function poll, which has no line number information.
< ... freeze ... >

(gdb) disassemble
Dump of assembler code for function poll:
   0x00002aaaac0ceca0 <+0>:     cmpl   $0x0,0x2d930d(%rip)        # 0x2aaaac3a7fb4 <__libc_multiple_threads>
   0x00002aaaac0ceca7 <+7>:     jne    0x2aaaac0cecb9 <poll+25>
   0x00002aaaac0ceca9 <+0>:     mov    $0x7,%eax
   0x00002aaaac0cecae <+5>:     syscall
   0x00002aaaac0cecb0 <+7>:     cmp    $0xfffffffffffff001,%rax
   0x00002aaaac0cecb6 <+13>:    jae    0x2aaaac0cece9 <poll+73>
   0x00002aaaac0cecb8 <+15>:    retq
   0x00002aaaac0cecb9 <+25>:    sub    $0x8,%rsp
   0x00002aaaac0cecbd <+29>:    callq  0x2aaaac0e7720 <__libc_enable_asynccancel>
   0x00002aaaac0cecc2 <+34>:    mov    %rax,(%rsp)
   0x00002aaaac0cecc6 <+38>:    mov    $0x7,%eax
   0x00002aaaac0ceccb <+43>:    syscall
=> 0x00002aaaac0ceccd <+45>:    mov    (%rsp),%rdi
   0x00002aaaac0cecd1 <+49>:    mov    %rax,%rdx
   0x00002aaaac0cecd4 <+52>:    callq  0x2aaaac0e7780 <__libc_disable_asynccancel>
   0x00002aaaac0cecd9 <+57>:    mov    %rdx,%rax
   0x00002aaaac0cecdc <+60>:    add    $0x8,%rsp
   0x00002aaaac0cece0 <+64>:    cmp    $0xfffffffffffff001,%rax
   0x00002aaaac0cece6 <+70>:    jae    0x2aaaac0cece9 <poll+73>
   0x00002aaaac0cece8 <+72>:    retq
   0x00002aaaac0cece9 <+73>:    mov    0x2d3160(%rip),%rcx        # 0x2aaaac3a1e50
   0x00002aaaac0cecf0 <+80>:    neg    %eax
   0x00002aaaac0cecf2 <+82>:    mov    %eax,%fs:(%rcx)
   0x00002aaaac0cecf5 <+85>:    or     $0xffffffffffffffff,%rax
   0x00002aaaac0cecf9 <+89>:    retq
End of assembler dump.

Rank 0 Process

[node040 B-3-22471]$ gdb -p 54152

GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-120.el7
Attaching to process 54152
Reading symbols from /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/platforms/linux64GccDPInt32Debug/bin/MPPICFoam...done.
Reading symbols from /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/platforms/linux64GccDPInt32Debug/lib/liblagrangian.so...done.
Loaded symbols for /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/platforms/linux64GccDPInt32Debug/lib/liblagrangian.so
Reading symbols from /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/platforms/linux64GccDPInt32Debug/lib/liblagrangianIntermediate.so...done.
Loaded symbols for /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/platforms/linux64GccDPInt32Debug/lib/liblagrangianIntermediate.so
Reading symbols from /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/platforms/linux64GccDPInt32Debug/lib/liblagrangianTurbulence.so...done.
Loaded symbols for /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/platforms/linux64GccDPInt32Debug/lib/liblagrangianTurbulence.so
Reading symbols from /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/platforms/linux64GccDPInt32Debug/lib/libincompressibleTransportModels.so...done.
Loaded symbols for /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/platforms/linux64GccDPInt32Debug/lib/libincompressibleTransportModels.so
...

(gdb) disassemble
Dump of assembler code for function Foam::BarycentricTensor::BarycentricTensor(Foam::Vector const&, Foam::Vector const&, Foam::Vector const&, Foam::Vector const&):
   0x00000000004aa9fc <+0>:     push   %rbp
   0x00000000004aa9fd <+1>:     mov    %rsp,%rbp
   0x00000000004aaa00 <+4>:     sub    $0x30,%rsp
   0x00000000004aaa04 <+8>:     mov    %rdi,-0x8(%rbp)
=> 0x00000000004aaa08 <+12>:    mov    %rsi,-0x10(%rbp)
   0x00000000004aaa0c <+16>:    mov    %rdx,-0x18(%rbp)
   0x00000000004aaa10 <+20>:    mov    %rcx,-0x20(%rbp)
   0x00000000004aaa14 <+24>:    mov    %r8,-0x28(%rbp)
   0x00000000004aaa18 <+28>:    mov    -0x8(%rbp),%rax
   0x00000000004aaa1c <+32>:    mov    %rax,%rdi
   0x00000000004aaa1f <+35>:    callq  0x4b9302 <Foam::MatrixSpace<Foam::BarycentricTensor, double, (unsigned char)4, (unsigned char)3>::MatrixSpace()>
   0x00000000004aaa24 <+40>:    mov    -0x10(%rbp),%rax
   0x00000000004aaa28 <+44>:    mov    %rax,%rdi
   0x00000000004aaa2b <+47>:    callq  0x4a7af8 <Foam::Vector::x() const>
   0x00000000004aaa30 <+52>:    mov    (%rax),%rax
   0x00000000004aaa33 <+55>:    mov    -0x8(%rbp),%rdx
   0x00000000004aaa37 <+59>:    mov    %rax,(%rdx)
   0x00000000004aaa3a <+62>:    mov    -0x18(%rbp),%rax
   0x00000000004aaa3e <+66>:    mov    %rax,%rdi
   0x00000000004aaa41 <+69>:    callq  0x4a7af8 <Foam::Vector::x() const>
   0x00000000004aaa46 <+74>:    mov    (%rax),%rax
   0x00000000004aaa49 <+77>:    mov    -0x8(%rbp),%rdx
   0x00000000004aaa4d <+81>:    mov    %rax,0x8(%rdx)
   0x00000000004aaa51 <+85>:    mov    -0x20(%rbp),%rax
   0x00000000004aaa55 <+89>:    mov    %rax,%rdi
   0x00000000004aaa58 <+92>:    callq  0x4a7af8 <Foam::Vector::x() const>
   0x00000000004aaa5d <+97>:    mov    (%rax),%rax
   0x00000000004aaa60 <+100>:   mov    -0x8(%rbp),%rdx
   0x00000000004aaa64 <+104>:   mov    %rax,0x10(%rdx)
   0x00000000004aaa68 <+108>:   mov    -0x28(%rbp),%rax
   0x00000000004aaa6c <+112>:   mov    %rax,%rdi
   0x00000000004aaa6f <+115>:   callq  0x4a7af8 <Foam::Vector::x() const>
   0x00000000004aaa74 <+120>:   mov    (%rax),%rax
   0x00000000004aaa77 <+123>:   mov    -0x8(%rbp),%rdx
   0x00000000004aaa7b <+127>:   mov    %rax,0x18(%rdx)
   0x00000000004aaa7f <+131>:   mov    -0x10(%rbp),%rax
   0x00000000004aaa83 <+135>:   mov    %rax,%rdi
   0x00000000004aaa86 <+138>:   callq  0x4a7b06 <Foam::Vector::y() const>
   0x00000000004aaa8b <+143>:   mov    (%rax),%rax
   0x00000000004aaa8e <+146>:   mov    -0x8(%rbp),%rdx

(gdb) frame
#0  0x00000000004a7c6d in Foam::operator^ (v1=..., v2=...) at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/OpenFOAM/lnInclude/VectorI.H:159

(gdb) frame
#0  0x00000000004a55a2 in Foam::tetIndices::faceTriIs (this=0x7fffffff4b10, mesh=...) at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/OpenFOAM/lnInclude/tetIndicesI.H:86
86          label facePtI = (tetPt() + faceBasePtI) % f.size();

#0  0x00002aaaaacfd654 in Foam::BarycentricTensor::d (this=0x7fffffff4620) at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/OpenFOAM/lnInclude/BarycentricTensorI.H:159
159         return Vector(this->v_[XD], this->v_[YD], this->v_[ZD]);

(gdb) where
#0  0x00000000004ceebd in Foam::Barycentric::Barycentric (this=0x7fffffff4be0, va=@0x7fffffff4cc0: -0.13335, vb=@0x7fffffff4cc8: -0.13716, vc=@0x7fffffff4cd0: -0.13716,
    vd=@0x7fffffff4cd8: -0.12953999999999999) at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/OpenFOAM/lnInclude/BarycentricI.H:50
#1  0x00000000004b97f5 in Foam::BarycentricTensor::z (this=0x7fffffff4c80) at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/OpenFOAM/lnInclude/BarycentricTensorI.H:131
#2  0x00000000004aae13 in Foam::operator& (T=..., b=...) at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/OpenFOAM/lnInclude/BarycentricTensorI.H:177
#3  0x00000000004a6a1e in Foam::particle::position (this=0x2240a40) at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/basic/lnInclude/particleI.H:280
#4  0x00002aaaaacf837d in Foam::particle::deviationFromMeshCentre (this=0x2240a40) at particle/particle.C:1036
#5  0x000000000051ac3a in Foam::KinematicParcel::move<Foam::MPPICCloud<Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel > > > > > (
    this=0x2240a40, cloud=..., td=..., trackTime=1) at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/intermediate/lnInclude/KinematicParcel.C:309
#6  0x000000000050acc2 in Foam::MPPICParcel<Foam::KinematicParcel >::move<Foam::MPPICCloud<Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel > > > > > (this=0x2240a40, cloud=..., td=..., trackTime=1) at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/intermediate/lnInclude/MPPICParcel.C:102
#7  0x00000000004f22f3 in Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel > >::move<Foam::MPPICCloud<Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel > > > > > (this=0x7fffffff7220, cloud=..., td=..., trackTime=1) at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/basic/lnInclude/Cloud.C:205
#8  0x00000000004f1e18 in Foam::MPPICCloud<Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel > > > >::motion<Foam::MPPICCloud<Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel > > > > > (this=0x7fffffff7220, cloud=..., td=...)
    at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/intermediate/lnInclude/MPPICCloud.C:247
#9  0x00000000004da066 in Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel > > >::evolveCloud<Foam::MPPICCloud<Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel > > > > > (this=0x7fffffff7220, cloud=..., td=...)
    at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/intermediate/lnInclude/KinematicCloud.C:210
#10 0x00000000004c3497 in Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel > > >::solve<Foam::MPPICCloud<Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel > > > > > (this=0x7fffffff7220, cloud=..., td=...) at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/intermediate/lnInclude/KinematicCloud.C:114
#11 0x00000000004afc73 in Foam::MPPICCloud<Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel > > > >::evolve (this=0x7fffffff7220)
    at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/intermediate/lnInclude/MPPICCloud.C:169
#12 0x000000000049e61e in main (argc=2, argv=0x7fffffffa258) at ../DPMFoam.C:109

Rank 1 Process

[node040 B-3-22471]$ gdb -p 54153

(gdb) disassemble
Dump of assembler code for function uct_rc_mlx5_iface_progress_cyclic:
   0x00002aaaca088020 <+0>:     push   %r15
   0x00002aaaca088022 <+2>:     push   %r14
   0x00002aaaca088024 <+4>:     push   %r13
   0x00002aaaca088026 <+6>:     push   %r12
   0x00002aaaca088028 <+8>:     push   %rbp
   0x00002aaaca088029 <+9>:     push   %rbx
   0x00002aaaca08802a <+10>:    mov    %rdi,%rbx
   0x00002aaaca08802d <+13>:    sub    $0x38,%rsp
   0x00002aaaca088031 <+17>:    movzwl 0x86c8(%rdi),%edx
   0x00002aaaca088038 <+24>:    movzwl 0x86c6(%rdi),%eax
   0x00002aaaca08803f <+31>:    imul   %edx,%eax
   0x00002aaaca088042 <+34>:    mov    0x86b0(%rdi),%rdx
   0x00002aaaca088049 <+41>:    cltq
   0x00002aaaca08804b <+43>:    cmpw   $0x0,0x2(%rdx,%rax,1)
=> 0x00002aaaca088051 <+49>:    jne    0x2aaaca0887cd <uct_rc_mlx5_iface_progress_cyclic+1965>
   0x00002aaaca088057 <+55>:    mov    0x86f0(%rdi),%rax
   0x00002aaaca08805e <+62>:    mov    0x8708(%rdi),%edx
   0x00002aaaca088064 <+68>:    mov    0x870c(%rdi),%ecx
   0x00002aaaca08806a <+74>:    prefetcht0 (%rax)
   0x00002aaaca08806d <+77>:    mov    0x8700(%rdi),%eax
   0x00002aaaca088073 <+83>:    lea    -0x1(%rdx),%ebp
   0x00002aaaca088076 <+86>:    and    %eax,%ebp
   0x00002aaaca088078 <+88>:    shl    %cl,%ebp
   0x00002aaaca08807a <+90>:    add    0x86f8(%rdi),%rbp
   0x00002aaaca088081 <+97>:    movzbl 0x3f(%rbp),%ecx
   0x00002aaaca088085 <+101>:   mov    %ecx,%esi
   0x00002aaaca088087 <+103>:   and    $0x1,%esi
   0x00002aaaca08808a <+106>:   test   %eax,%edx
   0x00002aaaca08808c <+108>:   setne  %dl
   0x00002aaaca08808f <+111>:   cmp    %sil,%dl
   0x00002aaaca088092 <+114>:   jne    0x2aaaca088686 <uct_rc_mlx5_iface_progress_cyclic+1638>
   0x00002aaaca088098 <+120>:   test   %cl,%cl
   0x00002aaaca08809a <+122>:   js     0x2aaaca088679 <uct_rc_mlx5_iface_progress_cyclic+1625>
   0x00002aaaca0880a0 <+128>:   add    $0x1,%eax
   0x00002aaaca0880a3 <+131>:   mov    %eax,0x8700(%rdi)
   0x00002aaaca0880a9 <+137>:   movzwl 0x3c(%rbp),%r13d
   0x00002aaaca0880ae <+142>:   movzwl 0x86c6(%rdi),%edi
   0x00002aaaca0880b5 <+149>:   movzwl 0x86c8(%rbx),%ecx
   0x00002aaaca0880bc <+156>:   mov    0x86b0(%rbx),%rdx
   0x00002aaaca0880c3 <+163>:   mov    0x2c(%rbp),%r12d
   0x00002aaaca0880c7 <+167>:   mov    %r13d,%eax
   0x00002aaaca0880ca <+170>:   ror    $0x8,%ax

(gdb) frame
#0  0x00002aaac936812d in uct_mm_iface_progress (tl_iface=<optimized out>) at ../../../src/uct/sm/mm/base/mm_iface.c:365

(gdb) frame
#0  0x00002aaaca08848a in uct_rc_mlx5_iface_progress_cyclic (arg=<optimized out>) at ../../../../src/uct/ib/rc/accel/rc_mlx5_iface.c:183
183         }

(gdb) where
#0  0x00002aaaca088484 in uct_rc_mlx5_iface_progress_cyclic (arg=<optimized out>) at ../../../../src/uct/ib/rc/accel/rc_mlx5_iface.c:183
#1  0x00002aaac90b608a in ucs_callbackq_dispatch (cbq=<optimized out>) at /home/hpcuser/build/ucx/build/../src/ucs/datastruct/callbackq.h:211
#2  uct_worker_progress (worker=<optimized out>) at /home/hpcuser/build/ucx/build/../src/uct/api/uct.h:2592
#3  ucp_worker_progress (worker=0xb9f390) at ../../../src/ucp/core/ucp_worker.c:2530
#4  0x00002aaac8c6c6d7 in mca_pml_ucx_progress () from /tools/openmpi-4.0.3/gcc/4.8.5/ucx/lib/openmpi/mca_pml_ucx.so
#5  0x00002aaab91c780c in opal_progress () from /tools/openmpi-4.0.3/gcc/4.8.5/ucx/lib/libopen-pal.so.40
#6  0x00002aaab85111bd in ompi_request_default_wait_all () from /tools/openmpi-4.0.3/gcc/4.8.5/ucx/lib/libmpi.so.40
#7  0x00002aaab8565398 in ompi_coll_base_alltoall_intra_basic_linear () from /tools/openmpi-4.0.3/gcc/4.8.5/ucx/lib/libmpi.so.40
#8  0x00002aaab85240d7 in PMPI_Alltoall () from /tools/openmpi-4.0.3/gcc/4.8.5/ucx/lib/libmpi.so.40
#9  0x00002aaab2569953 in Foam::UPstream::allToAll (sendData=..., recvData=..., communicator=0) at UPstream.C:367
#10 0x00002aaab0a162c1 in Foam::Pstream::exchangeSizes<Foam::List<Foam::DynamicList<char, 0u, 2u, 1u> > > (sendBufs=..., recvSizes=..., comm=0) at db/IOstreams/Pstreams/exchange.C:158
#11 0x00002aaab0a15d0d in Foam::PstreamBuffers::finishedSends (this=0x7fffffff4fe0, recvSizes=..., block=true) at db/IOstreams/Pstreams/PstreamBuffers.C:106
#12 0x00000000004f2670 in Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel > >::move<Foam::MPPICCloud<Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel > > > > > (this=0x7fffffff7220, cloud=..., td=..., trackTime=1) at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/basic/lnInclude/Cloud.C:283
#13 0x00000000004f1e18 in Foam::MPPICCloud<Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel > > > >::motion<Foam::MPPICCloud<Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel > > > > > (this=0x7fffffff7220, cloud=..., td=...)
    at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/intermediate/lnInclude/MPPICCloud.C:247
#14 0x00000000004da066 in Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel > > >::evolveCloud<Foam::MPPICCloud<Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel > > > > > (this=0x7fffffff7220, cloud=..., td=...)
    at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/intermediate/lnInclude/KinematicCloud.C:210
#15 0x00000000004c3497 in Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel > > >::solve<Foam::MPPICCloud<Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel > > > > > (this=0x7fffffff7220, cloud=..., td=...) at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/intermediate/lnInclude/KinematicCloud.C:114
#16 0x00000000004afc73 in Foam::MPPICCloud<Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel > > > >::evolve (this=0x7fffffff7220)
    at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/intermediate/lnInclude/MPPICCloud.C:169
#17 0x000000000049e61e in main (argc=2, argv=0x7fffffffa258) at ../DPMFoam.C:109

Rank 2 Process

[node040 B-3-22471]$ gdb -p 54154

(gdb) disassemble
Dump of assembler code for function uct_mm_iface_progress:
   0x00002aaac9367f80 <+0>:     push   %r15
   0x00002aaac9367f82 <+2>:     push   %r14
   0x00002aaac9367f84 <+4>:     push   %r13
   0x00002aaac9367f86 <+6>:     push   %r12
   0x00002aaac9367f88 <+8>:     push   %rbp
   0x00002aaac9367f89 <+9>:     push   %rbx
   0x00002aaac9367f8a <+10>:    mov    %rdi,%rbx
   0x00002aaac9367f8d <+13>:    sub    $0x168,%rsp
   0x00002aaac9367f94 <+20>:    mov    0x558(%rdi),%esi
   0x00002aaac9367f9a <+26>:    movl   $0x0,0x50(%rsp)
   0x00002aaac9367fa2 <+34>:    test   %esi,%esi
   0x00002aaac9367fa4 <+36>:    je     0x2aaac9368527 <uct_mm_iface_progress+1447>
   0x00002aaac9367faa <+42>:    lea    0x598(%rbx),%r14
   0x00002aaac9367fb1 <+49>:    lea    0x560(%rbx),%r13
   0x00002aaac9367fb8 <+56>:    lea    0x60(%rsp),%r12
   0x00002aaac9367fbd <+61>:    mov    0x538(%rdi),%rdx
   0x00002aaac9367fc4 <+68>:    movabs $0x7fffffffffffffff,%rbp
   0x00002aaac9367fce <+78>:    xor    %edi,%edi
   0x00002aaac9367fd0 <+80>:    movzbl 0x548(%rbx),%ecx
   0x00002aaac9367fd7 <+87>:    mov    0x540(%rbx),%rax
   0x00002aaac9367fde <+94>:    movzbl (%rdx),%edx
   0x00002aaac9367fe1 <+97>:    shr    %cl,%rax
=> 0x00002aaac9367fe4 <+100>:   xor    %rdx,%rax
   0x00002aaac9367fe7 <+103>:   and    $0x1,%eax
   0x00002aaac9367fea <+106>:   jne    0x2aaac9368500 <uct_mm_iface_progress+1408>
   0x00002aaac9367ff0 <+112>:   mov    0x528(%rbx),%rdx
   0x00002aaac9367ff7 <+119>:   mov    (%rdx),%rdx
   0x00002aaac9367ffa <+122>:   and    %rbp,%rdx
   0x00002aaac9367ffd <+125>:   cmp    %rdx,0x540(%rbx)
   0x00002aaac9368004 <+132>:   ja     0x2aaac93682c8 <uct_mm_iface_progress+840>
   0x00002aaac936800a <+138>:   mov    0x538(%rbx),%r10
   0x00002aaac9368011 <+145>:   testb  $0x2,(%r10)
   0x00002aaac9368015 <+149>:   je     0x2aaac93681ef <uct_mm_iface_progress+623>
   0x00002aaac936801b <+155>:   mov    0x224f36(%rip),%r15        # 0x2aaac958cf58
   0x00002aaac9368022 <+162>:   cmpl   $0x7,(%r15)
   0x00002aaac9368026 <+166>:   ja     0x2aaac9368198 <uct_mm_iface_progress+536>
   0x00002aaac936802c <+172>:   lea    0x1c(%r10),%rsi
   0x00002aaac9368030 <+176>:   movzbl 0x1(%r10),%r9d
   0x00002aaac9368035 <+181>:   movzwl 0x2(%r10),%edx
   0x00002aaac936803a <+186>:   cmp    $0x1f,%r9b
   0x00002aaac936803e <+190>:   ja     0x2aaac9368164 <uct_mm_iface_progress+484>
   0x00002aaac9368044 <+196>:   movzbl %r9b,%r9d

(gdb) frame
#0  0x00002aaaca08823b in uct_ib_mlx5_get_cqe (cqe_index=51, cq=0xbf0778) at /home/hpcuser/build/ucx/build/../src/uct/ib/mlx5/ib_mlx5.inl:13
13          return UCS_PTR_BYTE_OFFSET(cq->cq_buf, ((cqe_index & (cq->cq_length - 1)) <<

(gdb) where
#0  0x00002aaaca08823b in uct_ib_mlx5_get_cqe (cqe_index=51, cq=0xbf0778) at /home/hpcuser/build/ucx/build/../src/uct/ib/mlx5/ib_mlx5.inl:13
#1  uct_ib_mlx5_poll_cq (cq=0xbf0778, iface=0xbe8050) at /home/hpcuser/build/ucx/build/../src/uct/ib/mlx5/ib_mlx5.inl:73
#2  uct_rc_mlx5_iface_poll_tx (iface=0xbe8050) at ../../../../src/uct/ib/rc/accel/rc_mlx5_iface.c:140
#3  uct_rc_mlx5_iface_progress (flags=2, arg=0xbe8050) at ../../../../src/uct/ib/rc/accel/rc_mlx5_iface.c:177
#4  uct_rc_mlx5_iface_progress_cyclic (arg=0xbe8050) at ../../../../src/uct/ib/rc/accel/rc_mlx5_iface.c:182
#5  0x00002aaac90b608a in ucs_callbackq_dispatch (cbq=<optimized out>) at /home/hpcuser/build/ucx/build/../src/ucs/datastruct/callbackq.h:211
#6  uct_worker_progress (worker=<optimized out>) at /home/hpcuser/build/ucx/build/../src/uct/api/uct.h:2592
#7  ucp_worker_progress (worker=0xb9f3d0) at ../../../src/ucp/core/ucp_worker.c:2530
#8  0x00002aaac8c6c6d7 in mca_pml_ucx_progress () from /tools/openmpi-4.0.3/gcc/4.8.5/ucx/lib/openmpi/mca_pml_ucx.so
#9  0x00002aaab91c780c in opal_progress () from /tools/openmpi-4.0.3/gcc/4.8.5/ucx/lib/libopen-pal.so.40
#10 0x00002aaab85111bd in ompi_request_default_wait_all () from /tools/openmpi-4.0.3/gcc/4.8.5/ucx/lib/libmpi.so.40
#11 0x00002aaab8565398 in ompi_coll_base_alltoall_intra_basic_linear () from /tools/openmpi-4.0.3/gcc/4.8.5/ucx/lib/libmpi.so.40
#12 0x00002aaab85240d7 in PMPI_Alltoall () from /tools/openmpi-4.0.3/gcc/4.8.5/ucx/lib/libmpi.so.40
#13 0x00002aaab2569953 in Foam::UPstream::allToAll (sendData=..., recvData=..., communicator=0) at UPstream.C:367
#14 0x00002aaab0a162c1 in Foam::Pstream::exchangeSizes<Foam::List<Foam::DynamicList<char, 0u, 2u, 1u> > > (sendBufs=..., recvSizes=..., comm=0) at db/IOstreams/Pstreams/exchange.C:158
#15 0x00002aaab0a15d0d in Foam::PstreamBuffers::finishedSends (this=0x7fffffff4fe0, recvSizes=..., block=true) at db/IOstreams/Pstreams/PstreamBuffers.C:106
#16 0x00000000004f2670 in Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel > >::move<Foam::MPPICCloud<Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel > > > > > (this=0x7fffffff7220, cloud=..., td=..., trackTime=1) at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/basic/lnInclude/Cloud.C:283
#17 0x00000000004f1e18 in Foam::MPPICCloud<Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel > > > >::motion<Foam::MPPICCloud<Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel > > > > > (this=0x7fffffff7220, cloud=..., td=...)
    at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/intermediate/lnInclude/MPPICCloud.C:247
#18 0x00000000004da066 in Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel > > >::evolveCloud<Foam::MPPICCloud<Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel > > > > > (this=0x7fffffff7220, cloud=..., td=...)
    at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/intermediate/lnInclude/KinematicCloud.C:210
#19 0x00000000004c3497 in Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel > > >::solve<Foam::MPPICCloud<Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel > > > > > (this=0x7fffffff7220, cloud=..., td=...) at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/intermediate/lnInclude/KinematicCloud.C:114
#20 0x00000000004afc73 in Foam::MPPICCloud<Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel > > > >::evolve (this=0x7fffffff7220)
    at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/intermediate/lnInclude/MPPICCloud.C:169
#21 0x000000000049e61e in main (argc=2, argv=0x7fffffffa258) at ../DPMFoam.C:109

strace

Root MPI

[node040 B-3-22471]$ strace -ff -p 54148
strace: Process 54148 attached with 4 threads
[pid 54151] select(19, [17 18], NULL, NULL, {tv_sec=3516, tv_usec=700411} <unfinished ...>
[pid 54150] select(16, [14 15], NULL, NULL, {tv_sec=0, tv_usec=963883} <unfinished ...>
[pid 54149] epoll_wait(10, <unfinished ...>
[pid 54148] restart_syscall(<... resuming interrupted restart_syscall ...> <unfinished ...>
[pid 54150] <... select resumed>) = 0 (Timeout)
[pid 54150] select(16, [14 15], NULL, NULL, {tv_sec=2, tv_usec=0}) = 0 (Timeout)
[pid 54150] select(16, [14 15], NULL, NULL, {tv_sec=2, tv_usec=0}) = 0 (Timeout)
[pid 54150] select(16, [14 15], NULL, NULL, {tv_sec=2, tv_usec=0}) = 0 (Timeout)
[pid 54150] select(16, [14 15], NULL, NULL, {tv_sec=2, tv_usec=0}) = 0 (Timeout)
[pid 54150] select(16, [14 15], NULL, NULL, {tv_sec=2, tv_usec=0}) = 0 (Timeout)

Rank 0

[node040 B-3-22471]$ strace -ff -p 54152
strace: Process 54152 attached with 4 threads
[pid 54163] epoll_wait(27, <unfinished ...>
[pid 54159] epoll_wait(11, <unfinished ...>
[pid 54156] restart_syscall(<... resuming interrupted poll ...>

Rank 1

[node040 B-3-22471]$ strace -ff -p 54153
strace: Process 54153 attached with 4 threads
[pid 54161] epoll_wait(27, <unfinished ...>
[pid 54160] epoll_wait(11, <unfinished ...>
[pid 54157] restart_syscall(<... resuming interrupted poll ...> <unfinished ...>
[pid 54153] poll([{fd=6, events=POLLIN}, {fd=16, events=POLLIN}], 2, 0) = 0 (Timeout)
[pid 54153] poll([{fd=6, events=POLLIN}, {fd=16, events=POLLIN}], 2, 0) = 0 (Timeout)
[pid 54153] poll([{fd=6, events=POLLIN}, {fd=16, events=POLLIN}], 2, 0) = 0 (Timeout)
[pid 54153] poll([{fd=6, events=POLLIN}, {fd=16, events=POLLIN}], 2, 0) = 0 (Timeout)
[pid 54153] poll([{fd=6, events=POLLIN}, {fd=16, events=POLLIN}], 2, 0) = 0 (Timeout)
....

Rank 2

[node040 B-3-22471]$ strace -ff -p 54154
strace: Process 54154 attached with 4 threads
[pid 54158] epoll_wait(11, <unfinished ...>
[pid 54155] restart_syscall(<... resuming interrupted poll ...> <unfinished ...>
[pid 54162] epoll_wait(27, <unfinished ...>
[pid 54154] poll([{fd=6, events=POLLIN}, {fd=16, events=POLLIN}], 2, 0) = 0 (Timeout)
[pid 54154] poll([{fd=6, events=POLLIN}, {fd=16, events=POLLIN}], 2, 0) = 0 (Timeout)
[pid 54154] poll([{fd=6, events=POLLIN}, {fd=16, events=POLLIN}], 2, 0) = 0 (Timeout)
[pid 54154] poll([{fd=6, events=POLLIN}, {fd=16, events=POLLIN}], 2, 0) = 0 (Timeout)
...

UCX Debug Log

[node040 B-3-22471]$ tail -f /tmp/ucx.log
[1621958056.506833] [node040:61982:0] mm_iface.c:250 UCX DATA RX [10315] oi am_id 2 len 12 EGR_O tag fffff30000000003
[1621958056.506835] [node040:61982:0] tag_match.inl:119 UCX DATA checking req 0xcf70c0 tag fffff30000000003/ffffffffffffffff with tag fffff30000000003
[1621958056.506838] [node040:61982:0] tag_match.inl:121 UCX REQ matched received tag fffff30000000003 to req 0xcf70c0
[1621958056.506840] [node040:61982:0] eager_rcv.c:25 UCX REQ found req 0xcf70c0
[1621958056.506842] [node040:61982:0] ucp_request.inl:603 UCX REQ req 0xcf70c0: unpack recv_data req_len 4 data_len 4 offset 0 last: yes
[1621958056.506845] [node040:61982:0] ucp_request.inl:205 UCX REQ completing receive request 0xcf70c0 (0xcf71d0) --e-cr- stag 0xfffff30000000003 len 4, Success
[1621958056.506847] [node040:61982:0] ucp_request.c:80 UCX REQ free request 0xcf70c0 (0xcf71d0) d-e-cr-
[1621958056.506849] [node040:61982:0] ucp_request.inl:181 UCX REQ put request 0xcf70c0
[1621958056.506867] [node040:61982:0] tag_send.c:246 UCX REQ send_nbx buffer 0x7fffffff50af count 1 tag 10000100003 to
[1621958056.506870] [node040:61982:0] mm_ep.c:289 UCX DATA TX [201] -i am_id 2 len

< ... freeze ... >