CFD-GO / TCLB

TCLB - Templated MPI+CUDA/CPU Lattice Boltzmann code
https://tclb.io
GNU General Public License v3.0

Simplepart is not exiting with MPI_Finalize #388

Status: Open. TravisMitchell opened this issue 2 years ago.

TravisMitchell commented 2 years ago

Problem

Simplepart does not exit through MPI_Finalize the way the LAMMPS coupling does (compare https://github.com/CFD-GO/TCLB/commit/d08db5d84c381145e3fd172bc4f5d003f6892337).

TravisMitchell commented 2 years ago

Even with the fix, the code still has a problem exiting the job when the failure comes from another source, e.g. an error in the XML file or a memory error:

0: [  ]    ---- :    32x16   | SteadyAdjoint , OnlyObjective , BaseInit
0: [  ]    #### : [0] Cumulative allocation of 2077649024 b (36.4 GB)
1: [ 1] ERROR   ! out of memory in cross.cu at line 83
2: [ 2] ERROR   ! out of memory in cross.cu at line 83
0: [ 0] ERROR   ! out of memory in cross.cu at line 83
3: [ 3] ERROR   ! out of memory in cross.cu at line 83
srun: error: a085: tasks 1-2: Exited with exit code 1
srun: error: a085: task 0: Exited with exit code 1
srun: error: a085: task 3: Exited with exit code 1