firemodels / fds

Fire Dynamics Simulator
https://pages.nist.gov/fds-smv/

[FDS 6.7.9][CentOS7] hangs indefinitely when running in parallel #11842

Closed · jchtheron closed 1 year ago

jchtheron commented 1 year ago

Good Afternoon,

I am unable to run FDS in parallel.

I may be missing something fundamental, but have prepared a simple case below to reproduce the issue.

The Issue

Running FDS with a 2-mesh case on 1 MPI process works:

[me@node1]$ mpiexec -n 1 fds case_2mesh.fds
...
STOP: FDS completed successfully (CHID: case_2mesh)

Running FDS with a 2-mesh case on 2 MPI processes hangs indefinitely after the first iteration:

[me@node1]$ mpiexec -n 2 fds case_2mesh.fds

 Starting FDS ...

 MPI Process      0 started on node1
 MPI Process      1 started on node1

 Reading FDS input file ...

 Fire Dynamics Simulator

 Current Date     : May 19, 2023  15:50:07
 Revision         : FDS6.7.9-0-gec52dee42-release
 Revision Date    : Sun Jun 26 14:36:40 2022 -0400
 Compiler         : ifort version 2021.6.0
 Compilation Date : Jun 28, 2022 23:02:23

 MPI Enabled;    Number of MPI Processes:       2
 OpenMP Disabled

 MPI version: 3.1
 MPI library version: Intel(R) MPI Library 2021.6 for Linux* OS

 Job TITLE        : Debug Parallel
 Job ID string    : case_2mesh

 Time Step:      1, Simulation Time:      0.22 s

Using top, one can see that the 2 MPI processes have been spawned:

[me@node1]$ top
...
   PID USER     PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  6526 me       20   0 2462652 100784  11592 R 100.0  0.0   0:23.02 fds
  6527 me       20   0 2457352  90884  11304 R 100.0  0.0   0:23.03 fds   
...
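
To see where the hung ranks are actually blocked, one option (assuming gdb is available and ptrace attach is permitted on the node) is to attach to one of the PIDs from the top output above and dump a backtrace:

[me@node1]$ gdb -p 6526 -batch -ex "thread apply all bt"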

Supplementary information

Operating System:

[me@node1]$ cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
...

FDS 6.7.9 is installed and the environment has been sourced:

[me@node1]$ which fds
/cluster/programs/fds/fds-6.7.9/bin/fds

[me@node1]$ which mpiexec
/cluster/programs/fds/fds-6.7.9/bin/INTEL/bin/mpiexec
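
For completeness, one way to check which MPI shared libraries the release fds binary actually resolves to is plain ldd (assuming the binary is dynamically linked; nothing here is FDS-specific):

[me@node1]$ ldd $(which fds) | grep -i -E 'mpi|fabric'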

OpenMP is disabled since the CPUs have 1 thread per core:

[me@node1]$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    1
Core(s) per socket:    16
Socket(s):             2
...

[me@node1]$ echo $OMP_NUM_THREADS
1

Here is a super simple case that runs in seconds:

[me@node1]$ cat case_1mesh.fds
&HEAD CHID = 'case_1mesh', TITLE = 'Debug Parallel' /

&TIME T_END = 60.0 /

&DUMP NFRAMES = 60 /

# ---------------------------------------
# Meshing
# ---------------------------------------

&MESH ID  = 'Mesh01',
      IJK = 10, 10, 10,
      XB  = -1.0, 1.0,
            -1.0, 1.0,
             0.0, 2.0 /

# ---------------------------------------
# Domain boundary
# ---------------------------------------

&VENT MB = 'XMIN', SURF_ID = 'OPEN' /
&VENT MB = 'XMAX', SURF_ID = 'OPEN' /
&VENT MB = 'YMIN', SURF_ID = 'OPEN' /
&VENT MB = 'YMAX', SURF_ID = 'OPEN' /
&VENT MB = 'ZMAX', SURF_ID = 'OPEN' /

# ---------------------------------------
# Gas burner: propane
# ---------------------------------------

&SPEC ID = 'PROPANE' /

&REAC FUEL       = 'PROPANE',
      SOOT_YIELD = 0.022 /

&SURF ID      = 'Burner',
      COLOR   = 'RASPBERRY',
      HRRPUA  = 393.75 /

&OBST ID      = 'BurnerBase',
      SURF_ID = 'INERT',
      XB      = -0.2, 0.2, -0.2, 0.2, 0.0, 0.2 /

&VENT ID      = 'BurnerOutlet',
      SURF_ID = 'Burner',
      XB      = -0.2, 0.2, -0.2, 0.2, 0.2, 0.2 /

# ---------------------------------------
# Analysis
# ---------------------------------------

&DEVC ID       = 'Temp_1m',
      XYZ      = 0.1, 0.1, 1.3,
      QUANTITY = 'TEMPERATURE' /

&SLCF PBX           = 0.00,
      QUANTITY      = 'TEMPERATURE',
      CELL_CENTERED = .TRUE. /

&TAIL /

Here is the same case with the mesh split into two in the Z-direction:

...
# ---------------------------------------
# Meshing
# ---------------------------------------

&MESH ID  = 'Mesh01',
      IJK = 10, 10, 5,
      XB  = -1.0, 1.0,
            -1.0, 1.0,
             0.0, 1.0 /

&MESH ID  = 'Mesh02',
      IJK = 10, 10, 5,
      XB  = -1.0, 1.0,
            -1.0, 1.0,
             1.0, 2.0 /
...

Running FDS with a single mesh case in series works:

[me@node1]$ fds case_1mesh.fds
...
STOP: FDS completed successfully (CHID: case_1mesh)

Running FDS with a single mesh case on 1 MPI process works:

[me@node1]$ mpiexec -n 1 fds case_1mesh.fds
...
STOP: FDS completed successfully (CHID: case_1mesh)

I hope I am missing something simple. Please let me know if anyone has dealt with an issue like this before.

mcgratta commented 1 year ago

I cannot reproduce the error on a Linux computer running CentOS.

[mcgratta@burn Test]$ cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
[mcgratta@burn Test]$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                64
On-line CPU(s) list:   0-63
Thread(s) per core:    2
Core(s) per socket:    16
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 106
Model name:            Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz

jchtheron commented 1 year ago

Thanks for the quick response.

I have tried the same cases on other hardware and have not been able to reproduce the issue there.

The hardware for which it currently does not work is the AMD EPYC 7281.

Is it expected that FDS behaves differently on different processors?

Can you think of any other variables which may affect the behaviour of FDS?

mcgratta commented 1 year ago

You can try the latest release, although we are not doing anything different in the compile. We do not use chip-specific optimization, so FDS should run on Intel and AMD. That said, this appears to be something involving MPI, which means that there are other "variables" here. Is the computer with the EPYC chip different than the others in terms of operating system?

jchtheron commented 1 year ago

Using FDS 6.8.0 yields the same behaviour: it works with one or more meshes under a single MPI process but hangs indefinitely with multiple MPI processes. I think you're right about MPI being the culprit.

The CentOS 7 install is fairly stock except for the necessary configuration for an SGE scheduler. Please let me know if this might point to a likely cause. In the meantime, I will try to gather relevant information about the OS setup from the supplier.

mcgratta commented 1 year ago

Are you using SGE on all platforms? We use SLURM here at NIST. There is nothing special about that in the way we compile, but that might be another clue.

jchtheron commented 1 year ago

The other systems do not have SGE installed.

While debugging, I am not running through the SGE scheduler - just following the documentation as closely as possible - to, hopefully, eliminate as many variables as possible. But SGE is the only thing I can think of that is installed system-wide which makes this setup differ from a stock CentOS 7 install.

mcgratta commented 1 year ago

What if you try to run the case using an SGE run script?

jchtheron commented 1 year ago

This is the end goal. I noticed the issue while trying to run a case through SGE. The example here is my attempt to debug but I am getting the same strange behaviour without SGE.

For reference, here is a minimal SGE submission script:

#!/bin/sh

#$ -N fds_debug
#$ -pe parallel_environment 2
#$ -S /bin/sh
#$ -j y
#$ -cwd

umask 000

# Sun Grid Engine
FDS_NP=$NSLOTS
FDS_HF=machines
cut -d" " -f1,2 < $PE_HOSTFILE | sed 's/ /:/' > $FDS_HF

# FDS Environment
ulimit -s unlimited
FDS_PATH=/cluster/programs/fds/fds-6.7.9
source $FDS_PATH/bin/FDS6VARS.sh
source $FDS_PATH/bin/SMV6VARS.sh
export OMP_NUM_THREADS=1

# Run
mpiexec -n $FDS_NP -machine $FDS_HF $FDS_PATH/bin/fds test.fds
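
The script would then be submitted in the usual SGE way (the script file name below is hypothetical):

[me@node1]$ qsub fds_debug.sh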

mcgratta commented 1 year ago

I am out of ideas. The last recourse is to compile the code yourself on that system. There are many libraries that are involved in MPI and I do not know whether or not yours are all compatible with the executable that we build.
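
For anyone who goes that route, a sketch of the general shape is below. The directory name under Build/ is an assumption; the available targets vary between FDS versions, so check the checkout for the one matching your compiler and MPI stack.

[me@node1]$ git clone https://github.com/firemodels/fds.git
[me@node1]$ cd fds/Build/impi_intel_linux_64    # assumed target for Intel MPI + ifort; names vary by version
[me@node1]$ ./make_fds.sh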

gforney commented 1 year ago

Did you run the multi-process cases that failed using just mpiexec, i.e. not through the SGE job scheduler?

jchtheron commented 1 year ago

I am experiencing the same behaviour both using mpiexec and through the job scheduler.

Some additional testing:

- Intel E5-1650 V4 running CentOS 7.8 (worked)
- Intel Xeon Silver 4116 running CentOS 7.6 (worked)
- AMD EPYC 7281 running CentOS 7.5 (does not work)

It appears to be an issue involving Intel MPI on AMD EPYC.

Some searching online seems to suggest that there are indeed some issues with this combination of MPI and hardware; however, I have yet to find a solution. I will update here if I make any progress.
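
For what it is worth, two standard Intel MPI environment variables are often suggested for hangs like this; neither is FDS-specific, and whether they help on EPYC is untested here. I_MPI_DEBUG prints verbose startup and fabric-selection diagnostics, and I_MPI_FABRICS can force the single-node shared-memory fabric (alternatively, FI_PROVIDER=tcp forces the TCP libfabric provider):

[me@node1]$ export I_MPI_DEBUG=6        # verbose MPI startup/fabric diagnostics
[me@node1]$ export I_MPI_FABRICS=shm    # single-node runs: force the shared-memory fabric
[me@node1]$ mpiexec -n 2 fds case_2mesh.fds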

jchtheron commented 1 year ago

I have been able to run the case successfully with an older version of Intel MPI:

Intel MPI Library 2018 Update 3 for Linux* OS

Version: 2018.3.222

This particular install comes pre-packaged with Ansys CFX 2020.

Unfortunately, I have not been able to make much progress with the Intel MPI that came with FDS.
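
For anyone trying the same workaround, the pattern is simply to put the older runtime first in the environment before launching. The install path below is hypothetical and depends on where the Intel MPI 2018 runtime lives locally:

[me@node1]$ source /path/to/intel/impi/2018.3.222/intel64/bin/mpivars.sh
[me@node1]$ mpiexec -n 2 $FDS_PATH/bin/fds case_2mesh.fds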

mcgratta commented 1 year ago

If this is an issue of library/version/manufacturer incompatibilities, I am not going to be much help either. Usually, when someone has an OS or hardware that doesn't work with our release executable, we recommend that they compile themselves.

jchtheron commented 1 year ago

Yes, understandable.

I am happy for you to close the issue - I will follow up with Intel/AMD if possible or compile if I run into further issues.

mcgratta commented 1 year ago

OK, thanks. If you notice other troubles with AMD chips, let us know. It is still very difficult to know whether functionality issues can be traced back to the actual chip.