firemodels / fds

Fire Dynamics Simulator
https://pages.nist.gov/fds-smv/

Can't run simulation on Ubuntu #13614

Closed quantumfds closed 3 weeks ago

quantumfds commented 1 month ago

Hello,

I use Ubuntu alongside Windows to run our FDS simulations. Unfortunately, shortly after the simulation starts it crashes, leaving the following message:

2024-10-23 21:08:19 :
2024-10-23 21:08:19 : Starting FDS ...
2024-10-23 21:08:19 :
2024-10-23 21:08:19 : MPI Process 0 started on q03
2024-10-23 21:08:19 : MPI Process 2 started on qmaster2
2024-10-23 21:08:19 : MPI Process 1 started on q03
2024-10-23 21:08:19 :
2024-10-23 21:08:19 : Reading FDS input file ...
2024-10-23 21:08:19 :
2024-10-23 21:08:19 : WARNING: SPEC SFPE POLYURETHANE_GM37_fuel is not in the table of pre-defined species. Any unassigned SPEC variables in the input were assigned the properties of nitrogen.
2024-10-23 21:08:19 :
2024-10-23 21:08:19 : Fire Dynamics Simulator
2024-10-23 21:08:19 :
2024-10-23 21:08:19 : Current Date : October 23, 2024 21:08:19
2024-10-23 21:08:19 : Revision : FDS-6.9.0-0-g6339569-release
2024-10-23 21:08:19 : Revision Date : Wed Mar 20 13:59:17 2024 -0400
2024-10-23 21:08:19 : Compiler : Intel(R) Fortran Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.7.1 Build 20221019_000000
2024-10-23 21:08:19 : Compilation Date : Mar 21, 2024 07:58:03
2024-10-23 21:08:19 :
2024-10-23 21:08:19 : Number of MPI Processes: 3
2024-10-23 21:08:19 : Number of OpenMP Threads: 8
2024-10-23 21:08:19 :
2024-10-23 21:08:19 : MPI version: 3.1
2024-10-23 21:08:19 : MPI library version: Intel(R) MPI Library 2021.6 for Linux* OS
2024-10-23 21:08:19 :
2024-10-23 21:08:19 :
2024-10-23 21:08:19 : Job TITLE :
2024-10-23 21:08:19 : Job ID string : halle
2024-10-23 21:08:19 :
2024-10-23 21:08:36 : Time Step: 1, Simulation Time: 0.14 s
2024-10-23 21:08:38 : forrtl: severe (174): SIGSEGV, segmentation fault occurred
2024-10-23 21:08:38 : Image PC Routine Line Source
2024-10-23 21:08:38 : libc.so.6 000078B6C1A42520 Unknown Unknown Unknown
2024-10-23 21:08:38 : fds_openmp 00000000073A5394 Unknown Unknown Unknown
2024-10-23 21:08:38 : fds_openmp 000000000717EAE5 Unknown Unknown Unknown
2024-10-23 21:08:38 : fds_openmp 000000000040BA1D Unknown Unknown Unknown
2024-10-23 21:08:38 : libc.so.6 000078B6C1A29D90 Unknown Unknown Unknown
2024-10-23 21:08:38 : libc.so.6 000078B6C1A29E40 __libc_start_main Unknown Unknown
2024-10-23 21:08:38 : fds_openmp 000000000040B936 Unknown Unknown Unknown

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 1 PID 84296 RUNNING AT q03
= KILLED BY SIGNAL: 9 (Killed)

How can I solve this? I have already tried ulimit commands, but the problem is still there.

Thanks.

mcgratta commented 1 month ago

Do you mean that you are using Windows Subsystem for Linux or WSL? If so, are you trying to run the Windows or Linux version of FDS? Also, how are you trying to run the job? What command or script?

quantumfds commented 1 month ago

We use Windows to access the results only and Ubuntu to actually run the simulations (Linux version of FDS). The simulation is run with the command fdsrun.sh.

mcgratta commented 1 month ago

I am not familiar with fdsrun.sh. Where is this script located? Also, is the Ubuntu linux operating system installed on your Windows computer?

quantumfds commented 1 month ago

I will check what exactly this script is, but I think it assigns different meshes to different machines in the cluster. What I have discovered is that I can run FDS with one or more meshes using the script, but only with a smaller number of cells (up to 200k cells everything works fine; from 1.5 million cells I get the same message as above). The cluster has only Ubuntu installed.

mcgratta commented 1 month ago

One thing to check -- do you have this parameter set in one of your start-up files, like .bashrc?

ulimit -s unlimited

This removes the limit on the stack size, so your job can use as much stack memory as it needs. Also, do you really want to use 8 OpenMP threads? This might be a problem if your run script has not set things up properly. I would do this

export OMP_NUM_THREADS=1

at the command line before you run the job. Otherwise, you need to better understand how that fdsrun.sh works. Do a

which fdsrun.sh

or find the person who wrote it.
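
A minimal sketch of applying these suggestions in the shell that launches the job. It assumes bash; the messages are illustrative only, and `command -v` is used as the shell-builtin counterpart of `which`. Whether these settings reach processes started on other nodes depends on the MPI launcher and the remote shell setup, which is not settled in this thread.

```shell
#!/usr/bin/env bash
# Sketch only: apply the suggested settings before starting FDS.

# Try to lift the soft stack limit; this fails if the hard limit is lower.
if ! ulimit -s unlimited 2>/dev/null; then
  echo "could not raise the stack limit; hard limit is $(ulimit -Hs)" >&2
fi

# One OpenMP thread per MPI process, as suggested above.
export OMP_NUM_THREADS=1

# Show what the launching shell will actually use.
echo "stack limit: $(ulimit -s)"
echo "OMP_NUM_THREADS: ${OMP_NUM_THREADS}"

# Locate the wrapper script (builtin counterpart of `which fdsrun.sh`).
command -v fdsrun.sh || echo "fdsrun.sh not found on PATH" >&2
```

If the first message appears, the hard limit has to be raised by an administrator before `ulimit -s unlimited` can take effect.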

quantumfds commented 3 weeks ago

Hello

Thank you for the help, and sorry for the delay. We tried OMP_NUM_THREADS with different values (8, 2, 1). The result was always the same. In the end we unset the variable, and FDS then evidently uses 16 threads.

ulimit was set properly:

$ ulimit unlimited

fdsrun.sh is a wrapper script that runs the FDS application:

mpirun -wdir /data/<projectpath>/halle2 -n 3 -prepend-timestamp -ppn 2 --host q03,qmaster2 fds_openmp /data/<projectpath>/halle2/test_halle_mesh_02.fds

Here q03 and qmaster2 are two cluster nodes with 32 CPU cores per node (16 physical cores with hyperthreading). We have 32 GB of RAM. We monitored the computer resources during initialization of the simulation. There was no significant increase in memory or CPU consumption before the process crashed. The only difference from the working case is that the cells are smaller (0.2 m instead of 0.4 m). The model stayed exactly the same.
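
One caveat about the ulimit line quoted above: in bash, `ulimit` without an option flag refers to the file-size limit (`-f`), not the stack size, so it may not confirm the setting that matters here. The sketch below prints both, labelled explicitly. The function name `report_limits` and the commented-out ssh loop are illustrative assumptions; the loop presumes passwordless ssh to the two hosts named in the mpirun command.

```shell
#!/usr/bin/env bash
# Sketch only: print the relevant limits with explicit labels.
report_limits() {
  echo "file size (ulimit -f, same as plain 'ulimit'): $(ulimit -f)"
  echo "stack size, soft (ulimit -Ss): $(ulimit -Ss)"
  echo "stack size, hard (ulimit -Hs): $(ulimit -Hs)"
}

report_limits

# Hypothetical per-node check of what a non-interactive shell sees.
# Uncomment if ssh access to the nodes is available.
# for host in q03 qmaster2; do
#   echo "== $host"
#   ssh "$host" 'ulimit -Ss'
# done
```

A non-interactive shell on a compute node does not necessarily read the same start-up files as an interactive login, so the value seen there can differ from the one in a terminal session.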

mcgratta commented 3 weeks ago

I do not understand all the arguments in the mpirun command. Can you run a very small simple case with multiple meshes?

quantumfds commented 3 weeks ago

With simple cases it works perfectly (for instance, the same model with bigger cells).

Here are explanations of the mpirun parameters we use:

-wdir

Change to the directory before the user’s program executes. See the "Current Working Directory" section for notes on relative paths. Note: If the -wdir option appears both on the command line and in an application context, the context will take precedence over the command line. Thus, if the path to the desired wdir is different on the backend nodes, then it must be specified as an absolute path that is correct for the backend node.

-n 3

Tells mpirun the number of meshes in the FDS file (grep -c "^&MESH" $PROJECT_FILE).

-c, -n, --n, -np <#>

Run this many copies of the program on the given nodes. This option indicates that the specified file is an executable program and not an application context. If no value is provided for the number of copies to execute (i.e., neither the "-np" nor its synonyms are provided on the command line), Open MPI will automatically execute a copy of the program on each process slot (see below for description of a "process slot"). This feature, however, can only be used in the SPMD model and will return an error (without beginning execution of the application) otherwise.
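
The mesh count used for `-n` can be wrapped in a small helper. This is a sketch, not the contents of fdsrun.sh: the function name `count_meshes` is hypothetical. It counts occurrences of `&MESH` rather than matching lines, so the result is still correct when several records share a line; it assumes `&MESH` does not also appear inside comments in the file.

```shell
#!/usr/bin/env bash
# count_meshes FILE
# Prints the number of &MESH records in an FDS input file.
# grep -o emits one line per occurrence, so records that share a
# line are counted individually.
count_meshes() {
  grep -o '&MESH' "$1" | wc -l | tr -d ' '
}

# Example use (PROJECT_FILE as in the wrapper script described above):
#   mpirun -n "$(count_meshes "$PROJECT_FILE")" ...
```

With `grep -c "^&MESH"`, only lines that begin with `&MESH` are counted, which gives the same number as long as every mesh record starts its own line.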

-prepend-timestamp

-timestamp-output: Timestamp each line of output to stdout, stderr, and stddiag. (for "show-progress")

Parameters we found here: https://www.open-mpi.org/doc/v4.1/man1/mpirun.1.php

mcgratta commented 3 weeks ago

If a smaller job runs and a larger job fails, that suggests that you have run out of memory. If you post the input file I can try it.

quantumfds commented 3 weeks ago

Thank you, here is the code:

&HEAD CHID='test_halle_mesh_02_6_meshs'/
&TIME T_END=1800.0/
&DUMP DT_RESTART=60.0, DT_SL3D=0.25/

&MESH ID='MESH-01', IJK=86,136,34, XB=-2.4,14.8,-81.2,-54.0,0.0,6.8/
&MESH ID='MESH-02', IJK=86,136,34, XB=-2.4,14.8,-54.0,-26.8,0.0,6.8/
&MESH ID='MESH-03', IJK=86,136,34, XB=-2.4,14.8,-26.8,0.4,0.0,6.8/
&MESH ID='MESH-04', IJK=86,136,34, XB=14.8,32.0,-81.2,-54.0,0.0,6.8/
&MESH ID='MESH-05', IJK=86,136,34, XB=14.8,32.0,-54.0,-26.8,0.0,6.8/
&MESH ID='MESH-06', IJK=86,136,34, XB=14.8,32.0,-26.8,0.4,0.0,6.8/

&SPEC ID='SFPE POLYURETHANE_GM37_fuel', FORMULA='C1.0H1.2O0.2N0.08'/

&REAC ID='SFPE POLYURETHANE_GM37', FYI='SFPE Handbook, 5th Edition, Tables A.38 and A.39', FUEL='SFPE POLYURETHANE_GM37_fuel', CO_YIELD=0.024, SOOT_YIELD=0.113, HEAT_OF_COMBUSTION=1.79E+4, RADIATIVE_FRACTION=0.514/

&PROP ID='Cleary Photoelectric P1', QUANTITY='CHAMBER OBSCURATION', ALPHA_E=1.8, BETA_E=-1.0, ALPHA_C=1.0, BETA_C=-0.8/
&CTRL ID='FREEZE_TIME', FUNCTION_TYPE='TIME_DELAY', DELAY=30.0, LATCH=.FALSE., INPUT_ID='or'/
&CTRL ID='or', FUNCTION_TYPE='ANY', INPUT_ID='T1','T2','T3','T4'/
&CTRL ID='invert', FUNCTION_TYPE='ALL', LATCH=.FALSE., INITIAL_STATE=.TRUE., INPUT_ID='TIMER->OUT'/
&DEVC ID='T1', QUANTITY='TEMPERATURE', XYZ=20.1,-65.1,5.2, SETPOINT=68.0/
&DEVC ID='T2', QUANTITY='TEMPERATURE', XYZ=24.1,-65.1,5.2, SETPOINT=68.0/
&DEVC ID='T3', QUANTITY='TEMPERATURE', XYZ=24.1,-68.7,5.2, SETPOINT=68.0/
&DEVC ID='T4', QUANTITY='TEMPERATURE', XYZ=20.1,-68.7,5.2, SETPOINT=68.0/
&DEVC ID='Time', QUANTITY='TIME', XYZ=0.0,0.0,0.0, NO_UPDATE_CTRL_ID='FREEZE_TIME'/
&DEVC ID='SD1', PROP_ID='Cleary Photoelectric P1', XYZ=19.7,-62.1,5.2/
&DEVC ID='SD2', PROP_ID='Cleary Photoelectric P1', XYZ=24.7,-62.1,5.2/
&DEVC ID='SD3', PROP_ID='Cleary Photoelectric P1', XYZ=19.7,-72.1,5.2/
&DEVC ID='SD4', PROP_ID='Cleary Photoelectric P1', XYZ=24.7,-72.1,5.2/
&DEVC ID='FLOW RDA', QUANTITY='V-VELOCITY', SPATIAL_STATISTIC='AREA INTEGRAL', XB=-2.0,7.017544E-3,-9.670043E-14,-9.670043E-14,0.402829,3.2/
&DEVC ID='PRESSURE DOOR', QUANTITY='PRESSURE', XYZ=0.4,-40.0,1.2/
&DEVC ID='TIMER->OUT', QUANTITY='TIME', XYZ=-2.4,-81.2,0.0, SETPOINT=1200.0/

&SURF ID='Concrete', RGB=255,203,101, TRANSPARENCY=0.498039/
&SURF ID='Fire', COLOR='RED', HRRPUA=2083.0, RAMP_Q='Fire_RAMP_Q', TMP_FRONT=300.0/
&RAMP ID='Fire_RAMP_Q', T=0.0, F=0.0, DEVC_ID='Time'/
&RAMP ID='Fire_RAMP_Q', T=10.0, F=0.0/
&RAMP ID='Fire_RAMP_Q', T=20.0, F=0.01/
&RAMP ID='Fire_RAMP_Q', T=30.0, F=0.01/
&RAMP ID='Fire_RAMP_Q', T=40.0, F=0.03/
&RAMP ID='Fire_RAMP_Q', T=50.0, F=0.04/
&RAMP ID='Fire_RAMP_Q', T=60.0, F=0.06/
&RAMP ID='Fire_RAMP_Q', T=70.0, F=0.08/
&RAMP ID='Fire_RAMP_Q', T=80.0, F=0.1/
&RAMP ID='Fire_RAMP_Q', T=90.0, F=0.13/
&RAMP ID='Fire_RAMP_Q', T=100.0, F=0.16/
&RAMP ID='Fire_RAMP_Q', T=110.0, F=0.19/
&RAMP ID='Fire_RAMP_Q', T=120.0, F=0.23/
&RAMP ID='Fire_RAMP_Q', T=130.0, F=0.26/
&RAMP ID='Fire_RAMP_Q', T=140.0, F=0.31/
&RAMP ID='Fire_RAMP_Q', T=150.0, F=0.35/
&RAMP ID='Fire_RAMP_Q', T=160.0, F=0.4/
&RAMP ID='Fire_RAMP_Q', T=170.0, F=0.45/
&RAMP ID='Fire_RAMP_Q', T=180.0, F=0.51/
&RAMP ID='Fire_RAMP_Q', T=190.0, F=0.57/
&RAMP ID='Fire_RAMP_Q', T=200.0, F=0.63/
&RAMP ID='Fire_RAMP_Q', T=210.0, F=0.69/
&RAMP ID='Fire_RAMP_Q', T=220.0, F=0.76/
&RAMP ID='Fire_RAMP_Q', T=230.0, F=0.83/
&RAMP ID='Fire_RAMP_Q', T=240.0, F=0.9/
&RAMP ID='Fire_RAMP_Q', T=250.0, F=0.98/
&RAMP ID='Fire_RAMP_Q', T=300.0, F=1.0/
&RAMP ID='Fire_RAMP_Q', T=1800.0, F=1.0/
&SURF ID='Supply RDA 30k', RGB=26,0,255, HEAT_TRANSFER_COEFFICIENT=0.0, VOLUME_FLOW=-8.3333, TAU_V=-60.0/

&OBST ID='Obstruction', XB=7.017544E-3,32.0,-80.4,-9.670043E-14,5.6,6.0, SURF_ID='Concrete'/
&OBST ID='Brand', XB=21.7,22.9,-67.5,-66.3,0.4,0.8, SURF_IDS='Fire','Concrete','Concrete'/
&OBST ID='Obstruction', XB=-2.4,32.0,-80.4,0.4,0.0,0.4, SURF_ID='Concrete'/
&OBST ID='Regal', XB=21.3,23.3,-69.5,-63.9,1.6,2.0/
&OBST ID='Regal', XB=21.3,23.3,-69.5,-63.9,2.4,2.8/
&OBST ID='Regal', XB=21.3,23.3,-69.5,-63.9,3.2,3.6/
&OBST ID='Regal', XB=21.3,23.3,-69.5,-63.9,4.0,4.4/
&OBST ID='Obstruction', XB=-2.4,-1.110223E-16,-80.4,0.4,3.2,3.6, SURF_ID='Concrete'/
&OBST ID='Obstruction', XB=-1.6,-0.4,-80.4,-80.0,1.2,2.4, SURF_ID='Concrete', CTRL_ID='invert'/
&OBST ID='Obstruction', XB=-1.665335E-16,0.4,-41.2,-38.8,0.4,2.4, SURF_ID='Concrete', CTRL_ID='invert'/
&OBST ID='Obstruction', XB=-2.4,-2.0,-80.0,-54.0,0.0,3.2, SURF_ID='Concrete'/
&OBST ID='Obstruction', XB=-2.4,-1.6,-80.4,-80.0,0.0,3.2, SURF_ID='Concrete'/
&OBST ID='Obstruction', XB=-1.6,-0.4,-80.4,-80.0,0.0,1.2, SURF_ID='Concrete'/
&OBST ID='Obstruction', XB=-1.6,-0.4,-80.4,-80.0,2.4,3.2, SURF_ID='Concrete'/
&OBST ID='Obstruction', XB=-0.4,-1.665335E-16,-80.4,-80.0,0.0,3.2, SURF_ID='Concrete'/
&OBST ID='Obstruction', XB=-1.665335E-16,0.4,-80.0,-54.0,0.0,5.6, SURF_ID='Concrete'/
&OBST ID='Obstruction', XB=-1.665335E-16,14.8,-80.4,-80.0,0.0,5.6, SURF_ID='Concrete'/
&OBST ID='Obstruction', XB=-2.4,-2.0,-54.0,-26.8,0.0,3.2, SURF_ID='Concrete'/
&OBST ID='Obstruction', XB=-1.665335E-16,0.4,-41.2,-38.8,0.0,0.4, SURF_ID='Concrete'/
&OBST ID='Obstruction', XB=-1.665335E-16,0.4,-54.0,-41.2,0.0,5.6, SURF_ID='Concrete'/
&OBST ID='Obstruction', XB=-1.665335E-16,0.4,-38.8,-26.8,0.0,5.6, SURF_ID='Concrete'/
&OBST ID='Obstruction', XB=-1.665335E-16,0.4,-41.2,-38.8,2.4,5.6, SURF_ID='Concrete'/
&OBST ID='Obstruction', XB=-2.4,-2.0,-26.8,6.131207E-13,0.0,3.2, SURF_ID='Concrete'/
&OBST ID='Obstruction', XB=-2.4,-1.665335E-16,6.131207E-13,0.4,0.0,3.2, SURF_ID='Concrete'/
&OBST ID='Obstruction', XB=-1.665335E-16,0.4,-26.8,6.131207E-13,0.0,5.6, SURF_ID='Concrete'/
&OBST ID='Obstruction', XB=-1.665335E-16,14.8,6.131207E-13,0.4,0.0,5.6, SURF_ID='Concrete'/
&OBST ID='Obstruction', XB=14.8,32.0,-80.4,-80.0,0.0,5.6, SURF_ID='Concrete'/
&OBST ID='Obstruction', XB=31.6,32.0,-80.0,-54.0,0.0,5.6, SURF_ID='Concrete'/
&OBST ID='Obstruction', XB=31.6,32.0,-54.0,-26.8,0.0,5.6, SURF_ID='Concrete'/
&OBST ID='Obstruction', XB=14.8,32.0,6.131207E-13,0.4,0.0,5.6, SURF_ID='Concrete'/
&OBST ID='Obstruction', XB=31.6,32.0,-26.8,6.131207E-13,0.0,5.6, SURF_ID='Concrete'/

&HOLE ID='Supply for FDS', XB=18.0,19.2,-80.8,-79.6,0.4,2.4/

&VENT ID='Mesh Vent: MESH-01 [XMIN]', SURF_ID='OPEN', XB=-2.4,-2.4,-81.2,-54.0,0.0,6.8/
&VENT ID='Mesh Vent: MESH-01 [YMIN]', SURF_ID='OPEN', XB=-2.4,14.8,-81.2,-81.2,0.0,6.8/
&VENT ID='Mesh Vent: MESH-01 [ZMAX]', SURF_ID='OPEN', XB=-2.4,14.8,-81.2,-54.0,6.8,6.8/
&VENT ID='Mesh Vent: MESH-01 [ZMIN]', SURF_ID='OPEN', XB=-2.4,14.8,-81.2,-54.0,0.0,0.0/
&VENT ID='Mesh Vent: MESH-02 [XMIN]', SURF_ID='OPEN', XB=-2.4,-2.4,-54.0,-26.8,0.0,6.8/
&VENT ID='Mesh Vent: MESH-02 [ZMAX]', SURF_ID='OPEN', XB=-2.4,14.8,-54.0,-26.8,6.8,6.8/
&VENT ID='Mesh Vent: MESH-02 [ZMIN]', SURF_ID='OPEN', XB=-2.4,14.8,-54.0,-26.8,0.0,0.0/
&VENT ID='Mesh Vent: MESH-03 [XMIN]', SURF_ID='OPEN', XB=-2.4,-2.4,-26.8,0.4,0.0,6.8/
&VENT ID='Mesh Vent: MESH-03 [YMAX]', SURF_ID='OPEN', XB=-2.4,14.8,0.4,0.4,0.0,6.8/
&VENT ID='Mesh Vent: MESH-03 [ZMAX]', SURF_ID='OPEN', XB=-2.4,14.8,-26.8,0.4,6.8,6.8/
&VENT ID='Mesh Vent: MESH-03 [ZMIN]', SURF_ID='OPEN', XB=-2.4,14.8,-26.8,0.4,0.0,0.0/
&VENT ID='Mesh Vent: MESH-04 [XMAX]', SURF_ID='OPEN', XB=32.0,32.0,-81.2,-54.0,0.0,6.8/
&VENT ID='Mesh Vent: MESH-04 [YMIN]', SURF_ID='OPEN', XB=14.8,32.0,-81.2,-81.2,0.0,6.8/
&VENT ID='Mesh Vent: MESH-04 [ZMAX]', SURF_ID='OPEN', XB=14.8,32.0,-81.2,-54.0,6.8,6.8/
&VENT ID='Mesh Vent: MESH-04 [ZMIN]', SURF_ID='OPEN', XB=14.8,32.0,-81.2,-54.0,0.0,0.0/
&VENT ID='Mesh Vent: MESH-05 [XMAX]', SURF_ID='OPEN', XB=32.0,32.0,-54.0,-26.8,0.0,6.8/
&VENT ID='Mesh Vent: MESH-05 [ZMAX]', SURF_ID='OPEN', XB=14.8,32.0,-54.0,-26.8,6.8,6.8/
&VENT ID='Mesh Vent: MESH-05 [ZMIN]', SURF_ID='OPEN', XB=14.8,32.0,-54.0,-26.8,0.0,0.0/
&VENT ID='Mesh Vent: MESH-06 [XMAX]', SURF_ID='OPEN', XB=32.0,32.0,-26.8,0.4,0.0,6.8/
&VENT ID='Mesh Vent: MESH-06 [YMAX]', SURF_ID='OPEN', XB=14.8,32.0,0.4,0.4,0.0,6.8/
&VENT ID='Mesh Vent: MESH-06 [ZMAX]', SURF_ID='OPEN', XB=14.8,32.0,-26.8,0.4,6.8,6.8/
&VENT ID='Mesh Vent: MESH-06 [ZMIN]', SURF_ID='OPEN', XB=14.8,32.0,-26.8,0.4,0.0,0.0/
&VENT ID='RDA 30k', SURF_ID='Supply RDA 30k', XB=-2.0,-1.110223E-16,-9.670043E-14,-9.670043E-14,0.402829,3.2, DEVC_ID='TIMER->OUT'/

&ISOF QUANTITY='EXTINCTION COEFFICIENT', VALUE=0.2/
&ISOF QUANTITY='TEMPERATURE', VALUE=50.0/

&SLCF QUANTITY='VISIBILITY', ID='0', PBZ=2.4/
&SLCF QUANTITY='VELOCITY', ID='0', PBZ=2.4/
&SLCF QUANTITY='VELOCITY', ID='0', PBZ=5.2/
&SLCF QUANTITY='TEMPERATURE', ID='1', PBZ=2.4/
&SLCF QUANTITY='EXTINCTION COEFFICIENT', ID='2', PBZ=-0.814762/
&SLCF QUANTITY='EXTINCTION COEFFICIENT', ID='2', PBZ=2.4/
&SLCF QUANTITY='TEMPERATURE', ID='3', PBY=-40.6/
&SLCF QUANTITY='TEMPERATURE', ID='4', PBX=16.8/
&SLCF QUANTITY='EXTINCTION COEFFICIENT', ID='4', PBX=16.8/
&SLCF QUANTITY='EXTINCTION COEFFICIENT', ID='3', PBY=-40.6/
&SLCF QUANTITY='PRESSURE', ID='4', PBZ=1.8/

&TAIL /
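
For reference, the mesh sizes in the input file above can be tallied directly. Each mesh has IJK=86,136,34, that is 86 × 136 × 34 = 397,664 cells, so the six meshes hold 2,385,984 cells in total at a 0.2 m cell size. The helper below is a sketch; the function name `total_cells` is hypothetical, and it assumes every IJK entry is written as three plain integers without spaces.

```shell
#!/usr/bin/env bash
# total_cells FILE
# Sums I*J*K over every IJK=i,j,k entry found in an FDS input file.
total_cells() {
  grep -o 'IJK=[0-9]*,[0-9]*,[0-9]*' "$1" |
    awk -F'[=,]' '{ n += $2 * $3 * $4 } END { print n + 0 }'
}

# Example use, with the input file name from the mpirun command above:
#   total_cells test_halle_mesh_02.fds
```

Comparing this number between the case that runs and the case that fails gives a quick measure of how much the problem size grew when the cell size was halved.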

mcgratta commented 3 weeks ago

This case runs on my linux cluster using the latest source code. Each of the 6 processes uses approximately 1.5 GB RAM.

  1. Set ulimit -s unlimited
  2. Check how much RAM you have
  3. Use the latest release version of FDS.
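
The first two checks can be sketched as below. At roughly 1.5 GB per process, six processes need about 9 GB in total when they all run on one machine. The function names `mem_total_gb` and `enough_ram` are hypothetical, and reading `/proc/meminfo` assumes a Linux system; on a cluster the check would need to be repeated on each node for the processes placed there.

```shell
#!/usr/bin/env bash
# mem_total_gb: total RAM of this machine in GB, from /proc/meminfo (Linux).
mem_total_gb() {
  awk '/^MemTotal:/ { printf "%.1f\n", $2 / 1024 / 1024 }' /proc/meminfo
}

# enough_ram PROCS GB_PER_PROC AVAILABLE_GB
# Succeeds (exit 0) if PROCS * GB_PER_PROC fits into AVAILABLE_GB.
enough_ram() {
  awk -v n="$1" -v per="$2" -v have="$3" 'BEGIN { exit !(n * per <= have) }'
}

echo "stack limit: $(ulimit -s)"
echo "total RAM: $(mem_total_gb) GB"

# Six processes at about 1.5 GB each, the figure reported above.
if enough_ram 6 1.5 "$(mem_total_gb)"; then
  echo "6 x 1.5 GB fits on this machine"
else
  echo "6 x 1.5 GB does not fit on this machine"
fi
```

This compares against installed memory only; memory already used by other jobs is not taken into account.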

mcgratta commented 3 weeks ago

I will close the issue. If the problem persists, reopen it.