Do you mean that you are using Windows Subsystem for Linux or WSL? If so, are you trying to run the Windows or Linux version of FDS? Also, how are you trying to run the job? What command or script?
We use Windows to access the results only and Ubuntu to actually run the simulations (Linux version of FDS). The simulation is run with the command fdsrun.sh.
I am not familiar with fdsrun.sh. Where is this script located? Also, is the Ubuntu Linux operating system installed on your Windows computer?
I will check what exactly this script does, but I think it assigns different meshes to different machines in the cluster. What I have discovered is that I can run FDS with one or more meshes using the script, but only with a smaller number of cells (up to 200k cells everything works fine; from 1.5 million cells I get the same message as above). The cluster has only Ubuntu installed.
One thing to check -- do you have this parameter set in one of your start-up files, like .bashrc?
ulimit -s unlimited
This sets the stack size to unlimited, meaning the shell no longer caps how much stack memory your job can use. Also, do you really want to use 8 OpenMP threads? This might be a problem if your run script has not set things up properly. I would do this
export OMP_NUM_THREADS=1
at the command line before you run the job. Otherwise, you need to better understand how that fdsrun.sh works. Do a
which fdsrun.sh
or find the person who wrote it.
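The two settings suggested above can be checked from inside the environment that actually launches the job. The following is a minimal sketch, assuming a Linux system with Python 3 available; the function names are hypothetical, and it only reports what the current process inherited, which is not necessarily what a remote MPI rank started on another node sees.

```python
import os
import resource  # Unix-only standard library module


def describe_limit(value):
    """Format a resource limit (in bytes) the way a person would read it."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value // 1024} KiB"


def environment_report(environ=None):
    """Collect the stack limit and OMP_NUM_THREADS seen by this process."""
    environ = os.environ if environ is None else environ
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    return {
        "stack_soft": describe_limit(soft),
        "stack_hard": describe_limit(hard),
        "OMP_NUM_THREADS": environ.get("OMP_NUM_THREADS", "<unset>"),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running the same script through the job launcher on each node, rather than only in an interactive shell, is one way to see whether the start-up file setting reaches the processes that run FDS.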
Hello
Thank you for providing help! Sorry for the delay.
We tried different values of OMP_NUM_THREADS (8, 2, 1). The result was always the same.
In the end we unset the variable. FDS then apparently defaults to 16 threads.
ulimit was set properly:
$ ulimit
unlimited
fdsrun.sh is a wrapper file that runs the fds application:
mpirun -wdir /data/<projectpath>/halle2 -n 3 -prepend-timestamp -ppn 2 --host q03,qmaster2 fds_openmp /data/<projectpath>/halle2/test_halle_mesh_02.fds
where q03 and qmaster2 are two cluster nodes with 32 logical CPU cores (16 physical, with hyperthreading) per node.
We have 32 GB of RAM.
We monitored the machine's resources during initialization of the simulation. There was no significant increase in memory or CPU consumption before the process crashed.
The only difference is that the cells are smaller (instead of 0.4 m we use 0.2 m now). The model stayed exactly the same.
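The thread counts discussed above can be put side by side with the core counts given for the nodes. This is only an arithmetic sketch: the value of 2 processes per node comes from the -ppn 2 argument and 16 physical cores per node from the description above, while the function names are hypothetical. Since the crash was reported for every thread count tried, the numbers are context rather than a diagnosis.

```python
def threads_per_node(processes_per_node, omp_threads):
    """Total OpenMP threads competing for cores on one node."""
    return processes_per_node * omp_threads


def oversubscription(processes_per_node, omp_threads, physical_cores):
    """Requested threads divided by physical cores; above 1.0 means oversubscribed."""
    return threads_per_node(processes_per_node, omp_threads) / physical_cores


PHYSICAL_CORES = 16  # per node, as stated in the thread
PPN = 2              # from the -ppn 2 argument in the mpirun command

for omp in (1, 2, 8, 16):
    total = threads_per_node(PPN, omp)
    ratio = oversubscription(PPN, omp, PHYSICAL_CORES)
    print(f"OMP_NUM_THREADS={omp:2d}: {total:2d} threads per node, ratio {ratio:.2f}")
```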
I do not understand all the arguments in the mpirun command. Can you run a very small simple case with multiple meshes?
With simple cases it works perfectly (for instance, the same model with bigger cells).
Here are the explanations of the parameters we use for mpirun:
-wdir
-n 3 tells mpirun the number of meshes in the FDS file (grep -c "^&MESH" $PROJECT_FILE). From the man page: -c, -n, --n, -np <#> Run this many copies of the program on the given nodes. This option indicates that the specified file is an executable program and not an application context. If no value is provided for the number of copies to execute (i.e., neither the "-np" nor its synonyms are provided on the command line), Open MPI will automatically execute a copy of the program on each process slot (see below for description of a "process slot"). This feature, however, can only be used in the SPMD model and will return an error (without beginning execution of the application) otherwise.
-prepend-timestamp is for "show-progress". From the man page: -timestamp-output Timestamp each line of output to stdout, stderr, and stddiag.
Parameters we found here: https://www.open-mpi.org/doc/v4.1/man1/mpirun.1.php
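The wrapper logic described above (count the &MESH lines, then pass that number to mpirun) can be sketched as follows. This is an illustration, not the actual fdsrun.sh: the function names and the example path are hypothetical, and the flags are copied verbatim from the command quoted in the thread without checking them against any particular MPI implementation.

```python
import shlex


def count_meshes(fds_text):
    """Count lines that start with &MESH, mirroring grep -c "^&MESH"."""
    return sum(1 for line in fds_text.splitlines() if line.startswith("&MESH"))


def build_mpirun_command(workdir, input_file, n_processes, ppn, hosts,
                         executable="fds_openmp"):
    """Assemble the argument list in the same order as the command in the thread."""
    return [
        "mpirun", "-wdir", workdir,
        "-n", str(n_processes),
        "-prepend-timestamp",
        "-ppn", str(ppn),
        "--host", ",".join(hosts),
        executable, input_file,
    ]


# Hypothetical two-mesh input used only to exercise the functions.
SAMPLE = """&HEAD CHID='demo'/
&MESH ID='A', IJK=10,10,10, XB=0.0,1.0,0.0,1.0,0.0,1.0/
&MESH ID='B', IJK=10,10,10, XB=1.0,2.0,0.0,1.0,0.0,1.0/
&TAIL /
"""

n = count_meshes(SAMPLE)
cmd = build_mpirun_command("/path/to/case", "/path/to/case/demo.fds",
                           n, 2, ["q03", "qmaster2"])
print(shlex.join(cmd))
```

Like grep -c, this counts lines rather than records, so an input file with several &MESH records on one line would be undercounted and the job would start with fewer processes than meshes.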
If a smaller job runs and a larger job fails, that suggests that you have run out of memory. If you post the input file I can try it.
Thank you, here is the code:
&HEAD CHID='test_halle_mesh_02_6_meshs'/
&TIME T_END=1800.0/
&DUMP DT_RESTART=60.0, DT_SL3D=0.25/
&MESH ID='MESH-01', IJK=86,136,34, XB=-2.4,14.8,-81.2,-54.0,0.0,6.8/
&MESH ID='MESH-02', IJK=86,136,34, XB=-2.4,14.8,-54.0,-26.8,0.0,6.8/
&MESH ID='MESH-03', IJK=86,136,34, XB=-2.4,14.8,-26.8,0.4,0.0,6.8/
&MESH ID='MESH-04', IJK=86,136,34, XB=14.8,32.0,-81.2,-54.0,0.0,6.8/
&MESH ID='MESH-05', IJK=86,136,34, XB=14.8,32.0,-54.0,-26.8,0.0,6.8/
&MESH ID='MESH-06', IJK=86,136,34, XB=14.8,32.0,-26.8,0.4,0.0,6.8/
&SPEC ID='SFPE POLYURETHANE_GM37_fuel', FORMULA='C1.0H1.2O0.2N0.08'/
&REAC ID='SFPE POLYURETHANE_GM37', FYI='SFPE Handbook, 5th Edition, Tables A.38 and A.39', FUEL='SFPE POLYURETHANE_GM37_fuel', CO_YIELD=0.024, SOOT_YIELD=0.113, HEAT_OF_COMBUSTION=1.79E+4, RADIATIVE_FRACTION=0.514/
&PROP ID='Cleary Photoelectric P1', QUANTITY='CHAMBER OBSCURATION', ALPHA_E=1.8, BETA_E=-1.0, ALPHA_C=1.0, BETA_C=-0.8/
&CTRL ID='FREEZE_TIME', FUNCTION_TYPE='TIME_DELAY', DELAY=30.0, LATCH=.FALSE., INPUT_ID='or'/
&CTRL ID='or', FUNCTION_TYPE='ANY', INPUT_ID='T1','T2','T3','T4'/
&CTRL ID='invert', FUNCTION_TYPE='ALL', LATCH=.FALSE., INITIAL_STATE=.TRUE., INPUT_ID='TIMER->OUT'/
&DEVC ID='T1', QUANTITY='TEMPERATURE', XYZ=20.1,-65.1,5.2, SETPOINT=68.0/
&DEVC ID='T2', QUANTITY='TEMPERATURE', XYZ=24.1,-65.1,5.2, SETPOINT=68.0/
&DEVC ID='T3', QUANTITY='TEMPERATURE', XYZ=24.1,-68.7,5.2, SETPOINT=68.0/
&DEVC ID='T4', QUANTITY='TEMPERATURE', XYZ=20.1,-68.7,5.2, SETPOINT=68.0/
&DEVC ID='Time', QUANTITY='TIME', XYZ=0.0,0.0,0.0, NO_UPDATE_CTRL_ID='FREEZE_TIME'/
&DEVC ID='SD1', PROP_ID='Cleary Photoelectric P1', XYZ=19.7,-62.1,5.2/
&DEVC ID='SD2', PROP_ID='Cleary Photoelectric P1', XYZ=24.7,-62.1,5.2/
&DEVC ID='SD3', PROP_ID='Cleary Photoelectric P1', XYZ=19.7,-72.1,5.2/
&DEVC ID='SD4', PROP_ID='Cleary Photoelectric P1', XYZ=24.7,-72.1,5.2/
&DEVC ID='FLOW RDA', QUANTITY='V-VELOCITY', SPATIAL_STATISTIC='AREA INTEGRAL', XB=-2.0,7.017544E-3,-9.670043E-14,-9.670043E-14,0.402829,3.2/
&DEVC ID='PRESSURE DOOR', QUANTITY='PRESSURE', XYZ=0.4,-40.0,1.2/
&DEVC ID='TIMER->OUT', QUANTITY='TIME', XYZ=-2.4,-81.2,0.0, SETPOINT=1200.0/
&SURF ID='Concrete', RGB=255,203,101, TRANSPARENCY=0.498039/
&SURF ID='Fire', COLOR='RED', HRRPUA=2083.0, RAMP_Q='Fire_RAMP_Q', TMP_FRONT=300.0/
&RAMP ID='Fire_RAMP_Q', T=0.0, F=0.0, DEVC_ID='Time'/
&RAMP ID='Fire_RAMP_Q', T=10.0, F=0.0/
&RAMP ID='Fire_RAMP_Q', T=20.0, F=0.01/
&RAMP ID='Fire_RAMP_Q', T=30.0, F=0.01/
&RAMP ID='Fire_RAMP_Q', T=40.0, F=0.03/
&RAMP ID='Fire_RAMP_Q', T=50.0, F=0.04/
&RAMP ID='Fire_RAMP_Q', T=60.0, F=0.06/
&RAMP ID='Fire_RAMP_Q', T=70.0, F=0.08/
&RAMP ID='Fire_RAMP_Q', T=80.0, F=0.1/
&RAMP ID='Fire_RAMP_Q', T=90.0, F=0.13/
&RAMP ID='Fire_RAMP_Q', T=100.0, F=0.16/
&RAMP ID='Fire_RAMP_Q', T=110.0, F=0.19/
&RAMP ID='Fire_RAMP_Q', T=120.0, F=0.23/
&RAMP ID='Fire_RAMP_Q', T=130.0, F=0.26/
&RAMP ID='Fire_RAMP_Q', T=140.0, F=0.31/
&RAMP ID='Fire_RAMP_Q', T=150.0, F=0.35/
&RAMP ID='Fire_RAMP_Q', T=160.0, F=0.4/
&RAMP ID='Fire_RAMP_Q', T=170.0, F=0.45/
&RAMP ID='Fire_RAMP_Q', T=180.0, F=0.51/
&RAMP ID='Fire_RAMP_Q', T=190.0, F=0.57/
&RAMP ID='Fire_RAMP_Q', T=200.0, F=0.63/
&RAMP ID='Fire_RAMP_Q', T=210.0, F=0.69/
&RAMP ID='Fire_RAMP_Q', T=220.0, F=0.76/
&RAMP ID='Fire_RAMP_Q', T=230.0, F=0.83/
&RAMP ID='Fire_RAMP_Q', T=240.0, F=0.9/
&RAMP ID='Fire_RAMP_Q', T=250.0, F=0.98/
&RAMP ID='Fire_RAMP_Q', T=300.0, F=1.0/
&RAMP ID='Fire_RAMP_Q', T=1800.0, F=1.0/
&SURF ID='Supply RDA 30k', RGB=26,0,255, HEAT_TRANSFER_COEFFICIENT=0.0, VOLUME_FLOW=-8.3333, TAU_V=-60.0/
&OBST ID='Obstruction', XB=7.017544E-3,32.0,-80.4,-9.670043E-14,5.6,6.0, SURF_ID='Concrete'/
&OBST ID='Brand', XB=21.7,22.9,-67.5,-66.3,0.4,0.8, SURF_IDS='Fire','Concrete','Concrete'/
&OBST ID='Obstruction', XB=-2.4,32.0,-80.4,0.4,0.0,0.4, SURF_ID='Concrete'/
&OBST ID='Regal', XB=21.3,23.3,-69.5,-63.9,1.6,2.0/
&OBST ID='Regal', XB=21.3,23.3,-69.5,-63.9,2.4,2.8/
&OBST ID='Regal', XB=21.3,23.3,-69.5,-63.9,3.2,3.6/
&OBST ID='Regal', XB=21.3,23.3,-69.5,-63.9,4.0,4.4/
&OBST ID='Obstruction', XB=-2.4,-1.110223E-16,-80.4,0.4,3.2,3.6, SURF_ID='Concrete'/
&OBST ID='Obstruction', XB=-1.6,-0.4,-80.4,-80.0,1.2,2.4, SURF_ID='Concrete', CTRL_ID='invert'/
&OBST ID='Obstruction', XB=-1.665335E-16,0.4,-41.2,-38.8,0.4,2.4, SURF_ID='Concrete', CTRL_ID='invert'/
&OBST ID='Obstruction', XB=-2.4,-2.0,-80.0,-54.0,0.0,3.2, SURF_ID='Concrete'/
&OBST ID='Obstruction', XB=-2.4,-1.6,-80.4,-80.0,0.0,3.2, SURF_ID='Concrete'/
&OBST ID='Obstruction', XB=-1.6,-0.4,-80.4,-80.0,0.0,1.2, SURF_ID='Concrete'/
&OBST ID='Obstruction', XB=-1.6,-0.4,-80.4,-80.0,2.4,3.2, SURF_ID='Concrete'/
&OBST ID='Obstruction', XB=-0.4,-1.665335E-16,-80.4,-80.0,0.0,3.2, SURF_ID='Concrete'/
&OBST ID='Obstruction', XB=-1.665335E-16,0.4,-80.0,-54.0,0.0,5.6, SURF_ID='Concrete'/
&OBST ID='Obstruction', XB=-1.665335E-16,14.8,-80.4,-80.0,0.0,5.6, SURF_ID='Concrete'/
&OBST ID='Obstruction', XB=-2.4,-2.0,-54.0,-26.8,0.0,3.2, SURF_ID='Concrete'/
&OBST ID='Obstruction', XB=-1.665335E-16,0.4,-41.2,-38.8,0.0,0.4, SURF_ID='Concrete'/
&OBST ID='Obstruction', XB=-1.665335E-16,0.4,-54.0,-41.2,0.0,5.6, SURF_ID='Concrete'/
&OBST ID='Obstruction', XB=-1.665335E-16,0.4,-38.8,-26.8,0.0,5.6, SURF_ID='Concrete'/
&OBST ID='Obstruction', XB=-1.665335E-16,0.4,-41.2,-38.8,2.4,5.6, SURF_ID='Concrete'/
&OBST ID='Obstruction', XB=-2.4,-2.0,-26.8,6.131207E-13,0.0,3.2, SURF_ID='Concrete'/
&OBST ID='Obstruction', XB=-2.4,-1.665335E-16,6.131207E-13,0.4,0.0,3.2, SURF_ID='Concrete'/
&OBST ID='Obstruction', XB=-1.665335E-16,0.4,-26.8,6.131207E-13,0.0,5.6, SURF_ID='Concrete'/
&OBST ID='Obstruction', XB=-1.665335E-16,14.8,6.131207E-13,0.4,0.0,5.6, SURF_ID='Concrete'/
&OBST ID='Obstruction', XB=14.8,32.0,-80.4,-80.0,0.0,5.6, SURF_ID='Concrete'/
&OBST ID='Obstruction', XB=31.6,32.0,-80.0,-54.0,0.0,5.6, SURF_ID='Concrete'/
&OBST ID='Obstruction', XB=31.6,32.0,-54.0,-26.8,0.0,5.6, SURF_ID='Concrete'/
&OBST ID='Obstruction', XB=14.8,32.0,6.131207E-13,0.4,0.0,5.6, SURF_ID='Concrete'/
&OBST ID='Obstruction', XB=31.6,32.0,-26.8,6.131207E-13,0.0,5.6, SURF_ID='Concrete'/
&HOLE ID='Supply for FDS', XB=18.0,19.2,-80.8,-79.6,0.4,2.4/
&VENT ID='Mesh Vent: MESH-01 [XMIN]', SURF_ID='OPEN', XB=-2.4,-2.4,-81.2,-54.0,0.0,6.8/
&VENT ID='Mesh Vent: MESH-01 [YMIN]', SURF_ID='OPEN', XB=-2.4,14.8,-81.2,-81.2,0.0,6.8/
&VENT ID='Mesh Vent: MESH-01 [ZMAX]', SURF_ID='OPEN', XB=-2.4,14.8,-81.2,-54.0,6.8,6.8/
&VENT ID='Mesh Vent: MESH-01 [ZMIN]', SURF_ID='OPEN', XB=-2.4,14.8,-81.2,-54.0,0.0,0.0/
&VENT ID='Mesh Vent: MESH-02 [XMIN]', SURF_ID='OPEN', XB=-2.4,-2.4,-54.0,-26.8,0.0,6.8/
&VENT ID='Mesh Vent: MESH-02 [ZMAX]', SURF_ID='OPEN', XB=-2.4,14.8,-54.0,-26.8,6.8,6.8/
&VENT ID='Mesh Vent: MESH-02 [ZMIN]', SURF_ID='OPEN', XB=-2.4,14.8,-54.0,-26.8,0.0,0.0/
&VENT ID='Mesh Vent: MESH-03 [XMIN]', SURF_ID='OPEN', XB=-2.4,-2.4,-26.8,0.4,0.0,6.8/
&VENT ID='Mesh Vent: MESH-03 [YMAX]', SURF_ID='OPEN', XB=-2.4,14.8,0.4,0.4,0.0,6.8/
&VENT ID='Mesh Vent: MESH-03 [ZMAX]', SURF_ID='OPEN', XB=-2.4,14.8,-26.8,0.4,6.8,6.8/
&VENT ID='Mesh Vent: MESH-03 [ZMIN]', SURF_ID='OPEN', XB=-2.4,14.8,-26.8,0.4,0.0,0.0/
&VENT ID='Mesh Vent: MESH-04 [XMAX]', SURF_ID='OPEN', XB=32.0,32.0,-81.2,-54.0,0.0,6.8/
&VENT ID='Mesh Vent: MESH-04 [YMIN]', SURF_ID='OPEN', XB=14.8,32.0,-81.2,-81.2,0.0,6.8/
&VENT ID='Mesh Vent: MESH-04 [ZMAX]', SURF_ID='OPEN', XB=14.8,32.0,-81.2,-54.0,6.8,6.8/
&VENT ID='Mesh Vent: MESH-04 [ZMIN]', SURF_ID='OPEN', XB=14.8,32.0,-81.2,-54.0,0.0,0.0/
&VENT ID='Mesh Vent: MESH-05 [XMAX]', SURF_ID='OPEN', XB=32.0,32.0,-54.0,-26.8,0.0,6.8/
&VENT ID='Mesh Vent: MESH-05 [ZMAX]', SURF_ID='OPEN', XB=14.8,32.0,-54.0,-26.8,6.8,6.8/
&VENT ID='Mesh Vent: MESH-05 [ZMIN]', SURF_ID='OPEN', XB=14.8,32.0,-54.0,-26.8,0.0,0.0/
&VENT ID='Mesh Vent: MESH-06 [XMAX]', SURF_ID='OPEN', XB=32.0,32.0,-26.8,0.4,0.0,6.8/
&VENT ID='Mesh Vent: MESH-06 [YMAX]', SURF_ID='OPEN', XB=14.8,32.0,0.4,0.4,0.0,6.8/
&VENT ID='Mesh Vent: MESH-06 [ZMAX]', SURF_ID='OPEN', XB=14.8,32.0,-26.8,0.4,6.8,6.8/
&VENT ID='Mesh Vent: MESH-06 [ZMIN]', SURF_ID='OPEN', XB=14.8,32.0,-26.8,0.4,0.0,0.0/
&VENT ID='RDA 30k', SURF_ID='Supply RDA 30k', XB=-2.0,-1.110223E-16,-9.670043E-14,-9.670043E-14,0.402829,3.2, DEVC_ID='TIMER->OUT'/
&ISOF QUANTITY='EXTINCTION COEFFICIENT', VALUE=0.2/
&ISOF QUANTITY='TEMPERATURE', VALUE=50.0/
&SLCF QUANTITY='VISIBILITY', ID='0', PBZ=2.4/
&SLCF QUANTITY='VELOCITY', ID='0', PBZ=2.4/
&SLCF QUANTITY='VELOCITY', ID='0', PBZ=5.2/
&SLCF QUANTITY='TEMPERATURE', ID='1', PBZ=2.4/
&SLCF QUANTITY='EXTINCTION COEFFICIENT', ID='2', PBZ=-0.814762/
&SLCF QUANTITY='EXTINCTION COEFFICIENT', ID='2', PBZ=2.4/
&SLCF QUANTITY='TEMPERATURE', ID='3', PBY=-40.6/
&SLCF QUANTITY='TEMPERATURE', ID='4', PBX=16.8/
&SLCF QUANTITY='EXTINCTION COEFFICIENT', ID='4', PBX=16.8/
&SLCF QUANTITY='EXTINCTION COEFFICIENT', ID='3', PBY=-40.6/
&SLCF QUANTITY='PRESSURE', ID='4', PBZ=1.8/
&TAIL /
This case runs on my linux cluster using the latest source code. Each of the 6 processes uses approximately 1.5 GB RAM.
ulimit -s unlimited
I will close the issue. If the problem persists, reopen it.
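For reference, the size of the posted case can be worked out directly from its &MESH records. The sketch below parses the six records from the input file above and reports cells per mesh, total cells and cell size; the 1.5 GB per process figure is the one quoted in the reply, and the parsing helper is a hypothetical convenience, not part of FDS.

```python
import re

MESH_LINES = """\
&MESH ID='MESH-01', IJK=86,136,34, XB=-2.4,14.8,-81.2,-54.0,0.0,6.8/
&MESH ID='MESH-02', IJK=86,136,34, XB=-2.4,14.8,-54.0,-26.8,0.0,6.8/
&MESH ID='MESH-03', IJK=86,136,34, XB=-2.4,14.8,-26.8,0.4,0.0,6.8/
&MESH ID='MESH-04', IJK=86,136,34, XB=14.8,32.0,-81.2,-54.0,0.0,6.8/
&MESH ID='MESH-05', IJK=86,136,34, XB=14.8,32.0,-54.0,-26.8,0.0,6.8/
&MESH ID='MESH-06', IJK=86,136,34, XB=14.8,32.0,-26.8,0.4,0.0,6.8/
"""

# Matches one &MESH record and captures IJK and the six XB bounds.
MESH_PATTERN = re.compile(r"&MESH\b.*?IJK=(\d+),(\d+),(\d+).*?XB=([-\d.Ee+,]+)/")


def parse_meshes(text):
    """Return cell count and cell size (dx, dy, dz) for every &MESH record."""
    meshes = []
    for match in MESH_PATTERN.finditer(text):
        i, j, k = (int(match.group(n)) for n in (1, 2, 3))
        xb = [float(v) for v in match.group(4).split(",")]
        cell = ((xb[1] - xb[0]) / i, (xb[3] - xb[2]) / j, (xb[5] - xb[4]) / k)
        meshes.append({"cells": i * j * k, "cell_size": cell})
    return meshes


GB_PER_PROCESS = 1.5  # figure quoted in the reply above, one process per mesh

meshes = parse_meshes(MESH_LINES)
total_cells = sum(m["cells"] for m in meshes)
print(f"meshes: {len(meshes)}, cells per mesh: {meshes[0]['cells']}, total: {total_cells}")
print("cell size (m):", tuple(round(d, 3) for d in meshes[0]["cell_size"]))
print(f"approximate total memory: {len(meshes) * GB_PER_PROCESS} GB")
```

Each mesh has 86 x 136 x 34 = 397,664 cells, so the six meshes hold 2,385,984 cells in total. At the quoted 1.5 GB per process that is roughly 9 GB for the whole case when it is run with one process per mesh.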
Hello,
I use Ubuntu alongside Windows to run our FDS simulations. Unfortunately, the simulation crashes shortly after starting, leaving the following message:
2024-10-23 21:08:19 :
2024-10-23 21:08:19 : Starting FDS ...
2024-10-23 21:08:19 :
2024-10-23 21:08:19 : MPI Process 0 started on q03
2024-10-23 21:08:19 : MPI Process 2 started on qmaster2
2024-10-23 21:08:19 : MPI Process 1 started on q03
2024-10-23 21:08:19 :
2024-10-23 21:08:19 : Reading FDS input file ...
2024-10-23 21:08:19 :
2024-10-23 21:08:19 : WARNING: SPEC SFPE POLYURETHANE_GM37_fuel is not in the table of pre-defined species. Any unassigned SPEC variables in the input were assigned the properties of nitrogen.
2024-10-23 21:08:19 :
2024-10-23 21:08:19 : Fire Dynamics Simulator
2024-10-23 21:08:19 :
2024-10-23 21:08:19 : Current Date : October 23, 2024 21:08:19
2024-10-23 21:08:19 : Revision : FDS-6.9.0-0-g6339569-release
2024-10-23 21:08:19 : Revision Date : Wed Mar 20 13:59:17 2024 -0400
2024-10-23 21:08:19 : Compiler : Intel(R) Fortran Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.7.1 Build 20221019_000000
2024-10-23 21:08:19 : Compilation Date : Mar 21, 2024 07:58:03
2024-10-23 21:08:19 :
2024-10-23 21:08:19 : Number of MPI Processes: 3
2024-10-23 21:08:19 : Number of OpenMP Threads: 8
2024-10-23 21:08:19 :
2024-10-23 21:08:19 : MPI version: 3.1
2024-10-23 21:08:19 : MPI library version: Intel(R) MPI Library 2021.6 for Linux* OS
2024-10-23 21:08:19 :
2024-10-23 21:08:19 :
2024-10-23 21:08:19 : Job TITLE :
2024-10-23 21:08:19 : Job ID string : halle
2024-10-23 21:08:19 :
2024-10-23 21:08:36 : Time Step: 1, Simulation Time: 0.14 s
2024-10-23 21:08:38 : forrtl: severe (174): SIGSEGV, segmentation fault occurred
2024-10-23 21:08:38 : Image PC Routine Line Source
2024-10-23 21:08:38 : libc.so.6 000078B6C1A42520 Unknown Unknown Unknown
2024-10-23 21:08:38 : fds_openmp 00000000073A5394 Unknown Unknown Unknown
2024-10-23 21:08:38 : fds_openmp 000000000717EAE5 Unknown Unknown Unknown
2024-10-23 21:08:38 : fds_openmp 000000000040BA1D Unknown Unknown Unknown
2024-10-23 21:08:38 : libc.so.6 000078B6C1A29D90 Unknown Unknown Unknown
2024-10-23 21:08:38 : libc.so.6 000078B6C1A29E40 __libc_start_main Unknown Unknown
2024-10-23 21:08:38 : fds_openmp 000000000040B936 Unknown Unknown Unknown
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 1 PID 84296 RUNNING AT q03
= KILLED BY SIGNAL: 9 (Killed)
How can I solve this? I have already tried the ulimit commands, but the problem is still there.
Thanks.