firemodels / fds

Fire Dynamics Simulator
https://pages.nist.gov/fds-smv/

FDS+EVAC won't run using MPI in LINUX #2088

Closed gforney closed 9 years ago

gforney commented 9 years ago

FDS Version: 6.0.0 

I am trying to run FDS+EVAC using MPI on Linux, but it won't run: a segmentation
fault occurs, and mpiexec reports the following:

"mpiexec has exited due to process rank 6 with PID 7268 on
node pn002.int exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpiexec (as reported here). "

When I run it without MPI, it works perfectly, but it takes too long.

I would like to know if anybody else has had this problem.

Thanks,

Andres

Original issue reported on code.google.com by andres.panagioto on 2014-04-02 12:14:47


gforney commented 9 years ago
The case runs for me on our Linux cluster at NIST. I am using 7 MPI processes.
FDS puts each of the 6 fire meshes on a dedicated core and combines the 4 evac meshes
on one.
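To make that layout concrete, here is a small Python sketch that reproduces the mesh-to-process mapping printed in Timo's log later in this thread: meshes 1 to 6 go to processes 0 to 5, and meshes 7 to 14 (eight in all) share process 6. The function names are hypothetical, and this only illustrates the resulting layout, not how FDS decides the assignment internally. It also shows why the job needs one more MPI process than there are fire meshes.

```python
def assign_meshes(n_fire_meshes: int, n_evac_meshes: int) -> dict[int, int]:
    """Map 1-based mesh numbers to 0-based MPI ranks (illustrative layout only)."""
    assignment: dict[int, int] = {}
    # One dedicated rank per fire mesh: mesh 1 -> rank 0, mesh 2 -> rank 1, ...
    for mesh in range(1, n_fire_meshes + 1):
        assignment[mesh] = mesh - 1
    # Every evacuation-related mesh shares the next rank after the fire meshes.
    evac_rank = n_fire_meshes
    for mesh in range(n_fire_meshes + 1, n_fire_meshes + n_evac_meshes + 1):
        assignment[mesh] = evac_rank
    return assignment


def processes_needed(assignment: dict[int, int]) -> int:
    """Number of distinct ranks, i.e. the process count to pass to mpiexec."""
    return len(set(assignment.values()))


def format_assignment(assignment: dict[int, int]) -> list[str]:
    """Render the mapping in the same style as the FDS log lines."""
    return [
        f"Mesh{mesh:4d} is assigned to Process{rank:4d}"
        for mesh, rank in sorted(assignment.items())
    ]


if __name__ == "__main__":
    layout = assign_meshes(n_fire_meshes=6, n_evac_meshes=8)
    print("\n".join(format_assignment(layout)))
    print("MPI processes needed:", processes_needed(layout))
```

With 6 fire meshes and 8 evacuation-related meshes this gives 7 processes, matching the 7 processes used in both test runs.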

Timo -- do you see any problems?

Original issue reported on code.google.com by mcgratta on 2014-04-02 14:29:30

gforney commented 9 years ago
Well, I'll try it on our Linux cluster.

Timo

Original issue reported on code.google.com by tkorhon1 on 2014-04-03 11:44:11

gforney commented 9 years ago
I tried FDS 6.0.1 (the same SVN revision that is, or at least
used to be, on the downloads page for Windows, i.e., the last
Windows version that I downloaded):

 Compilation Date : Tue, 26 Nov 2013
 Current Date     : April  3, 2014  14:47:48
 Version: FDS 6.0.1; MPI Enabled; OpenMP Disabled
 SVN Revision No. : 17534

This runs on our Linux cluster, compiled with the Intel compiler.
Try taking the evacuation part away (add NO_EVACUATION=.TRUE. to the
MISC namelist) and running only the fire case with MPI.
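A minimal Python sketch of that suggestion: it takes the text of an FDS input file and returns a copy in which `NO_EVACUATION=.TRUE.` is added to the existing `&MISC` line or, if there is none, places a new `&MISC NO_EVACUATION=.TRUE. /` line before `&TAIL /`. The helper name is hypothetical; only the `NO_EVACUATION` parameter and the `MISC` namelist come from the thread. It assumes a single `&MISC` group and does not check for an existing `NO_EVACUATION` entry.

```python
import re


def disable_evacuation(fds_input: str) -> str:
    """Return a fire-only copy of an FDS input (hypothetical helper)."""
    misc = re.compile(r"&MISC\b", re.IGNORECASE)
    if misc.search(fds_input):
        # Namelist values may be separated by blanks, so inserting the new
        # parameter right after the group name keeps the line valid.
        return misc.sub("&MISC NO_EVACUATION=.TRUE.", fds_input, count=1)

    # No MISC group yet: add one just before the &TAIL line, if present.
    new_line = "&MISC NO_EVACUATION=.TRUE. /\n"
    tail = re.compile(r"^&TAIL\b", re.IGNORECASE | re.MULTILINE)
    match = tail.search(fds_input)
    if match:
        return fds_input[: match.start()] + new_line + fds_input[match.start():]
    return fds_input.rstrip("\n") + "\n" + new_line


if __name__ == "__main__":
    case = "&HEAD CHID='fire1', TITLE='thesis' /\n&MISC TMPA=20. /\n&TAIL /\n"
    print(disable_evacuation(case))
```

Writing the result to a new file, and possibly giving it a different `CHID` on the `HEAD` line, keeps the fire-only outputs from overwriting those of the original EVAC case.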

Process   3 of   6 is running on compute-1-15.local
Process   0 of   6 is running on compute-1-15.local
Process   2 of   6 is running on compute-1-15.local
Process   6 of   6 is running on compute-1-15.local
Process   1 of   6 is running on compute-1-15.local
Process   4 of   6 is running on compute-1-15.local
Process   5 of   6 is running on compute-1-15.local
Mesh   1 is assigned to Process   0
Mesh   2 is assigned to Process   1
 EVAC: Emesh     1 EVAC_MESH_1 has     1 door flow fields
Mesh   3 is assigned to Process   2
Mesh   4 is assigned to Process   3
Mesh   5 is assigned to Process   4
Mesh   6 is assigned to Process   5
Mesh   7 is assigned to Process   6
Mesh   8 is assigned to Process   6
Mesh   9 is assigned to Process   6
Mesh  10 is assigned to Process   6
Mesh  11 is assigned to Process   6
Mesh  12 is assigned to Process   6
Mesh  13 is assigned to Process   6
Mesh  14 is assigned to Process   6
 EVAC: Emesh     1 EVAC_MESH_1 has     1 door flow fields
 EVAC: Emesh     2 EVAC_MESH_2 has     1 door flow fields
 EVAC: Emesh     3 EVAC_MESH_3 has     1 door flow fields
 EVAC: Emesh     4 EVAC_MESH_4 has     1 door flow fields
 EVAC: Emesh     1 EVAC_MESH_1 has     1 door flow fields
 EVAC: Emesh     2 EVAC_MESH_2 has     1 door flow fields
 EVAC: Emesh     3 EVAC_MESH_3 has     1 door flow fields
 EVAC: Emesh     4 EVAC_MESH_4 has     1 door flow fields
 EVAC: Emesh     1 EVAC_MESH_1 has     1 door flow fields
 EVAC: Emesh     2 EVAC_MESH_2 has     1 door flow fields
 EVAC: Emesh     3 EVAC_MESH_3 has     1 door flow fields
 EVAC: Emesh     4 EVAC_MESH_4 has     1 door flow fields
 EVAC: Emesh     2 EVAC_MESH_2 has     1 door flow fields
 EVAC: Emesh     3 EVAC_MESH_3 has     1 door flow fields
 EVAC: Emesh     1 EVAC_MESH_1 has     1 door flow fields
 EVAC: Emesh     2 EVAC_MESH_2 has     1 door flow fields
 EVAC: Emesh     3 EVAC_MESH_3 has     1 door flow fields
 EVAC: Emesh     4 EVAC_MESH_4 has     1 door flow fields
 EVAC: Emesh     4 EVAC_MESH_4 has     1 door flow fields
 EVAC: Emesh     1 EVAC_MESH_1 has     1 door flow fields
 EVAC: Emesh     2 EVAC_MESH_2 has     1 door flow fields
 EVAC: Emesh     3 EVAC_MESH_3 has     1 door flow fields
 EVAC: Emesh     4 EVAC_MESH_4 has     1 door flow fields
 EVAC: Emesh     1 EVAC_MESH_1 has     1 door flow fields
 EVAC: Emesh     2 EVAC_MESH_2 has     1 door flow fields
 EVAC: Emesh     3 EVAC_MESH_3 has     1 door flow fields
 EVAC: Emesh     4 EVAC_MESH_4 has     1 door flow fields

 Fire Dynamics Simulator

 Compilation Date : Tue, 26 Nov 2013
 Current Date     : April  3, 2014  14:47:48

 Version: FDS 6.0.1; MPI Enabled; OpenMP Disabled
 SVN Revision No. : 17534

 Job TITLE        : thesis
 Job ID string    : fire1

 Time Step:    -49,    Simulation Time:      0.00 s
 Time Step:      1,    Simulation Time:      0.10 s
 Time Step:      2,    Simulation Time:      0.20 s
 Time Step:      3,    Simulation Time:      0.30 s
 Time Step:      4,    Simulation Time:      0.40 s
 Time Step:      5,    Simulation Time:      0.50 s
 Time Step:      6,    Simulation Time:      0.60 s

So it also runs on our Linux cluster.

Timo

Original issue reported on code.google.com by tkorhon1 on 2014-04-03 11:58:20

gforney commented 9 years ago
Thank you very much for your comments.

The fire part runs just fine. I have been doing some research, and it seems to be a
problem with the stack size (or a similar issue) on the cluster here in Lund.

Thanks for the help!

Original issue reported on code.google.com by andres.panagioto on 2014-04-03 12:17:29

gforney commented 9 years ago
I'll close this issue. It seems to be a computer problem, not a
source code problem. I set the status to "UnVerified", i.e., we were
not able to reproduce the same error (or any error).

TimoK

Original issue reported on code.google.com by tkorhon1 on 2014-05-09 07:28:36