gforney closed this issue 9 years ago
Ok, I'll take a look at this. It might be that I have not updated
the evacuation .fds files lately. I just got the latest SVN
source code today and ran it using my simple test case, where
I check that:
serial fds fire = serial fds+evac (for fire outputs)
serial fds+evac = mpi fds+evac (for fire + evac outputs)
serial fds+evac = openmp fds+evac ("serial fds+evac" means that I force just one core)
Wbr,
Timo
Original issue reported on code.google.com by tkorhon1
on 2014-12-18 15:01:01
Well, I updated the input files for the evacuation verification cases.
Now they should work with the serial version, SVN Revision No. 21192. Next I'll
check the parallel version.
Ok, now I see the problem. I'll fix it. Something has been
changed lately in the source code...
Timo
Original issue reported on code.google.com by tkorhon1
on 2014-12-19 14:15:32
I made the changes in read.f90 (Revision: 21212).
Variables NMESHES_EVAC, NMESHES_FIRE, and NM_EVAC are added to READ_MESH.
These are counters that I need now. I cannot yet use the EVACUATION_ONLY
or EVACUATION_GRID arrays because they do not yet have the correct
values. They are initialized properly for the meshes later (well,
at least for some additional evacuation meshes, like the STRS meshes etc.).
Two checks needed to be changed. The first was:
! Stop the calculation if the number of MPI processes is greater than the number of meshes
IF (NO_EVACUATION) THEN
   IF (NMESHES<NUMPROCS) THEN
      CALL MPI_FINALIZE(IERR)
      WRITE(MESSAGE,'(A,I3,A,I3)') 'ERROR: The number of MPI processes, ',NUMPROCS,&
         ', exceeds the number of meshes, ',NMESHES
      CALL SHUTDOWN(MESSAGE) ; RETURN
   ENDIF
ELSE
   IF (NMESHES_FIRE+1<NUMPROCS) THEN
      CALL MPI_FINALIZE(IERR)
      WRITE(MESSAGE,'(A,I3,A,I3)') 'ERROR: The number of MPI processes, ',NUMPROCS,&
         ', exceeds the number of fire meshes + 1, ',NMESHES_FIRE+1
      CALL SHUTDOWN(MESSAGE) ; RETURN
   ENDIF
ENDIF
Here I was using NO_EVACUATION to be sure that the plain fire
calculation is not broken.
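The process-count check above can be sketched in Python (a hypothetical illustration, not FDS source; the names mirror the Fortran variables NUMPROCS, NMESHES, NMESHES_FIRE, and NO_EVACUATION):

```python
def check_process_count(numprocs, nmeshes, nmeshes_fire, no_evacuation):
    """Return an error message if too many MPI processes are requested,
    otherwise None. A fire+evac run allows at most NMESHES_FIRE+1 processes:
    one per fire mesh plus a single shared evacuation process."""
    if no_evacuation:
        if nmeshes < numprocs:
            return (f"ERROR: The number of MPI processes, {numprocs}, "
                    f"exceeds the number of meshes, {nmeshes}")
    else:
        if nmeshes_fire + 1 < numprocs:
            return (f"ERROR: The number of MPI processes, {numprocs}, "
                    f"exceeds the number of fire meshes + 1, {nmeshes_fire + 1}")
    return None

# A fire-only run with 4 meshes may use up to 4 processes:
assert check_process_count(4, 4, 4, True) is None
# A fire+evac run with 3 fire meshes allows at most 3+1 = 4 processes:
assert check_process_count(5, 6, 3, False) is not None
```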
Second change was:
! Check for bad mesh ordering if MPI_PROCESS used
IF (USE_MPI) THEN
   DO NM=1,NMESHES
      IF (NM==1) CYCLE
      IF (EVACUATION_ONLY(NM)) CYCLE
      IF (PROCESS(NM) < PROCESS(NM-1)) THEN
         WRITE(MESSAGE,'(A,I3,A,I3,A)') 'ERROR: MPI_PROCESS for MESH ',NM,&
            ' < MPI_PROCESS for MESH ',NM-1,'. Reorder MESH lines.'
         CALL SHUTDOWN(MESSAGE) ; RETURN
      ENDIF
   ENDDO
   DO NM=1,NMESHES
      IF (NM==1 .OR. .NOT.EVACUATION_ONLY(NM)) CYCLE
      IF (.NOT.EVACUATION_GRID(NM)) CYCLE
      IF (PROCESS(NM) < PROCESS(NM-1)) THEN
         WRITE(MESSAGE,'(A,I3,A,I3,A)') 'ERROR: MPI_PROCESS for evacuation MESH ',NM,&
            ' < MPI_PROCESS for MESH ',NM-1,'. Reorder MESH lines.'
         CALL SHUTDOWN(MESSAGE) ; RETURN
      ENDIF
   ENDDO
ENDIF
This works because I have added the following lines elsewhere, so
that only the fire meshes have EVACUATION_ONLY set to false:
CALL ChkMemErr('READ','EVACUATION_ONLY',IZERO)
EVACUATION_ONLY(1:NMESHES_FIRE) = .FALSE.
IF (NMESHES_FIRE<NMESHES) EVACUATION_ONLY(NMESHES_FIRE+1:NMESHES) = .TRUE.
TimoK
Original issue reported on code.google.com by tkorhon1
on 2014-12-19 15:46:23
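The two mesh-ordering loops above follow the same pattern; here is a minimal Python sketch of the fire-mesh loop (hypothetical illustration, not FDS source; lists are 0-based here, unlike the 1-based Fortran arrays):

```python
def bad_fire_mesh_ordering(process, evacuation_only):
    """Return the first mesh index (0-based) whose MPI rank is lower than
    the previous mesh's rank, skipping evacuation-only meshes; None if the
    fire meshes are correctly ordered. Mirrors the first DO loop above."""
    for nm in range(1, len(process)):
        if evacuation_only[nm]:
            continue
        if process[nm] < process[nm - 1]:
            return nm
    return None

# Fire meshes on ranks 0,0,1 and an evacuation mesh on rank 2: OK.
assert bad_fire_mesh_ordering([0, 0, 1, 2], [False, False, False, True]) is None
# Rank order decreases between the second and third fire mesh: flagged.
assert bad_fire_mesh_ordering([0, 1, 0, 2], [False, False, False, True]) == 2
```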
I'm not sure how this is related, but in any case, when I run the MPI version of FDS
in debug mode:
Evacuation/evac_smv_testcase1.err: forrtl: severe (408): fort: (8): Attempt to fetch from allocatable variable EXTERIOR when it is not allocated
Original issue reported on code.google.com by mcgratta
on 2014-12-19 17:54:03
Ok, I'll check this. This is probably some other issue, but let's fix
it under this issue. It might be that I do not set EXTERIOR for
my additional evacuation meshes (the STRS meshes and visibility meshes).
These meshes are there just to store some values (U and V and obstruction
info at different levels), so no flow solver is used for them.
But I should check that enough variables are allocated (even if they
are not actually used, just initialized).
TimoK
Original issue reported on code.google.com by tkorhon1
on 2014-12-22 08:15:07
NewSmokey (~/FDS+Evac/FDS+Evac_Tests/FDS_ScriptRuns/Evacuation): sh Check_Evacuation_MPI.sh
evac_smv_testcase1 using 3 processes DEBUG
And the script that runs the case:
#!/bin/bash
export OMP_NUM_THREADS=1
MPI_DIR=/share/apps/OpenMPI/openmpi-1.8.3_intel/bin
FDS_DIR=/home/rtetko/FDS_Source_Latest
echo evac_smv_testcase1 using 3 processes DEBUG
${MPI_DIR}/mpirun -bind-to none -np 3 ${FDS_DIR}/fds_mpi_intel_linux_64_db evac_smv_testcase1.fds \
   > evac_smv_testcase1.err 2>&1
And tail evac_smv_testcase1.err:
Time Step: 200, Simulation Time: 9.96 s
Time Step: 201, Simulation Time: 10.01 s
STOP: FDS completed successfully (CHID: evac_smv_testcase1)
And the openmpi that I use:
NewSmokey: which mpirun
/share/apps/OpenMPI/openmpi-1.8.3_intel/bin/mpirun
NewSmokey: which mpifort
/share/apps/OpenMPI/openmpi-1.8.3_intel/bin/mpifort
NewSmokey: which ifort
/share/apps/intel/composer_xe_2013_sp1.0.080/composer_xe_2013_sp1.0.080/bin/intel64/ifort
And here is the version info that I ran:
Fire Dynamics Simulator
Compilation Date : Sat, 20 Dec 2014
Current Date : December 22, 2014 10:47:32
Version : FDS 6.1.2
SVN Revision No. : 21221
MPI Enabled; Number of MPI Processes: 3
OpenMP Enabled; Number of OpenMP Threads: 1
Job TITLE : Test 1, fire+evac meshes
Job ID string : evac_smv_testcase1
And the fds input file should be the same as in the svn folder:
Omakone (/cygdrive/C/D/FdsEvac/Fds+Evac5_SMV): diff /cygdrive/C/fds-smv/Verification/Evacuation/evac_smv_testcase1.fds /cygdrive/Y/FDS+Evac/FDS+Evac_Tests/FDS_ScriptRuns/Evacuation/evac_smv_testcase1.fds
Omakone (/cygdrive/C/D/FdsEvac/Fds+Evac5_SMV):
("Omakone" = my own machine, C-drive has the SVN version, Y-drive is
the Linux cluster disk, where I run the case)
Wbr,
Timo
Original issue reported on code.google.com by tkorhon1
on 2014-12-22 09:07:19
The case ran successfully with 1 MPI process and 1 OpenMP thread. Thanks.
Original issue reported on code.google.com by mcgratta
on 2014-12-29 15:02:07
I must have made a mistake when running this case. There is still a problem. The problem
does not occur if you use 3 MPI processes; it occurs when you use 1. My comment in
#7 was wrong because, if you run the case with 1 MPI process, these lines in read.f90
force EXTERIOR to be deallocated when MYID==EVAC_PROCESS. When you use only one MPI
process, MYID and EVAC_PROCESS will both be zero:
! EVAC_PROCESS needs only the allocations of SOLID, CELL_INDEX, OBST_INDEX_C, OBSTRUCTION, and I,J,K_CELL
IF (.NOT.EVACUATION_ONLY(NM) .AND. MYID==EVAC_PROCESS) THEN
   DEALLOCATE(M%EXTERIOR)
   CYCLE MESH_LOOP_3
ENDIF
We must assume that these cases can be run with 1, 2, 3,..., or NMESHES MPI processes.
Original issue reported on code.google.com by mcgratta
on 2014-12-31 16:39:01
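One way to see the pitfall: with a single MPI process, rank 0 is simultaneously the fire process and EVAC_PROCESS, so the unguarded IF deallocates EXTERIOR for fire meshes that rank 0 still has to compute on. A Python sketch of one possible guard (an assumption for illustration, not the fix that was actually committed):

```python
def should_deallocate_exterior(evacuation_only_nm, myid, evac_process,
                               n_mpi_processes):
    """Only drop EXTERIOR for a fire mesh owned by the evacuation rank when
    that rank is dedicated, i.e. more than one MPI process is in use
    (assumed guard, not the committed FDS change)."""
    return (not evacuation_only_nm
            and myid == evac_process
            and n_mpi_processes > 1)

# One process: rank 0 is both the fire and the evac process; keep EXTERIOR.
assert should_deallocate_exterior(False, 0, 0, 1) is False
# Three processes: rank 2 is a dedicated evac process and may drop EXTERIOR
# for fire meshes it will never compute on.
assert should_deallocate_exterior(False, 2, 2, 3) is True
```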
Ok, I'll fix the logic for MPI runs. Until now I have assumed that
the fire meshes use different MPI processes than the evacuation
meshes, and that the evacuation meshes are all in the same MPI process.
But, as I now see, there can also be fire meshes in that MPI
process, if MPI is true and one uses just one MPI process.
So, I will check all of the MYID==EVAC_PROCESS tests. Or I will add
a check somewhere that there are no fire meshes in the EVAC_PROCESS and
stop the run with a message like "Evacuation process should not contain
any fire meshes", or something similar.
Happy new year!
Timo
Original issue reported on code.google.com by tkorhon1
on 2015-01-02 07:59:06
Is there a reason that all Evac meshes share the same MPI process, which must be different
from the flow MPI process? I would prefer that the number of MPI processes be flexible.
Original issue reported on code.google.com by mcgratta
on 2015-01-02 14:34:45
All evacuation meshes should be in the same MPI process, because the
evacuation meshes need information from each other. But there is no reason
that other (i.e., fire) meshes cannot also be in the same MPI process.
I have just assumed that there is nothing other than evacuation meshes
in the "evacuation process". To correct this bad assumption, I only
have to fix some IF statements. In a few places I check whether the PROCESS
is an evacuation process and, if so, skip the fire-related meshes, and
vice versa. I need to put better logic there.
If there were fire and evacuation meshes in the same process,
then I would assume that the number of MPI processes is equal to one.
And that means that no MPI calls are needed; all meshes see each other
directly (same process).
So, the first remedy is to check that, if MPI = true for a fire+evac
calculation, there are at least two processes, and that the evacuation
process is different from the fire process.
WBR,Timo
Original issue reported on code.google.com by tkorhon1
on 2015-01-05 09:03:35
OK, just make sure that Evac cases can be run with only one MPI process, even if the
MPI version of FDS is used.
Original issue reported on code.google.com by mcgratta
on 2015-01-05 13:41:52
Issue 2349 has been merged into this issue.
Original issue reported on code.google.com by tkorhon1
on 2015-02-11 09:28:44
Issue 2331 has been merged into this issue.
Original issue reported on code.google.com by tkorhon1
on 2015-02-11 09:33:58
Kevin, would you check that the following bug fix does not do
any harm to the fire calculation:
!IF (EXCHANGE_RADIATION) CALL MESH_EXCHANGE(2)
IF (EXCHANGE_RADIATION) THEN
   CALL MESH_EXCHANGE(2)
ELSE
   IF (RADIATION) CALL MPI_BARRIER(MPI_COMM_WORLD,IERR)
END IF
This way all processes will always call MPI_BARRIER. The original
version called the barrier only if the radiation was updated at
this time step. The evacuation process does not know whether the fire
meshes update radiation or not, so I added the barrier for all cases.
If you do not like the above IF statement, the alternative would be,
in the main loop:
CALL MESH_EXCHANGE(2)
and in the MESH_EXCHANGE subroutine, somewhere at the beginning:
IF (CODE==2 .AND. .NOT.EXCHANGE_RADIATION) THEN
   CALL MPI_BARRIER(MPI_COMM_WORLD,IERR)
   RETURN
END IF
Here the problem is that the initialization part of main.f90 has:
! Iterate surface BCs and radiation in case temperatures are not initialized to ambient
DO I=1,NUMBER_INITIAL_ITERATIONS
   DO NM=1,NMESHES
      IF (PROCESS(NM)/=MYID) CYCLE
      CALL WALL_BC(T_BEGIN,NM)
      IF (RADIATION) CALL COMPUTE_RADIATION(T_BEGIN,NM)
   ENDDO
   IF (RADIATION) CALL MESH_EXCHANGE(2) ! Exchange radiation intensity at interpolated boundaries
ENDDO
I should check that the logic works nicely here also. All processes
call MESH_EXCHANGE in this initialization phase, and RADIATION=T also
for the evacuation process (I think so). If EXCHANGE_RADIATION is
already set to true for the fire processes, then the above version
should work; otherwise something still needs to be done.
I have tested the first version of the bug fix. If you say that
this fix is good enough then I'll commit it.
Timo
Original issue reported on code.google.com by tkorhon1
on 2015-02-11 15:33:34
A call to MPI_BARRIER forces each MPI process to wait at this point until all processes
arrive. In general, we do not like to do this because it adds inefficiency.
If you need an MPI_BARRIER, then make it happen only when an Evac calculation
is done. I don't want a non-Evac case to stop at a barrier if it doesn't have to.
Original issue reported on code.google.com by mcgratta
on 2015-02-11 15:50:18
Ok, I'll just add the mpi barrier to fire+evacuation meshes case.
And I'll hide it inside the mesh_exchange subroutine, so that
the main routine is easy to read.
Timo
Original issue reported on code.google.com by tkorhon1
on 2015-02-12 11:12:30
Things are fixed, svn 21734.
At the beginning of the MESH_EXCHANGE subroutine:
IF (CODE==2) THEN ! Check the need for the radiation exchange
   IF (ANY(EVACUATION_ONLY) .AND. N_MPI_PROCESSES>1 .AND. RADIATION) THEN
      ! Set barrier for fire processes, because evacuation process has a barrier in every ICYC
      IF (.NOT.EXCHANGE_RADIATION .AND. MYID/=EVAC_PROCESS) THEN
         CALL MPI_BARRIER(MPI_COMM_WORLD,IERR)
         RETURN
      END IF
   ELSE ! No evacuation or just one process or no radiation
      IF (.NOT.EXCHANGE_RADIATION .OR. .NOT.RADIATION) RETURN
   END IF
END IF
And the calls to MESH_EXCHANGE(2) in the main time loop and in the
initialization are now simply
CALL MESH_EXCHANGE(2) ! Exchange radiation intensity at interpolated boundaries
i.e., there is no "IF (RADIATION) CALL MESH_EXCHANGE(2)". The radiation
flag and the need for the radiation mesh exchange are tested at the start
of the called subprogram.
Timo
Original issue reported on code.google.com by tkorhon1
on 2015-02-13 11:51:57
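The early-return logic at the top of MESH_EXCHANGE can be sketched in Python (hypothetical illustration only; the MPI_BARRIER side effect on the fire processes is reduced to a boolean return value here):

```python
def skip_radiation_exchange(any_evacuation, n_mpi_processes, radiation,
                            exchange_radiation, myid, evac_process):
    """Return True if MESH_EXCHANGE(2) should return early. In the real
    code, fire processes that skip the exchange first hit an MPI_BARRIER so
    that the evacuation process, which synchronizes every ICYC, is not left
    waiting; that barrier side effect is omitted in this sketch."""
    if any_evacuation and n_mpi_processes > 1 and radiation:
        # Fire processes skip (after a barrier) when no exchange is due;
        # the evacuation process always proceeds.
        return (not exchange_radiation) and myid != evac_process
    # No evacuation, single process, or no radiation: plain early return.
    return (not exchange_radiation) or (not radiation)

# Fire rank 0 in a 3-process fire+evac run with no exchange due: skip.
assert skip_radiation_exchange(True, 3, True, False, 0, 2) is True
# The evacuation rank (2) proceeds in the same situation.
assert skip_radiation_exchange(True, 3, True, False, 2, 2) is False
```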
I added the test cases back to the verification list. Thanks.
Original issue reported on code.google.com by mcgratta
on 2015-02-13 14:05:20
Original issue reported on code.google.com by mcgratta
on 2014-11-23 21:21:06