firemodels / fds

Fire Dynamics Simulator
https://pages.nist.gov/fds-smv/

FDS and initial noise better for MPI debugging? #1469

Closed gforney closed 9 years ago

gforney commented 9 years ago
Hi Randy, Jason, and Kevin,

I have a request for the FDS initial noise feature.
Could we change the routine so that the serial
and MPI calculations produce exactly the same
results? This would make debugging easier.

I debug fire-only and fire+evacuation mesh cases
by checking that the fire output is exactly the same
for both (using cmp on the binary files). But I cannot
do this for MPI calculations, because the random numbers
differ between MPI and serial calculations. Below is a
proposal for how the initial noise could be changed.
With it, the serial and MPI runs (fire with or without
evacuation) will produce exactly the same results if
there are no bugs in the code, i.e., you can use the
cmp command to check the binary output files.
(Well, at least for the "debug" compilation, where you
have "-fp:strict -fp:except".)

SUBROUTINE INITIAL_NOISE(NM)

! Generate random noise at the start of the simulation

REAL(EB) :: VFAC,RN
INTEGER, DIMENSION(:), ALLOCATABLE :: SEED_RND
INTEGER :: I,J,K,IZERO,SIZE_RND
INTEGER, INTENT(IN) :: NM

IF (EVACUATION_ONLY(NM)) RETURN

! Seed the generator from the mesh number only, so that serial and
! MPI runs draw the same sequence on each mesh

CALL RANDOM_SEED(SIZE=SIZE_RND)
ALLOCATE(SEED_RND(SIZE_RND),STAT=IZERO)
CALL CHKMEMERR('INITIAL_NOISE','SEED_RND',IZERO)
SEED_RND = 2819 + 31*NM   ! assign only after SEED_RND is allocated
CALL RANDOM_SEED(PUT=SEED_RND)
DEALLOCATE(SEED_RND)

! Waste a few calls to RANDOM_NUMBER to avoid generating the exact same
! sequence on each mesh

DO I=1,NM
   IF (EVACUATION_ONLY(I)) CYCLE
   CALL RANDOM_NUMBER(RN)
ENDDO

! Point to local mesh variables
! ... and so on

Or something similar should be done so that the initial
noise is the same for serial and MPI cases. Of course,
one could simply use NOISE=.FALSE. instead. The "31" should
not be too large: with 1000 meshes the seed is
2819 + 31*1000 = 33819, which is still a nice small integer.
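A quick check of the seed arithmetic, as a minimal sketch: the names `noise_seed` and `max_safe_mesh` are hypothetical (not FDS routines), and the bound assumes the seed is stored in a default INTEGER, so the largest safe mesh number is (HUGE(0) - 2819)/31.

```fortran
module noise_seed_mod
   implicit none
   private
   public :: noise_seed, max_safe_mesh

   ! Constants from the proposal: base seed 2819, per-mesh stride 31
   integer, parameter :: SEED_BASE = 2819
   integer, parameter :: SEED_STRIDE = 31

contains

   ! Hypothetical helper: the seed value used for mesh NM
   pure integer function noise_seed(nm)
      integer, intent(in) :: nm
      noise_seed = SEED_BASE + SEED_STRIDE*nm
   end function noise_seed

   ! Largest mesh number whose seed still fits in a default integer
   pure integer function max_safe_mesh()
      max_safe_mesh = (huge(0) - SEED_BASE)/SEED_STRIDE
   end function max_safe_mesh

end module noise_seed_mod
```

Because the stride is constant, every mesh gets a distinct seed, and with 32-bit default integers the formula only overflows beyond roughly 69 million meshes.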

Serial vs. MPI random initialization at present:

 Serial
 1) no seed is given anywhere
 2) the first mesh (nm=1) picks one rnd,
    then some i*j*k more rnds are picked.
 3) the second mesh (nm=2) picks two rnds,
    then some i*j*k more rnds are picked.
 4) and so on
 So there is no need for the DO I=1,NM loop
 at all, because the different meshes will use
 different random numbers anyhow (the first mesh
 already picks at least i*j*k numbers).

 MPI
 1) no seed is given anywhere, so each process will
    use the same seed (some default of the Fortran
    compiler/runtime?)
 2) Say, proc=0 = mesh 1, proc=1 = mesh 2, etc.
 3) mesh 1: picks one rnd and then i*j*k more, but the other
    meshes do not know this (they are different processes)
 4) mesh 2: picks two rnds (starting from the default seed)
    and then i*j*k more or so, and similarly for the
    other processes. This differs from the serial case:
    there, one rnd + i*j*k rnds have already been taken
    (by the first mesh) before mesh 2 starts, whereas in
    the MPI case mesh 2 starts from the same default seed
    that mesh 1 also uses.
 5) You see my point...
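To see why per-mesh seeding removes the serial/MPI difference, here is a minimal sketch (not FDS code) in which a hypothetical `mesh_noise` routine reseeds the intrinsic generator from the mesh number alone, using the 2819 + 31*NM formula, before drawing any values. `EB` mimics FDS's double-precision kind name and is an assumption of this sketch.

```fortran
module mesh_noise_mod
   implicit none
   private
   public :: mesh_noise, EB

   integer, parameter :: EB = selected_real_kind(12)

contains

   ! Fill VALS with noise for mesh NM after reseeding the generator
   ! from NM alone, so the result does not depend on which meshes
   ! were processed earlier in the same process.
   subroutine mesh_noise(nm, vals)
      integer, intent(in) :: nm
      real(EB), intent(out) :: vals(:)
      integer, allocatable :: seed_rnd(:)
      integer :: size_rnd

      call random_seed(size=size_rnd)
      allocate(seed_rnd(size_rnd))
      seed_rnd = 2819 + 31*nm      ! per-mesh seed from the proposal
      call random_seed(put=seed_rnd)
      deallocate(seed_rnd)

      call random_number(vals)
   end subroutine mesh_noise

end module mesh_noise_mod
```

Calling `mesh_noise(2, v)` gives the same values whether or not mesh 1 was handled first in the same process, which is what a cmp comparison of serial and MPI output needs. With a distinct seed per mesh, the DO I=1,NM "waste" loop is not needed to decorrelate meshes, so the sketch omits it. The guarantee is per executable and platform; the Fortran standard does not promise identical sequences across compilers.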

Ciao,
Timo

Original issue reported on code.google.com by tkorhon1 on 2011-09-29 10:34:40

gforney commented 9 years ago
Should we also add the ability to use a pseudo-random seed, such as an integer derived
from the current time, so that a user can make multiple runs that do not start exactly the same?

Original issue reported on code.google.com by drjfloyd on 2011-09-29 12:19:01

gforney commented 9 years ago
From evac.f90:

    IF (NOT_RANDOM ) WRITE(LU_EVACOUT,FMT='(A)') ' FDS+Evac Random seed is not used.'
    CALL RANDOM_SEED(size=size_rnd)
    ALLOCATE(seed_rnd(size_rnd),STAT=IZERO)
    CALL ChkMemErr('READ_EVAC','seed_rnd',IZERO)
    IF (.NOT. NOT_RANDOM) THEN    ! Initialize the generator randomly
       CALL DATE_AND_TIME(values = t_rnd)
       seed_rnd = 31*t_rnd(7) + 29*t_rnd(8)
    ELSE
       ! Do not use a random seed, use a constant seed
       seed_rnd = 2819
    END IF
    CALL RANDOM_SEED(put=seed_rnd)
    DEALLOCATE(seed_rnd)
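The same switch could cover the time-based seed suggested above. As a minimal sketch, assuming a hypothetical `init_seed` routine (not in FDS) that reuses the evac.f90 logic: a constant seed for repeatable runs, or a seed built from the seconds and milliseconds fields of DATE_AND_TIME so that runs differ.

```fortran
module seed_choice_mod
   implicit none
   private
   public :: init_seed

contains

   ! Hypothetical helper following the evac.f90 pattern:
   ! NOT_RANDOM = .TRUE.  -> constant seed, repeatable runs
   ! NOT_RANDOM = .FALSE. -> clock-based seed, runs differ
   subroutine init_seed(not_random)
      logical, intent(in) :: not_random
      integer, allocatable :: seed_rnd(:)
      integer :: size_rnd
      integer :: t_rnd(8)

      call random_seed(size=size_rnd)
      allocate(seed_rnd(size_rnd))
      if (not_random) then
         seed_rnd = 2819                          ! constant seed
      else
         call date_and_time(values=t_rnd)
         seed_rnd = 31*t_rnd(7) + 29*t_rnd(8)     ! seconds, milliseconds
      end if
      call random_seed(put=seed_rnd)
      deallocate(seed_rnd)
   end subroutine init_seed

end module seed_choice_mod
```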

TimoK

Original issue reported on code.google.com by tkorhon1 on 2011-09-30 06:42:52

gforney commented 9 years ago
Timo, is this still something that is important for you? I have not had a chance to
work on it, and I am trying to close out old issues.

Original issue reported on code.google.com by mcgratta on 2012-05-17 20:35:19

gforney commented 9 years ago
Hi Kevin,

This is not too important to me, because I
can easily make the changes below to the
source code before I test the evacuation
part (so that "serial fire" = "serial
fire+evacuation" for the fire output, and
"serial fire+evacuation" = "MPI fire+evacuation"
for both the fire and evacuation output).

Actually, I have already been doing this for the
last few months. But it is up to you whether you
would like to add the following lines to the source
code or not. So, make the decision and then change
the status to "WontFix" or "Fixed".

(Regarding the evac.f90 error issue: I should remove
the random seed changes and check whether I still get
the error, because I had them in place when testing
that issue. This should not matter, but you can
never be sure.)

========================================
 Now comparing  init.f90 (< Own, > SVN)
========================================
2829,2830c2829
< INTEGER  :: I,J,K,SIZE_RND
< INTEGER, DIMENSION(:), ALLOCATABLE :: SEED_RND
---
> INTEGER  :: I,J,K
2836,2842d2834
<
< CALL RANDOM_SEED(SIZE=SIZE_RND)
< ALLOCATE(SEED_RND(SIZE_RND),STAT=IZERO)
< CALL CHKMEMERR('INITIAL_NOISE','SEED_RND',IZERO)
< SEED_RND = 2819 * 13*NM
< CALL RANDOM_SEED(PUT=SEED_RND)
< DEALLOCATE(SEED_RND)

Timo

Original issue reported on code.google.com by tkorhon1 on 2012-05-18 06:55:56

gforney commented 9 years ago
We should not have to change the code to do diagnostics, so I suggest that you make
the change. I assume that this will not dramatically affect the normal operation of
FDS.

Original issue reported on code.google.com by mcgratta on 2012-05-18 12:31:29

gforney commented 9 years ago
It is now committed; see the log message "Revision: r10762".

The issue is now Verified, because I (the issue reporter)
have checked it.

Timo

Original issue reported on code.google.com by tkorhon1 on 2012-05-23 07:50:02