firemodels / fds

Fire Dynamics Simulator
https://pages.nist.gov/fds-smv/
Other
642 stars 614 forks source link

FDS Source: Bug in part.f90: move_in_gas? #5570

Closed tkorhon1 closed 6 years ago

tkorhon1 commented 6 years ago

Debug compilation:

Fire Dynamics Simulator

Current Date : October 10, 2017 17:56:52 Version : FDS 6.5.3 Revision : FDS6.5.3-2952-gd38d938 Revision Date : Tue Oct 10 10:03:13 2017 +0300 Compiler : Intel ifort 17.0.0 Compilation Date : Oct 10, 2017 17:38:44

MPI Enabled; Number of MPI Processes: 1 OpenMP Enabled; Number of OpenMP Threads: 1

MPI version: 3.0 MPI library version: Open MPI v1.8.3, package: Open MPI tstopi@espvm5m030.ad.vtt.fi Distribution, ident: 1.8.3, Sep 25, 2014

Job TITLE : Test sprinklers, heat detectors and smoke detectors Job ID string : device_test

some text snipped...

Time Step: 2200, Simulation Time: 52.81 s forrtl: severe (408): fort: (3): Subscript #1 of the array X has value -1 which is less than the lower bound of 0

Image PC Routine Line Source

fds_mpi_intel_lin 00000000035415C6 Unknown Unknown Unknown fds_mpi_intel_lin 0000000002B0385E partmove_particle 1707 part.f90 UBAR = AFILL2(U,IIG-1,JJY,KKZ,(LP%X-X(IIG-1))*RDX(IIG),Y_WGT,Z_WGT) fds_mpi_intel_lin 0000000002AE3AF8 part_mp_move_part 1330 part.f90 Call move_in_gas fds_mpi_intel_lin 0000000002ADEB4A part_mp_update_pa 1220 part.f90 Call move_particles fds_mpi_intel_lin 00000000033EAF82 MAIN 688 main.f90 Call update_particles fds_mpi_intel_lin 000000000040C0DE Unknown Unknown Unknown libc-2.12.so 00000033ABA1ED1D libc_start_main Unknown Unknown fds_mpi_intel_lin 000000000040BFE9 Unknown Unknown Unknown

SUBROUTINE MOVE_IN_GAS

USE MATH_FUNCTIONS, ONLY: EVALUATE_RAMP USE PHYSICAL_FUNCTIONS, ONLY : DRAG, GET_VISCOSITY USE MATH_FUNCTIONS, ONLY : AFILL2, RANDOM_CHOICE, BOX_MULLER REAL(EB) :: UBAR,VBAR,WBAR,RVC,UREL,VREL,WREL,QREL,RHO_G,TMP_G,MU_AIR, & U_OLD,V_OLD,W_OLD,ZZ_GET(1:N_TRACKED_SPECIES),WAKE_VEL,DROP_VOL_FRAC,RE_WAKE,& WE_G,T_BU_BAG,T_BU_STRIP,MPOM,ALBO,SFAC,BREAKUP_RADIUS(0:NDC),& DD,DD_X,DD_Y,DD_Z,DW_X,DW_Y,DW_Z,K_TERM(3),Y_TERM(3),C_DRAG,A_DRAG,HAB,PARACOR,QREL2,X_WGT,Y_WGT,Z_WGT,& GX_LOC,GY_LOC,GZ_LOC,DUMMY,DRAG_MAX(3) REAL(EB), SAVE :: FP_MASS,HALF_DT2,BETA,OBDT,ALPHA,OPA,DTOPA,BDTOA INTEGER IIX,JJY,KKZ

ZZ_GET = 0._EB

CALL GET_IJK(LP%X,LP%Y,LP%Z,NM,XI,YJ,ZK,LP%ONE_D%IIG,LP%ONE_D%JJG,LP%ONE_D%KKG)

U_OLD = LP%U V_OLD = LP%V W_OLD = LP%W

! Interpolate the nearest velocity components of the gas

IIX = FLOOR(XI+.5_EB) JJY = FLOOR(YJ+.5_EB) KKZ = FLOOR(ZK+.5_EB)

X_WGT = XI+.5_EB-IIX Y_WGT = YJ+.5_EB-JJY Z_WGT = ZK+.5_EB-KKZ IF (X_WGT>=0.5_EB .AND. WALL_INDEX(IC,-1)>0) X_WGT = 1._EB IF (X_WGT< 0.5_EB .AND. WALL_INDEX(IC, 1)>0) X_WGT = 0._EB IF (Y_WGT>=0.5_EB .AND. WALL_INDEX(IC,-2)>0) Y_WGT = 1._EB IF (Y_WGT< 0.5_EB .AND. WALL_INDEX(IC, 2)>0) Y_WGT = 0._EB IF (Z_WGT>=0.5_EB .AND. WALL_INDEX(IC,-3)>0) Z_WGT = 1._EB IF (Z_WGT< 0.5_EB .AND. WALL_INDEX(IC, 3)>0) Z_WGT = 0._EB UBAR = AFILL2(U,IIG-1,JJY,KKZ,(LP%X-X(IIG-1))*RDX(IIG),Y_WGT,Z_WGT) BUG HERE ? VBAR = AFILL2(V,IIX,JJG-1,KKZ,X_WGT,(LP%Y-Y(JJG-1))*RDY(JJG),Z_WGT) BUG HERE ? WBAR = AFILL2(W,IIX,JJY,KKG-1,X_WGT,Y_WGT,(LP%Z-Z(KKG-1))*RDZ(KKG)) BUG HERE ?

* THE BUG and the fix? ****

LP%X-X(IIG-1) ==> LP%X-X(LP%ONE_D%IIG-1) and same for y and z

SUBROUTINE GET_IJK(X,Y,Z,NM,XI,YJ,ZK,I,J,K)

! Given the point (X,Y,Z) in mesh NM, GET_IJK returns the continuous and discrete form of the cell indices, (XI,YJ,ZK) and (I,J,K). ! For example, if the mesh is the unit cube with 10 cm cells, for the point (X,Y,Z)=(0.23,0.46,0.66), GET_IJK would return ! (XI,JK,ZK)=(2.3,4.6,6.6) and (I,J,K)=(3,5,7).

Wbr, Timo

sbenkorichi commented 6 years ago

Timo, just a question, Once you make the changes, does the simulation finish with no errors?

mcgratta commented 6 years ago

Timo -- I don't think the program is flawed. I believe that IIG is defined because this subroutine is contained within another for which IIG is defined. It is possible, however, that IIG becomes -1 because a droplet escapes the computational domain. What happens when you make the change that you suggest?

tkorhon1 commented 6 years ago

Hi!

I changed the lines as:

UBAR = AFILL2(U,IIG-1,JJY,KKZ,(LP%X-X(LP%ONE_D%IIG-1))RDX(LP%ONE_D%IIG),Y_WGT,Z_WGT) VBAR = AFILL2(V,IIX,JJG-1,KKZ,X_WGT,(LP%Y-Y(LP%ONE_D%JJG-1))RDY(LP%ONE_D%JJG),Z_WGT) WBAR = AFILL2(W,IIX,JJY,KKG-1,X_WGT,Y_WGT,(LP%Z-Z(LP%ONE_D%KKG-1))*RDZ(LP%ONE_D%KKG))

But I get the same error (debug compilation):

Time Step: 2200, Simulation Time: 52.81 s forrtl: severe (408): fort: (3): Subscript #1 of the array X has value -1 which is less than the lower bound of 0

Image PC Routine Line Source

fds_mpi_intel_lin 0000000003541B66 Unknown Unknown Unknown fds_mpi_intel_lin 0000000002B03A39 partmove_particle 1707 part.f90 fds_mpi_intel_lin 0000000002AE3C28 part_mp_move_part 1330 part.f90 fds_mpi_intel_lin 0000000002ADEC7A part_mp_update_pa 1220 part.f90 fds_mpi_intel_lin 00000000033EB522 MAIN__ 688 main.f90 fds_mpi_intel_lin 000000000040C0DE Unknown Unknown Unknown libc-2.12.so 0000003ABD01ED5D __libc_start_main Unknown Unknown fds_mpi_intel_lin 000000000040BFE9 Unknown Unknown Unknown

And I looked the source:

So, the part that calls MOVE_IN_GAS sets the IIG,JJG and KKG

TIME_STEP_LOOP: DO ITER=1,N_ITER

! Determine the current coordinates of the particle

  IIG = LP%ONE_D%IIG
  JJG = LP%ONE_D%JJG
  KKG = LP%ONE_D%KKG
  IC = CELL_INDEX(IIG,JJG,KKG)
  X_OLD = LP%X
  Y_OLD = LP%Y
  Z_OLD = LP%Z

And then calls move_in_gas, so the iig,jjg,kkg are LP%ONE_D%IIG etc there.

     CALL MOVE_IN_GAS

MOVE_IN_GAS: CALL GET_IJK(LP%X,LP%Y,LP%Z,NM,XI,YJ,ZK,LP%ONE_D%IIG,LP%ONE_D%JJG,LP%ONE_D%KKG) ! Interpolate the nearest velocity components of the gas

IIX = FLOOR(XI+.5_EB) JJY = FLOOR(YJ+.5_EB) KKZ = FLOOR(ZK+.5_EB)

UBAR = AFILL2(U,IIG-1,JJY,KKZ,(LP%X-X(IIG-1))*RDX(IIG),Y_WGT,Z_WGT)

One question: Are the LP%ONE_D%IIG etc updated during the time loop correctly? So that the IIG in UBAR above is the current LP%ONE_D%IIG. I think that this is what is wanted. Or does one want to use some other IIG? Well, one could add IIG = LP%ONE_D%IIG JJG = LP%ONE_D%JJG KKG = LP%ONE_D%KKG to the MOVE_IN_GAS after the CALL GET_IJK. This would make the source code easier to read. And the compiler should be clever enough no to increase the computational cpus.

But, anyway, you would have sometimes X(-1) so it will be out of bounds.

Wbr, Timo

rmcdermo commented 6 years ago

Timo, What you are proposing to touch in the code would cause us to rerun all our validation cases for a release. That is fine, but I would like to see more proof that there is a problem. Something as fundamental as which cell a particle is in for interpolation of velocity components can easily be checked with a verification case. You can initialize different velocities in different cells and show that there is a problem with the particle positions or velocity from there. Thanks.

mcgratta commented 6 years ago

I will add a redefine of IIG, JJG, KKG as you suggest. That's the safest thing to do at the moment. Once we get out a release, I will make IIG etc pointers to LP%ONE_D%IIG so that we are not creating new variables but rather simply a shortcut to make the code easier to read.

tkorhon1 commented 6 years ago

Fire Dynamics Simulator

Current Date : October 17, 2017 10:19:00 Version : FDS 6.5.3 Revision : FDS6.5.3-2988-g75ddf29 Revision Date : Mon Oct 16 14:18:04 2017 -0400 Compiler : Intel ifort 17.0.0 Compilation Date : Oct 17, 2017 10:15:44

(I.e, Kevin's addition "I will add a redefine of IIG, JJG, KKG as you suggest." already there.)

But:

Time Step: 2200, Simulation Time: 52.81 s forrtl: severe (408): fort: (3): Subscript #1 of the array X has value -1 which is less than the lower bound of 0

Image PC Routine Line Source

fds_mpi_intel_lin 0000000003542B76 Unknown Unknown Unknown fds_mpi_intel_lin 0000000002B03BB9 partmove_particle 1711 part.f90 fds_mpi_intel_lin 0000000002AE3C28 part_mp_move_part 1330 part.f90 fds_mpi_intel_lin 0000000002ADEC7A part_mp_update_pa 1220 part.f90 fds_mpi_intel_lin 00000000033EC53E MAIN__ 688 main.f90 fds_mpi_intel_lin 000000000040C0DE Unknown Unknown Unknown libc-2.12.so 0000003ABD01ED5D __libc_start_main Unknown Unknown fds_mpi_intel_lin 000000000040BFE9 Unknown Unknown Unknown

I'll try to find more about this error. Now I'm quite busy, but I'll do this case with small priority (i.e., a couple of minutes every day basis....)

First thing I try to do: I add some extra write statements to part.90 (with "If time .ge. 52.81 s" => write particle number etc information to nail out which particle is responsible of the crash. Then I add writes just for this particle to see the x(t),y(t),z(t), iig(t), jjg(t), kkg(t) etc information on each time step.

Wbr, Timo

sbenkorichi commented 6 years ago

Timo, Wouldn't be possible from your end, to share a simplified inputfile, so we can look at from our end as well. There might be a chance that something not right about the input file. Thanks.

tkorhon1 commented 6 years ago

Well, could be. I do not remember, when I updated the device_test.fds file. But:

ESPKT3K1133 (/cygdrive/C/Firemodels_fork/fds/Verification/Controls): diff -i -w device_test.fds /cygdrive/Y/FDS+Evac/FDS+Evac_Tests/FDS_ScriptRuns/Verification_DB/Controls/device_test.fds ESPKT3K1133 (/cygdrive/C/Firemodels_fork/fds/Verification/Controls): ESPKT3K1133 (/cygdrive/C/Firemodels_fork/fds/Verification/Controls): git status On branch master Your branch is up-to-date with 'origin/master'.

So, my device_test.fds should be the same one that is in the GIT.

And below is my latest results. It seems, that the problem is that the particle is going from the mesh 2 to mesh 1 at that time step. Actually, the particle is moving mesh at the internal particle time loop (TIME_STEP_LOOP: DO ITER=1,N_ITER)

  ELSE SOLID_GAS_MOVE

     IF (T .ge.   54.4 .AND. NM==2 .AND. IP==2494) THEN
        write(lu_err,*)'*** icyc, ip, times ',icyc,ip,T,DT,DT_P,N_ITER,NM
        write(lu_err,*)'*** ip: xyz ',ip,LP%X,LP%Y,LP%Z
        write(lu_err,*)'*** ip: xyz ',ip,X_OLD,Y_OLD,Z_OLD
        write(lu_err,*)'*** ip: ijk ',ip,IIG,JJG,KKG
        write(lu_err,*)'*** ip: ijk ',ip,LP%ONE_D%IIG,LP%ONE_D%JJG,LP%ONE_D%KKG

!!$ &MESH IJK=30,10,10, XB=0.0,3.0,0.0,1.0,0.0,1.0 / !!$ &MESH IJK=10,40,10, XB=3.0,4.0,0.0,4.0,0.0,1.0 / !!$ icyc, ip, times 2270 2494 54.4447074399974 2.683823690454760E-002 1.341911845227380E-002 2 2 !!$ ip: xyz 2494 3.01076242048210 1.00134708979405 0.232550033143915 !!$ ip: xyz 2494 3.01076242048210 1.00134708979405 0.232550033143915 !!$ ip: ijk 2494 1 11 3 !!$ ip: ijk 2494 1 11 3 !!$ icyc, ip, times 2270 2494 54.4447074399974 2.683823690454760E-002 1.341911845227380E-002 2 2 !!$ ip: xyz 2494 2.98500000000000 0.995411925198389 0.221151262127147 !!$ ip: xyz 2494 2.98500000000000 0.995411925198389 0.221151262127147 !!$ ip: ijk 2494 0 10 3 !!$ ip: ijk 2494 0 10 3 !!$forrtl: severe (408): fort: (3): Subscript #1 of the array X has value -1 which is less than the lower bound of 0

     END IF
     CALL MOVE_IN_GAS

Well, I think that it is this particle (ip=2494) that crashes. But I run right now a test, where I write info just before "call move_in_gas" to be sure, that it is really this particle. (My if had ip==2494), so the crash could be for some later particle also and later times, but it should be for this particle, because IIG=0 ==> IIG-1 = -1 => the error message.

Wbr, Timo

tkorhon1 commented 6 years ago

So, I have attached the input file. It is same as in the FDS V&V suite.

Wbr, Timo device_test.txt

sbenkorichi commented 6 years ago

Weird Timo, I've tested it with bundle and the up to date compiled, no sing of errors, even with the debug one.

sbenkorichi commented 6 years ago

Timo, One suggestion I would recommend is to clone the original repo (https://github.com/firemodels/fds) Then, compile the code, and rerun the case. --There might be some conflicts in your repo

If this also doesn't work, then chances are might be due to how intel was installed and/or how openmpi was compiled.

We'll see.

tkorhon1 commented 6 years ago

Hi!

I got the stuff from https://github.com/firemodels/fds And I used "Clone or Download" => "Download ZIP" and unzipped it to our Linux Cluster and used the mpi_intel_linux_64_bd Build. the error seem to occur at the same time, probably the same time and particle, but I do not know, because there are no extra write statements now. The line 1711 is the same one: UBAR = AFILL2(U,IIG-1,JJY,KKZ,(LP%X-X(IIG-1))*RDX(IIG),Y_WGT,Z_WGT)

Fire Dynamics Simulator

Current Date : October 18, 2017 09:58:43 Version : FDS 6.5.3 Revision : - Revision Date : Compiler : Intel ifort 17.0.0 Compilation Date : Oct 18, 2017 09:56:12

MPI Enabled; Number of MPI Processes: 1 OpenMP Enabled; Number of OpenMP Threads: 4

MPI version: 3.0 MPI library version: Open MPI v1.8.3, package: Open MPI tstopi@espvm5m030.ad.vtt.fi Distribution, ident: 1.8.3, Sep 25, 2014

Job TITLE : Test sprinklers, heat detectors and smoke detectors Job ID string : device_test

Time Step: 2100, Simulation Time: 50.31 s Time Step: 2200, Simulation Time: 52.81 s forrtl: severe (408): fort: (3): Subscript #1 of the array X has value -1 which is less than the lower bound of 0

Image PC Routine Line Source

fds_mpi_intel_lin 0000000003557446 Unknown Unknown Unknown fds_mpi_intel_lin 0000000002232BC5 partmove_particle 1711 part.f90 fds_mpi_intel_lin 0000000002212C34 part_mp_move_part 1330 part.f90 fds_mpi_intel_lin 000000000220DC86 part_mp_update_pa 1220 part.f90 fds_mpi_intel_lin 0000000003400E02 MAIN__ 688 main.f90 fds_mpi_intel_lin 000000000040C0DE Unknown Unknown Unknown libc-2.12.so 0000003ABD01ED5D __libc_start_main Unknown Unknown fds_mpi_intel_lin 000000000040BFE9 Unknown Unknown Unknown

Wbr, Timo

sbenkorichi commented 6 years ago

Timo, Let's try this experiment as well, Get the bundle fds 6.5.3 from here. https://pages.nist.gov/fds-smv/downloads.html Test it on the same case, and see if it still gives an error or not. Then, we can start isolating the error. It is quite difficult for me now to point out to the source as it's working on our ends.

rmcdermo commented 6 years ago

@tkorhon1, What compiler are you using? This looks like an uninitialized variable, perhaps.

@sbenkorichi, Can you test with gnu db version?

sbenkorichi commented 6 years ago

With gnu and gnu db works fine.

mcgratta commented 6 years ago

Timo -- can you determine the coordinates of the particle that is causing the problem. LP%X, LP%Y, LP%Z ?

tkorhon1 commented 6 years ago

Hi Kevin,

This information was already above, but I'll copy-and-paste it here:

write(lu_err,*)'* icyc, ip, times ',icyc,ip,T,DT,DT_P,N_ITER,NM write(lu_err,)' ip: xyz ',ip,LP%X,LP%Y,LP%Z write(lu_err,*)'* ip: xyz ',ip,X_OLD,Y_OLD,Z_OLD write(lu_err,)' ip: ijk ',ip,IIG,JJG,KKG write(lu_err,*)'*** ip: ijk ',ip,LP%ONE_D%IIG,LP%ONE_D%JJG,LP%ONE_D%KKG

              # of part.      LP%X                       LP%Y                   LP%Z

!!$ ip: xyz 2494 3.01076242048210 1.00134708979405 0.232550033143915 !!$ ip: xyz 2494 3.01076242048210 1.00134708979405 0.232550033143915 IIG JJG KKG !!$ ip: ijk 2494 1 11 3 !!$ ip: ijk 2494 1 11 3 !!$ *** icyc, ip, times 2270 2494 54.4447074399974 2.683823690454760E-002 1.341911845227380E-002 2 2

of part. LP%X LP%Y LP%Z

                        Now x is out of mesh #2
                        but xyz belongs to mesh #1

!!$ ip: xyz 2494 2.98500000000000 0.995411925198389 0.221151262127147 !!$ ip: xyz 2494 2.98500000000000 0.995411925198389 0.221151262127147 IIG JJG KKG !!$ ip: ijk 2494 0 10 3 !!$ ip: ijk 2494 0 10 3

So LP%X .lt. mesh x_min ==> IIG < 1. But the particle is not inside a solid, it has gone to neighboring mesh. But it has not done this at the main FDS timestep DT. The particle time step for this particle is 0.5*DT, i.e., two internal time iterations. And the transition mesh 2 => mesh 1 happens at the second internal time step iteration.

I do not know, when there is a check, if the particle is moving from one mesh to some other mesh, seems that not here (i.e. in the sub move_in_gas).

So, you could see, if the particles are moved correctly between the meshes in the source code. I'm not familiar with part.f90.

Salah:

And the case runs with FDS 6.5.3 (bundle for DOS and its successors, whose names I do now want to pronounce...). The executable was fds.exe, i.e., no debug options, it is the normal release version. I do not have a DOS Fortran compiler.

Above error messages are from Linux debug version. The compiler and MPI versions are above also, here they are once again (they are printed on the stderr channel...):

Compiler : Intel ifort 17.0.0 Compilation Date : Oct 18, 2017 09:56:12 MPI Enabled; Number of MPI Processes: 1 OpenMP Enabled; Number of OpenMP Threads: 4 MPI version: 3.0 MPI library version: Open MPI v1.8.3, package: Open MPI tstopi@espvm5m030.ad.vtt.fi Distribution, ident: 1.8.3, Sep 25, 2014

Well, here is a little bit more about the compiler on our Linux cluster:

NewSmokey (~/FDS_Source_Latest): ifort -V Intel(R) Fortran Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 17.0.0.098 Build 20160721 Copyright (C) 1985-2016 Intel Corporation. All rights reserved.

Wbr, Timo

Btw: I'm on holiday Fri-Mon (www.tallinncup.ee), so back at the office on Tue next week.

sbenkorichi commented 6 years ago

Well, the case works with the bundle. That leaves me with the option that the issues lies from your system end and not the source code. What I would suggest is the following:

  1. Rienstall Intel compiler (maybe you didn't include more features during its installation).
  2. Recompile OpenMPI (preferably using version 2 or 3 as it's up to date and many bugs were fixed in between). I believe this would probably solve your issue. At least give it a try.
rmcdermo commented 6 years ago

Timo, We are using ifort 17.0.4.

sbenkorichi commented 6 years ago

Me too, it's 17.04 they released an update so I downloaded it also try to download the full package and not the online one for installation.

tkorhon1 commented 6 years ago

Well, I inspected the part.f90, its logic is shown below. For my opinion, it has a bug. If the internal time step for a particle, DT_P, is less than DT, then it might happen that the particle goes out of the mesh at the first internal time step (or whatever one, not the last internal time step). In my case, there is just two internal time steps, DT_P=0.5*DT. So, after the first internal time step (the TIME_STEP_LOOP: DO ITER=1,N_ITER), the particle x is out of the current mesh. But the loop goes for the last time step. Only after the TIME_STEP_LOOP the CALL REMOVE_PARTICLES(T,NM) is called. The REMOVE_PARTICLES should be called inside the TIME_STEP_LOOP, in principle. It might be, that REMOVE_PARTICLES can not be put there, because the TIME_STEP_LOOP is inside the PARTICLE_LOOP. But you should see my point.

I do not know, why you do not see this problem. It might be due to the things related to the compilation etc. So, some floating point accuracy etc thing. The case I found for my particle could be quite rare. Well, in a real fire simulation with sprinklers etc, it might be that this is not so rare, because there are lots of particles.

SUBROUTINE MOVE_PARTICLES(T,DT,NM) PARTICLE_LOOP: DO IP=1,NLP DT_P = DT/REAL(N_ITER,EB) ! Internal particle time step, which can be less than DT TIME_STEP_LOOP: DO ITER=1,N_ITER CALL MOVE_ON_SOLID ! So, inside this time loop it might happen that a particle goes out of the current mesh. ! So, it should be removed from this mesh and one should see, if it has gone to some ! other mesh. ! How about the other "Remove PARTICLEs that have left the current mesh (NM) or are no longer to be tracked"

! What you have inside the TIME_STEP_LOOP, for example:

  ! Where is the PARTICLE now? Limit the location by UBOUND and LBOUND due to the possible super fast PARTICLEs
  ! If PARTICLE hits an obstacle, change its properties
     ! Check if any solid boundaries of original grid cell have been crossed
     ! Remove the particle if it is not allowed on a surface
     ! Get the wall index of the surface
     ! If no solid boundaries of original cell have been crossed, check boundaries of new grid cell
     ! Special case where the particle is inside a solid but cannot be processed for some reason.
     ! Just return the particle to its starting position and set its velocity to 0.
     ! Check if PARTICLE has crossed no solid planes or too many
        ! Add PARTICLE mass to accumulated liquid array
        ! Adjust the size of the PARTICLE and weighting factor
        ! Move particle to where it almost hits solid
        ! Check if PARTICLE has not found surface. Simply remove for now. Todo: search algorithm
        ! Choose a direction for the PARTICLEs to move
        ! If the particle is solid, as opposed to a liquid droplet, do not make it stick to the wall.
  ! Check if PARTICLEs that were attached to a solid are still attached after the time update

! So, you check pretty much all things inside this time loop, but not if Elvis has left the building.

ENDDO TIME_STEP_LOOP ENDDO PARTICLE_LOOP

! Remove out-of-bounds particles CALL REMOVE_PARTICLES(T,NM)

Contains: SUBROUTINE REMOVE_PARTICLES(T,NM) ! Remove PARTICLEs that have left the current mesh (NM) or are no longer to be tracked PARTICLE_LOOP: DO IP=1,NLP WEED_LOOP: DO ! Remove particles that have left the active mesh IF (LP%X>MESHES(NM)%XS .AND. LP%X<MESHES(NM)%XF .AND. LP%Y>MESHES(NM)%YS .AND. LP%Y<MESHES(NM)%YF .AND. & LP%Z>MESHES(NM)%ZS .AND. LP%Z<MESHES(NM)%ZF) CYCLE PARTICLE_LOOP ! Replace all other particles CALL PARTICLE_ORPHANAGE ! Here out of the current mesh particles (like my "error particle") ENDDO WEED_LOOP ENDDO PARTICLE_LOOP

Contains: SUBROUTINE PARTICLE_ORPHANAGE ! Check to see if the particle has entered another mesh (NOM)

NOM = 0 SEARCH_LOOP: DO OM=1,NMESHES IF (LP%X>M%XS .AND. LP%X<M%XF .AND. LP%Y>M%YS .AND. LP%Y<M%YF .AND. LP%Z>M%ZS .AND. LP%Z<M%ZF) THEN NOM = OM EXIT SEARCH_LOOP ENDIF ENDDO SEARCH_LOOP ! For particles that have entered another mesh (NOM), add them to the "orphanage"

! Zero out storage for particle that is being removed

! Move particle at the end of the line into the vacated particle

Wbr, Timo

mcgratta commented 6 years ago

Timo, thanks. I think that slightly different versions of the compilers and optimization has probably led to slightly different number generating so that you have this one particle that escapes our logic. I'll see if I can plug the hole.

sbenkorichi commented 6 years ago

Timo, Now, I suspect your issue has to do with your openmpi, why? I've tried to compile the same openmpi 1.8.3 you're using, and so many bugs with it. I couldn't even compile fds. I'm using intel compiler. I advise you compile at least version 2 or 3 and see if it fixes it.

mcgratta commented 6 years ago

Timo -- I have looked through the logic of the particle routine, and I still cannot figure out what is going wrong. Can you do the following: for particle 2494 in mesh 2, can you printout when the value of LP%ONE_D%IOR changes. The parameter tells me which wall the particle sticks to. When a particle hits a wall, this parameter changes from 0 (meaning it is in the gas phase) to pm 1, 2 or 3, depending on which wall it is stuck to. I believe that particle 2494 should hit the wall, and then be returned to a position just outside of the wall. It should not return to the gas phase (IOR=0). I cannot figure out how the particle is being returned to the gas phase for the second iteration.

mcgratta commented 6 years ago

Salah -- let's not worry about compilers. I want to see if the version Timo is running identifies a flaw in the particle logic.

tkorhon1 commented 6 years ago

Salah: FYI: I did not get the error message, when I ran with 2 processors. The error shows up only, when setting -np 1 (or just fds....exe chid.fds). But I have noticed, that the FDS results are generally not the same, when doing single proc vs multiple proc comparison.

Kevin: The IOR stuff: The IOR=0 at the first internal time step, IOR=+3 at the second internal time step, when the error occurs.

r1: x1=3.01076242048210 y1= 1.00134708979405 z1= 0.232550033143915 r2: x2=2.98500000000000 y2= 0.995411925198389 z2= 0.221151262127147

If I see it correctly, the straight line from r1 => r2 should be inside the meshes 1 and 2, so the particle is not touching a wall here. Well, I try to print out the u,v,w info and see, if the velocity is along the r1=>r2 direction also (and print the xyz,uvw also at the previous time step, hoping that the particle # is still the same).

!!$ &MESH IJK=30,10,10, XB=0.0,3.0,0.0,1.0,0.0,1.0 / !!$ &MESH IJK=10,40,10, XB=3.0,4.0,0.0,4.0,0.0,1.0 / !!$ icyc, ip, times 2270 2494 54.4447074399974 2.683823690454760E-002 1.341911845227380E-002 2 2 !!$ ip: xyz 2494 3.01076242048210 1.00134708979405 0.232550033143915 !!$ ip: xyz 2494 3.01076242048210 1.00134708979405 0.232550033143915 !!$ ip: ijk 2494 1 11 3 !!$ ip: ijk 2494 1 11 3 IOR=0 !!$ icyc, ip, times 2270 2494 54.4447074399974 2.683823690454760E-002 1.341911845227380E-002 2 2 !!$ ip: xyz 2494 2.98500000000000 0.995411925198389 0.221151262127147 !!$ ip: xyz 2494 2.98500000000000 0.995411925198389 0.221151262127147 !!$ ip: ijk 2494 0 10 3 !!$ ip: ijk 2494 0 10 3 IOR=3 !!$forrtl: severe (408): fort: (3): Subscript #1 of the array X has value -1 which is less than the lower bound of 0

!!$ *** icyc,t,nm,ip 2270 54.4447074399974 2 2494 !!$forrtl: severe (408): fort: (3): Subscript #1 of the array X has value -1 which !!$is less than the lower bound of 0

image

So, the r1=>r2 is not touching a outer wall. The particle is going just from one mesh to other one. But the case with uvw output is now running, some tens of minutes cpu time. I'll analyse those results soon.

Wbr, Timo

sbenkorichi commented 6 years ago

Under the same system with the same solver you should get similar results, otherwise this should be a bug. We had earlier and issue with threads, Kevin has set some verification cases can be found /Verification/Thread_Check/ directory. The different results you are talking about are from the device.fds input file? If not, it would be helpful to share a simplified input file. We might need a verification cases to check serial vs parallel cases.

mcgratta commented 6 years ago

Salah -- the droplets and particles are introduced via a random number generator, and their movement can be subtly altered by some very small change in arithmetic from one version of the compiler to another. So Timo, I believe, has identified a bug in the logic of how droplets are passed from one mesh to another. This might be a one in a million occurrence that we have not seen yet, or this might be why some people's jobs fail after 3 days and we cannot figure out why.

Timo -- good hints. I should be able to fix.

tkorhon1 commented 6 years ago

So, now the particle LP%TAG = 111830 has the coordinates shown in the figure below. Black line with diamonds: x,y of the particle Gray dots with black open circle: x,y of the particle calculated using u and v, i.e, the last time point x,y and x_new=x + dtu, y_new = y + dtv.

So, something odd going on close to the "mesh corner". It seems that the trajectory of the particle should be a straight line (about). Is the particle touching the corner? Is there a check for this? I.e., particle has some finite diameter and that added to the (x,y,z) of the centre point => touched the wall corner => some "drag force"? Or shoud it "hit a wall" here? Well, should I also output the particle diameter?

Note: The gray dots + circles are plotted as:

x y
3.010762 1.001347 x,y, icyc 2268
2.984443 0.974330 new x,y using u,v, dt info
   
3.059916 1.051445 x,y, icyc 2269
3.009411 1.000331 new x,y using u,v, dt info
   
3.111716 1.103648 x,y, icyc 2270
3.058564 1.050304 new x,y using u,v, 0.5*dt_fire info

image

Below is the "raw data":

IF (T .ge. 54.3 .AND. LP%TAG==1118308) THEN write(lu_err,*)'* icyc, ip, times LP_TAG ',icyc,ip,T,DT,DT_P,N_ITER,NM,LP%TAG write(lu_err,)' ip: xyz ',ip,LP%X,LP%Y,LP%Z write(lu_err,*)'* ip: uvw ',ip,LP%U,LP%V,LP%W write(lu_err,)' ip: ijk, IOR ',ip,LP%ONE_D%IIG,LP%ONE_D%JJG,LP%ONE_D%KKG,LP%ONE_D%IOR END IF SOLID_GAS_MOVE: IF (LP%ONE_D%IOR/=0) THEN

icyc, ip, times LP_TAG 2265 2494 54.3202756143491 2.439839718595236E-002 2.439839718595236E-002 1 2 1118308 ip: xyz 2494 3.28237671344241 1.27586248048163 0.679266678785800 ip: uvw 2494 -2.48305116872305 -2.53449206003569 -3.48972563768302 ip: ijk, IOR 2494 3 13 7 0

icyc, ip, times LP_TAG 2266 2494 54.3446740115350 2.439839718595236E-002 2.439839718595236E-002 1 2 1118308 ip: xyz 2494 3.22306748651437 1.21565010793596 0.593307446747604 ip: uvw 2494 -2.38232837497873 -2.40592903764535 -3.56260248465917 ip: ijk, IOR 2494 3 13 6 0

icyc, ip, times LP_TAG 2267 2494 54.3690724087210 2.439839718595236E-002 2.439839718595236E-002 1 2 1118308 ip: xyz 2494 3.16616622451897 1.15833173902327 0.505472893869552 ip: uvw 2494 -2.28694859868771 -2.29817901927897 -3.64551031896678 ip: ijk, IOR 2494 2 12 6 0

icyc, ip, times LP_TAG 2268 2494 54.3934708059069 2.439839718595236E-002 2.439839718595236E-002 1 2 1118308 ip: xyz 2494 3.11171557680020 1.10364846146137 0.415906296326982 ip: uvw 2494 -2.17849807577830 -2.18639790773044 -3.69988161900917 ip: ijk, IOR 2494 2 12 5 0

icyc, ip, times LP_TAG 2269 2494 54.4178692030929 2.439839718595236E-002 2.439839718595236E-002 1 2 1118308 ip: xyz 2494 3.05991645539847 1.05144465548870 0.324923936296325 ip: uvw 2494 -2.07004829398789 -2.09493956014444 -3.76215733432968 ip: ijk, IOR 2494 1 11 4 0

icyc, ip, times LP_TAG 2270 2494 54.4447074399974 2.683823690454760E-002 1.341911845227380E-002 2 2 1118308 ip: xyz 2494 3.01076242048210 1.00134708979405 0.232550033143915 ip: uvw 2494 -1.96121478121610 -2.01317493577208 -3.81339086226027 ip: ijk, IOR 2494 1 11 3 0 icyc, ip, times LP_TAG 2270 2494 54.4447074399974 2.683823690454760E-002 1.341911845227380E-002 2 2 1118308 ip: xyz 2494 2.98500000000000 0.995411925198389 0.221151262127147 ip: uvw 2494 0.000000000000000E+000 0.000000000000000E+000 0.500000000000000 ip: ijk, IOR 2494 0 10 3 0

forrtl: severe (408): fort: (3): Subscript #1 of the array X has value -1 which is less than the lower bound of 0

Wbr, Timo

mcgratta commented 6 years ago

I understand what is happening now, and I implemented a quick fix that hopefully will catch this. In short, the droplet is caught up in the algorithm that is used to move droplets around a corner, but the logic is forcing the droplet to remain outside of the computational domain. That is why you see that slight turn in the droplet trajectory. It is "stuck" to the wall. The logic needs to be improved, but I will wait until the next release. For the moment, I added a line to kick this droplet out of the time stepping loop, where it will be picked up by the other mesh.

Let me know if this fixes the case.

mcgratta commented 6 years ago

Timo -- you could win a Boris Prize for this. Tenacious debugging.

tkorhon1 commented 6 years ago

Kevin, now I do not get the error anymore. But the particle trajectory is not going to be very nice. The u,v,w are reset now. u=v=0.0 and w=+0.5 m/s. Should it be like this? Or should one have more or less the old u,v,w in this case?

Well, for a temporary bug fix u,v,w does not matter, this is quite rare special case. For now, it is enough that there is no error message and the particle is put to the other mesh, where it belongs.

*** icyc, ip, times LP_TAG 2270 2494 1118308 xyz 3.01076242048210 1.00134708979405 0.232550033143915 uvw -1.96121478121610 -2.01317493577208 -3.81339086226027

*** icyc, ip, times LP_TAG 2271 14313 1118308 NOW IN A NEW MESH xyz 2.98500000000000 0.995411925198389 0.221151262127147 uvw 0.000000000000000E+000 0.000000000000000E+000 0.500000000000000

*** icyc, ip, times LP_TAG 2272 13445 1118308 xyz 13445 2.98494742259319 0.995409330407696 0.230753924468352 uvw 13445 -3.809513033366807E-003 -1.880063997938406E-004 0.216183451176770

Wbr, Timo

mcgratta commented 6 years ago

Right, there is still work to be done and I will keep this issue open to remind me to work on it after the minor release. The solution in this case is to restore the position and velocity of the droplet that it was assigned after the call to MOVE_IN_GAS and then exit the time stepping loop, after which it will be picked up by the other mesh.

mcgratta commented 6 years ago

I have simplified the particle transport routine to fix these sorts of problems. The new routine is much better at tracking particles on solids and mesh to mesh.

mcgratta commented 6 years ago

The new routine passes firebot and BRE_SPRAY, LEMTA_SPRAY, and VTT_SPRAY remain unchanged.