Yinan-Scott-Shi / fds-smv

Automatically exported from code.google.com/p/fds-smv

Access Violations on 64 bit Windows #474

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Please complete the following lines...

Application Version: 5.2.0 parallel
SVN Revision Number: 2102
Compile Date: 9/19/2008
Operating System: Windows Vista Business 64-bit

Describe details of the issue below:

I don't know why the program produces this error. I think the FDS file is OK, but it does not complete the calculation. This error only happens when I run the parallel version.

regards,
Iker 

Original issue reported on code.google.com by ikke...@gmail.com on 24 Sep 2008 at 8:55

Attachments:

GoogleCodeExporter commented 9 years ago
I also ran some tests with different cases that crashed with the previous win64 version. They all work fine. It seems that the problem has been fixed.
Thank you all!

Mattia

Original comment by mattia.f...@gmail.com on 16 Oct 2008 at 12:56

GoogleCodeExporter commented 9 years ago
I will mark this case as "Fixed", but I think that you are all correct -- we have not heard the last of this. In the years to come, 64-bit operating systems will become the norm, and hopefully we will sort out all these issues. I am a bit irritated by Intel's decision to use different compiler options under 32 and 64 bit. Do they think that we are all computer specialists? We just want to do our calculations as efficiently as possible. I am searching for the recommended compiler options for a typical release build -- maybe they are published somewhere. I suppose that this is what the defaults are supposed to be, but I notice my Makefile options line getting longer and longer.
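
For illustration only, here is a rough sketch of the kind of release-options line I mean, using the Intel Fortran compiler. These particular flags are just examples, not our official build settings; on Windows the same options are spelled /O2, /fltconsistency, /traceback:

    # Illustrative Makefile fragment only -- not the official FDS build settings.
    FC     = ifort
    # -O2              : standard optimization level for a release build
    # -fltconsistency  : trade a little speed for consistent floating-point results
    # -traceback       : print a call stack if the run aborts
    FFLAGS = -O2 -fltconsistency -traceback

    # equivalent one-off command line (Linux spelling):
    #   ifort -O2 -fltconsistency -traceback -o fds_linux_64 *.f90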

Thanks to you all for your tenacity -- I know that it is frustrating to fuss with this sort of thing. It is even worse when 32 bit and debug modes work, but 64 bit release mode does not.

Original comment by mcgra...@gmail.com on 16 Oct 2008 at 1:22

GoogleCodeExporter commented 9 years ago
I am checking this flag on the OS X platform as well with the 5.2.3 build.
If it works, I will add it into the OS X make targets.

Original comment by bryanwkl...@gmail.com on 21 Oct 2008 at 11:49

GoogleCodeExporter commented 9 years ago
Unfortunately I got a problem with the newly compiled version (win64 FDS 5.2.3), i.e. a numerical instability with a file that ran under a previous version (an FDS version from before the /fltconsistency compiler option).
Interestingly, FDS didn't crash, but just gently stopped the computation.
I got the same problem with the previous 64-bit Windows version (SVN 2485) as well.
At the moment the case is running on a 32-bit cluster and I assume that it will work fine (I will keep you informed).

Mattia

Original comment by mattia.f...@gmail.com on 22 Oct 2008 at 3:15

GoogleCodeExporter commented 9 years ago
The mentioned case was tested on a 32-bit cluster, where it ran correctly. My conclusion is that the compilation problems with the win64 version are not all solved, so the case is not really "fixed", sorry.

At the moment I am a little bit worried, since we have a cluster which doesn't run correctly and on the other hand we have simulations to be done.

A temporary solution could be to install the win-32bit version, however there is some conflict with the 64-bit MPICH2, as mentioned in comment 39.

Does anyone have a better idea/solution?

Mattia

Original comment by mattia.f...@gmail.com on 28 Oct 2008 at 5:11

GoogleCodeExporter commented 9 years ago
Remind me what version of FDS you are using, and who compiled it -- you or us. Also, do you have the SVN number and the error statement?

Original comment by mcgra...@gmail.com on 28 Oct 2008 at 5:58

GoogleCodeExporter commented 9 years ago
The version is the latest one compiled by you (for win 64bit)

Compilation Date : Fri, 17 Oct 2008
Version          : 5.2.3 Parallel
SVN Revision No. : 2514 

The error statement appeared just in a separate window: "numerical instability" -- the time step became extremely small (0.00004 s) and FDS stopped computing, in the same way as when you use the .stop file. Not really a crash, but an impossibility to compute further.

FDS computed "normally" up to 366 s with a constant time step of 0.01063 s; afterwards the time step began to decrease. Note that we activate the ventilation at 60 s and then there is only an exponential fire increase (from 0 to 14 MW within the first 1200 s).

mattia

Original comment by mattia.f...@gmail.com on 29 Oct 2008 at 8:18

GoogleCodeExporter commented 9 years ago
If you run the same case with the 32 bit Parallel version, SVN 2514, do you get the same result? A numerical instability occurs when spurious perturbations in the velocity field lead to an endless spiral of reduced time steps. Take a look at the last set of PLOT3D files that were produced at the time the calculation stopped. Do you see unusual flow vectors?
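
To illustrate what that spiral looks like, here is a rough sketch with made-up numbers -- not the actual FDS source -- of a CFL-limited time step collapsing as the peak velocity in one cell keeps growing:

    ! Illustrative sketch only, not FDS code: a CFL-type limit forces the time
    ! step down whenever the largest velocity in the domain grows, so a runaway
    ! perturbation drives dt toward zero and the run effectively stalls.
    program cfl_spiral
       implicit none
       real :: dt, u_max, dx
       integer :: step
       dx    = 0.2     ! assumed cell size (m)
       u_max = 2.0     ! assumed peak velocity (m/s)
       dt    = 0.05    ! assumed initial time step (s)
       do step = 1, 10
          u_max = 2.0 * u_max              ! perturbation doubles the peak velocity
          if (u_max * dt / dx > 1.0) then
             dt = 0.9 * dx / u_max         ! cut dt to keep the CFL number below 1
          end if
          print '(a,i3,a,es10.3,a,f10.1)', ' step ', step, '  dt = ', dt, '  u_max = ', u_max
       end do
    end program cfl_spiral

The symptom reported above -- dt dropping from roughly 0.006 s to 0.00003 s -- is what the end of such a spiral looks like in the diagnostics.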

Original comment by mcgra...@gmail.com on 29 Oct 2008 at 11:43

GoogleCodeExporter commented 9 years ago
The 32-bit version is:

Compilation Date : Wed, 30 Jul 2008
Version          : 5.2.0 Parallel
SVN Revision No. : 2087

We had the computation performed at Seneca College of Applied Arts & Technology, where they have a 32-bit cluster.

Original comment by mattia.f...@gmail.com on 29 Oct 2008 at 1:16

GoogleCodeExporter commented 9 years ago
We must test the exact same version, 32 vs 64 bit. Otherwise, it is possible that your 64 bit version is slightly different and that difference causes the numerical instability.

Original comment by mcgra...@gmail.com on 29 Oct 2008 at 1:25

GoogleCodeExporter commented 9 years ago
Well, then I have a problem. I cannot decide which 32-bit version to use, since the cluster is not our cluster.
Our machine is 64-bit, and unfortunately the FDS 32-bit parallel version doesn't work with the 64-bit MPICH2 (see comment 36 above).
I could downgrade the FDS 64-bit version to 5.2.0 and test it again, if you think that it is worth it.

Original comment by mattia.f...@gmail.com on 29 Oct 2008 at 1:52

GoogleCodeExporter commented 9 years ago
If you post the input file, I can run it on my 32 bit linux cluster with 5.2.3. It is possible that we introduced a bug that is causing your problem. How long does the case take to run until you see the problem?

Original comment by mcgra...@gmail.com on 29 Oct 2008 at 1:57

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
Kevin,

the case had the problem at about 504 s. The time step began to decrease from the "normal" value 0.0062 s to the stop value 0.00003 s (disregard the values in comment 57; they refer to another case).

Since the file is a client simulation (confidential), could I send it to you by e-mail instead of posting it?

Original comment by mattia.f...@gmail.com on 29 Oct 2008 at 3:45

GoogleCodeExporter commented 9 years ago
Yes, send it. I meant, how much real clock time to reach the problem?

Original comment by mcgra...@gmail.com on 29 Oct 2008 at 3:57

GoogleCodeExporter commented 9 years ago
About 5843 min (97 hr) on an 8-core machine. The file has 36 meshes; if you have enough CPUs it will definitely be faster (10 hr?).

Note that it also stopped at the same point with the previous version (Compilation Date: Tue, 14 Oct 2008 / Version: 5.2.1 Parallel / SVN Revision No.: 2483).

Original comment by mattia.f...@gmail.com on 29 Oct 2008 at 4:11

GoogleCodeExporter commented 9 years ago
Before doing this, I want you to confirm that there has been a numerical instability by looking at the Plot3D files. This job would tie up a significant fraction of my computing resources. I also want confirmation that a 32 bit compilation of the exact same SVN number does work.

Original comment by mcgra...@gmail.com on 29 Oct 2008 at 4:16

GoogleCodeExporter commented 9 years ago
Ok, I am now out of the office; I will have a look at the Plot3D files tomorrow.

I cannot confirm that the same 32-bit version works. I had the case simulated on an external cluster at the Seneca College of Applied Arts & Technology, Canada (Compilation Date: Wed, 30 Jul 2008 / Version: 5.2.0 Parallel / SVN Revision No.: 2087), and of course I cannot force them to upgrade.

Original comment by mattia.f...@gmail.com on 29 Oct 2008 at 4:31

GoogleCodeExporter commented 9 years ago
Interesting news:
We had a series of very similar cases to be run. I sent them to Seneca College, where they have the 32-bit cluster. They computed two cases without problems using version 5.2.0 and the last two using version 5.2.3. The latter two never got to the end; they both showed numerical instability problems.

In the meantime we modified one case to have a greater time step (more than 10 times greater, simply by increasing the exhaust surface and thus reducing the outlet air velocity), and it is running, faster, without problems on our 64-bit version as well, passing the problematic point.

So my conclusion is that the new version 5.2.3 has a problem with small time steps (possibly combined with a big model?), which leads to numerical instability.

Original comment by mattia.f...@gmail.com on 30 Oct 2008 at 12:46

GoogleCodeExporter commented 9 years ago
It is difficult to draw those kinds of conclusions from just a single case. There are minor changes made with each version, and sometimes a minor change can cause problems with a very particular geometry. Have you been able to identify where in the domain the instability is occurring?

Original comment by mcgra...@gmail.com on 30 Oct 2008 at 1:57

GoogleCodeExporter commented 9 years ago
Unfortunately I am still out of the office and cannot download the results. I should be there tomorrow. We will also make a test with a reduced case containing the original opening and see if the small time step in a small model also produces numerical instabilities. I will post again when we have new information.

Original comment by mattia.f...@gmail.com on 30 Oct 2008 at 2:37

GoogleCodeExporter commented 9 years ago
What is the current status of this issue? Does the problem still appear with the latest version?

Original comment by shost...@gmail.com on 21 Nov 2008 at 1:41

GoogleCodeExporter commented 9 years ago
Hello,

I am now testing the new FDS 5.2.4 version with our previous crash cases. The one which also crashes on Dave McGill's cluster (win32 FDS version 5.2.0 and also FDS 5.2.3) is still running; hopefully the new version has some of the problems fixed. I will test several cases during the next weeks and post my results again.

Mattia

PS: Sorry for the long silence, but there was some urgent work to be done by us and the machine was very busy.

Original comment by mattia.f...@gmail.com on 11 Dec 2008 at 8:29

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
I posted the 32 bit Windows and Linux versions of FDS 5.2.5 last week. Simo Hostikka will build and post the 64 bit version.

Original comment by mcgra...@gmail.com on 15 Dec 2008 at 9:25

GoogleCodeExporter commented 9 years ago
 Fire Dynamics Simulator

 Compilation Date : Thu, 16 Oct 2008
 Version          : 5.2.3 Parallel
 SVN Revision No. : 2514

(Note: My own 64-bit compile using the Intel compiler)

I had a set of seemingly similar experiences and thought I would post the additional information in case it would assist your analysis.

 Time Step:   3300,    Simulation Time:    309.74 s
p4_5248:  p4_error: interrupt SIGSEGV: 11

The last diagnostic output was:
       Time Step    3300   December 16, 2008  09:07:07
       ----------------------------------------------
       Mesh    1, Cycle    3300
       CPU/step:     3.982 s, Total CPU:      3.65 hr
       Time step:  0.07202 s, Total time:   309.74 s
       Max CFL number:  0.20E+00 at ( 81, 72, 55)
       Max divergence:  0.29E-02 at ( 79, 71, 59)
       Min divergence: -0.26E-02 at ( 79, 72, 59)
       Radiation Loss to Boundaries:         0.105 kW
       Mesh    2, Cycle    3300
       CPU/step:     3.989 s, Total CPU:      3.68 hr
       Time step:  0.07202 s, Total time:   309.74 s
       Max CFL number:  0.36E+00 at ( 73, 22, 51)
       Max divergence:  0.25E-02 at ( 13,  2, 59)
       Min divergence: -0.39E-02 at ( 79, 34, 48)
       Radiation Loss to Boundaries:         0.224 kW
       Mesh    3, Cycle    3300
       CPU/step:     4.006 s, Total CPU:      3.68 hr
       Time step:  0.07202 s, Total time:   309.74 s
       Max CFL number:  0.32E+00 at ( 28, 72, 54)
       Max divergence:  0.78E-02 at ( 11, 72, 59)
       Min divergence: -0.68E-02 at ( 52, 72, 58)
       Radiation Loss to Boundaries:         0.059 kW
       Mesh    4, Cycle    3300
       CPU/step:     4.093 s, Total CPU:      3.73 hr
       Time step:  0.07202 s, Total time:   309.74 s
       Max CFL number:  0.88E+00 at ( 35, 16, 45)
       Max divergence:  0.88E-01 at ( 33, 14, 20)
       Min divergence: -0.37E-01 at ( 40, 20, 27)
       Total Heat Release Rate:            210.645 kW
       Radiation Loss to Boundaries:        73.284 kW
       Mesh    5, Cycle    3300
       CPU/step:     4.039 s, Total CPU:      3.73 hr
       Time step:  0.07202 s, Total time:   309.74 s
       Max CFL number:  0.23E+00 at ( 30, 72, 56)
       Max divergence:  0.34E-02 at ( 31, 72, 56)
       Min divergence: -0.44E-02 at ( 31, 71, 58)
       Radiation Loss to Boundaries:         0.100 kW
       Mesh    6, Cycle    3300
       CPU/step:     4.114 s, Total CPU:      3.79 hr
       Time step:  0.07202 s, Total time:   309.74 s
       Max CFL number:  0.37E+00 at (  0, 16, 52)
       Max divergence:  0.48E-02 at ( 30,  2, 59)
       Min divergence: -0.47E-02 at ( 31,  2, 58)
       Radiation Loss to Boundaries:         0.157 kW
       Mesh    7, Cycle    3300
       CPU/step:     3.263 s, Total CPU:      2.93 hr
       Time step:  0.07202 s, Total time:   309.74 s
       Max CFL number:  0.12E+00 at (  6, 12, 13)
       Max divergence:  0.82E-05 at ( 70, 72,  8)
       Min divergence: -0.82E-05 at (  6, 12, 26)
       Mesh    8, Cycle    3300
       CPU/step:     3.299 s, Total CPU:      3.01 hr
       Time step:  0.07202 s, Total time:   309.74 s
       Max CFL number:  0.11E+00 at ( 69,  0, 17)
       Max divergence:  0.79E-05 at (  6, 61, 24)
       Min divergence: -0.79E-05 at (  6, 61, 26)
       Mesh    9, Cycle    3300
       CPU/step:     0.959 s, Total CPU:     51.23 min
       Time step:  0.07202 s, Total time:   309.74 s
       Max CFL number:  0.31E+00 at (152,  9, 17)
       Max divergence:  0.64E-02 at (156,  9,  4)
       Min divergence: -0.36E-02 at (155,  8,  3)
       Radiation Loss to Boundaries:         0.061 kW

I had an 8-mesh model and wanted to add a small piece to the top, partly covering 6 of the 8 meshes, to visualize additional flow information regarding effluent leaving the domain. I reduced two of the 8 domains to make "computational room" on my 8-CPU machine and avoid greatly affecting my run time.

I used smokeview to determine where I wanted to start and stop the mesh, using the grid number (IJK) and location (XYZ) to match up the grids. When I ran the job it would start and run for some time without difficulty until it died with a similar error. Smokeview treated the grids as if they were matched up.

So I was wondering how the grid match-up logic works, since it is supposed to enforce an exact match. If the grids are extremely close but not exact (depending, of course, on how that logic works, could a grid pass that test but fail during some other calculation?), might it be possible for the code to generate an overflow or underflow, which, I believe, can sometimes lead to this SIGSEGV error?
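
Just to make my concern concrete, here is a purely hypothetical sketch of a tolerance-based alignment test -- I do not know whether FDS does anything like this -- showing how boundaries read from rounded smokeview output could be accepted as aligned even though they are not bit-exact:

    ! Hypothetical illustration only; the real FDS mesh-alignment check may
    ! work quite differently.
    program alignment_sketch
       implicit none
       real, parameter :: tol = 1.0e-3        ! assumed tolerance
       real :: x_max_mesh_a, x_min_mesh_b
       x_max_mesh_a = 12.4000                 ! boundary as typed in the input file
       x_min_mesh_b = 12.4005                 ! boundary taken from rounded smokeview output
       if (abs(x_max_mesh_a - x_min_mesh_b) <= tol) then
          print *, 'meshes accepted as aligned; mismatch =', x_min_mesh_b - x_max_mesh_a
       else
          print *, 'meshes rejected as misaligned'
       end if
    end program alignment_sketch

If such a near-match were accepted, any later arithmetic that assumes the boundaries are identical could, at least in principle, end up one cell off -- but again, that is only my speculation.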

I have not tried any of the stack-related issues, because I did not see evidence of that problem while monitoring the run, but I could possibly do some work in that area. See also http://software.intel.com/en-us/forums/intel-fortran-compiler-for-linux-and-mac-os-x/topic/57110.

I apologize in advance if this is not assistive, but that was the only change I really made that caused this case to not run.

I have returned to my previous setup and, for now, skipped the additional visualization I was seeking, and have not run on the newer codes due to a desire for consistency with previous runs. After my run completes, I may try to add the additional mesh again, making sure that I exactly match the existing grid (I can do this, but it will cause me to make a larger "capping" grid than I want to make at this time).

Christopher

Original comment by woodfire...@gmail.com on 16 Dec 2008 at 9:51

GoogleCodeExporter commented 9 years ago
PS: this is 64-bit under linux.

Original comment by woodfire...@gmail.com on 16 Dec 2008 at 9:52

GoogleCodeExporter commented 9 years ago
The test for mesh alignment occurs before the time stepping starts. Because your case ran for a long time, I don't think that this is related to the mesh alignment. Can you run this case successfully on a 32 bit machine?

Original comment by mcgra...@gmail.com on 16 Dec 2008 at 10:10

GoogleCodeExporter commented 9 years ago
Unfortunately, I do not have an appropriate platform upon which I might perform such a test.

Original comment by woodfire...@gmail.com on 19 Dec 2008 at 7:10

GoogleCodeExporter commented 9 years ago
Does the job fail at the same time in the simulation if you were to run it again?

Original comment by mcgra...@gmail.com on 19 Dec 2008 at 7:15

GoogleCodeExporter commented 9 years ago
To the extent that I recall the previous runs, the run failed at the same time.

There are no changes occurring at that time. In other words, other than a growing t-squared fire, nothing in the domain opens, nothing closes, nothing turns on or off, and there are no forced changes in flow. It is possible that this would be the first time flow went into the new mesh to which I referred, but to be honest I have not examined that scenario. It is just a guess based upon the relationships of the meshes.

To be honest, I just had a gut feeling that there was a relationship between the failure in my problem and the sound of the discussion above. This sense, in turn, was based upon the extremely limited changes that I had made to the input file from the previously running file, my sense of how I selected certain mesh parameters based upon smokeview output (because smokeview rounds the mesh location numbers), and my sense of the likelihood that the failure time was near the time at which flow would have moved from the previously existing mesh to the new (added) mesh. My previous software experience in trying to sniff out problems that are intermittent or difficult to reproduce exactly also suggested that this information may be assistive in your search. So, I am not trying to waste your time with something that does not assist you in resolving another problem.

Eventually, I will probably retry my attempt to add the additional mesh through a more exacting grid generation process (on my side, that is -- I'm not talking about FDS's process) and then see if I can get it to run that way. That might also help you if I am able to isolate the problem. Unfortunately, however, all of my current computational resources are tasked, and I will probably not be able to get a free platform for a week or so to do a substantial rerun.

Original comment by woodfire...@gmail.com on 19 Dec 2008 at 7:37

GoogleCodeExporter commented 9 years ago
Retry the case with the source code for 5.2.5 (SVN 2828), or even with the latest version in the repository. If it fails again, we'll try it here.

Original comment by mcgra...@gmail.com on 19 Dec 2008 at 7:47

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
Hello,

we finally managed to run all the cases which crashed under the previous 5.2.3 version (it took some weeks to run them all).
The newly compiled win64 FDS 5.2.4 version never crashed. All the simulations went through until the end.

There was just an error message (ERROR 103) written every time before the first time step line, however apparently without effect on the computations.

It seems that the chosen compiler options are correct. I just noticed that a newer version was released (win64 FDS 5.2.5); was it compiled in the same manner? I would like to avoid retesting all the cases...

So, from our point of view, the problems at the basis of this issue have been solved. Thank you

Mattia

Original comment by mattia.f...@gmail.com on 14 Jan 2009 at 7:15

GoogleCodeExporter commented 9 years ago
Closing the issue. Thanks Mattia!

Original comment by shost...@gmail.com on 9 Mar 2009 at 12:42