I also ran some tests with different cases that crashed with the previous win64 version. They all work fine. It seems that the problem has been fixed.
Thank you all!
Mattia
Original comment by mattia.f...@gmail.com
on 16 Oct 2008 at 12:56
I will mark this case as "Fixed", but I think that you are all correct -- we have not heard the last of this. In the years to come, 64-bit operating systems will become the norm, and hopefully we will sort out all these issues. I am a bit irritated by Intel's decision to use different compiler options under 32 and 64 bit. Do they think that we are all computer specialists? We just want to do our calculations as efficiently as possible. I am searching for, and maybe it is published somewhere, the recommended compiler options for a typical release build. I suppose that this is what the defaults are supposed to be, but I notice my Makefile options line getting longer and longer.
Thanks to you all for your tenacity -- I know that it is frustrating to fuss with this sort of thing. It is even worse when 32-bit and debug modes work, but 64-bit release mode does not.
Original comment by mcgra...@gmail.com
on 16 Oct 2008 at 1:22
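As an illustration of the "options line getting longer" problem described above, a release build with the Intel Fortran compiler might collect flags like the following. This is a hypothetical sketch, not FDS's actual Makefile; the target names and object list are invented, and only `/fltconsistency` (spelled `-fltconsistency` on Linux) is taken from this thread:

```make
# Hypothetical Intel Fortran release build -- not the FDS project Makefile.
# -fltconsistency trades some speed for consistent floating-point results,
# which is the flag discussed in this issue for the 64-bit builds.
FC     = ifort
FFLAGS = -O2 -ip -fltconsistency

OBJ = prec.o mesh.o main.o      # placeholder object list

fds : $(OBJ)
	$(FC) $(FFLAGS) -o fds $(OBJ)

%.o : %.f90
	$(FC) -c $(FFLAGS) $<
```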
I am checking this flag on the OS X platform as well with the 5.2.3 build.
If it works, I will add it into the OS X make targets.
Original comment by bryanwkl...@gmail.com
on 21 Oct 2008 at 11:49
Unfortunately, I ran into a problem with the newly compiled version (win64 FDS 5.2.3): a numerical instability with a file that previously ran (on an FDS version from before the compiler option /fltconsistency was introduced).
Interestingly, FDS didn't crash, but just gently stopped the computation.
I got the same problem with the previous 64-bit Windows version (SVN 2485).
At the moment the case is running on a 32-bit cluster, and I assume that it will work fine (I will keep you informed).
Mattia
Original comment by mattia.f...@gmail.com
on 22 Oct 2008 at 3:15
The mentioned case was tested on a 32-bit cluster, where it ran correctly. My conclusion is that the compilation problems with win64 are not all solved, so the case is not really "fixed", sorry.
At the moment I am a little bit worried, since we have a cluster that doesn't run correctly, and on the other hand we have simulations to be done.
A temporary solution could be to install the win 32-bit version; however, there is some conflict with the 64-bit MPICH2, as mentioned in comment 39.
Does anyone have a better idea/solution?
Mattia
Original comment by mattia.f...@gmail.com
on 28 Oct 2008 at 5:11
Remind me what version of FDS you are using, and who compiled it -- you or us.
Also,
do you have the SVN number and the error statement?
Original comment by mcgra...@gmail.com
on 28 Oct 2008 at 5:58
The version is the latest one compiled by you (for win 64-bit):
Compilation Date : Fri, 17 Oct 2008
Version : 5.2.3 Parallel
SVN Revision No. : 2514
The error statement appeared in a separate window: "numerical instability" -- the time step became extremely small (0.00004 s), and FDS stopped computing the same way as when you use the .stop file. Not really a crash, but an impossibility to compute further.
FDS computed "normally" up to 366 s with a constant time step of 0.01063 s; afterwards the time step began to shrink. Note that we activate the ventilation at 60 s, and then there is only a gradual fire increase (from 0 to 14 MW within the first 1200 s).
mattia
Original comment by mattia.f...@gmail.com
on 29 Oct 2008 at 8:18
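The fire ramp described above (0 to 14 MW over the first 1200 s) can be written down as a simple growth curve. A t-squared form is assumed here for illustration, since that is how such design fires are commonly specified (and how a later commenter describes his own case); the original post only says the growth is gradual:

```python
# Hypothetical t-squared fire ramp: Q(t) = alpha * t**2, capped at Q_max.
# The 14 MW / 1200 s values come from the comment above; the t^2 form is
# an assumption, not stated in the original input file.

Q_MAX_KW = 14_000.0   # peak heat release rate, kW
T_PEAK_S = 1_200.0    # time at which the peak is reached, s

alpha = Q_MAX_KW / T_PEAK_S**2   # growth coefficient, kW/s^2

def hrr(t):
    """Heat release rate (kW) at time t (s), capped at the peak value."""
    return min(alpha * t * t, Q_MAX_KW)

print(round(alpha, 5))   # ~0.00972 kW/s^2
print(hrr(600.0))        # 3500.0 kW at the halfway point
```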
If you run the same case with the 32 bit Parallel version, SVN 2514, do you get
the
same result? A numerical instability occurs when spurious perturbations in the
velocity field lead to an endless spiral of reduced time steps. Take a look at
the
last set of PLOT3D files that were produced at the time the calculation
stopped. Do
you see unusual flow vectors?
Original comment by mcgra...@gmail.com
on 29 Oct 2008 at 11:43
The 32-bit version is:
Compilation Date : Wed, 30 Jul 2008
Version : 5.2.0 Parallel
SVN Revision No. : 2087
We gave the computation to be performed at Seneca College of Applied Arts &
Technology where they have a 32bit cluster.
Original comment by mattia.f...@gmail.com
on 29 Oct 2008 at 1:16
We must test the exact same version, 32 vs 64 bit. Otherwise, it is possible
that
your 64 bit version is slightly different and it causes the numerical
instability.
Original comment by mcgra...@gmail.com
on 29 Oct 2008 at 1:25
Well, then I have a problem. I cannot decide which 32bit version to use, since
the
cluster is not our cluster.
Our machine is 64bit, and unfortunately the FDS-32bit parallel version doesn't
work
with the 64bit MPICH2 (see comment 36 above).
I could downgrade the FDS-64bit version to the 5.2.0 and test it again, if you
think
that it is worth it.
Original comment by mattia.f...@gmail.com
on 29 Oct 2008 at 1:52
If you post the input file, I can run it on my 32 bit linux cluster with 5.2.3.
It
is possible that we introduced a bug that is causing your problem. How long
does the
case take to run until you see the problem?
Original comment by mcgra...@gmail.com
on 29 Oct 2008 at 1:57
Kevin,
the case had the problem at about 504 s. The time step began to increase from
the "normal" value 0.0062 s to the stop value 0.00003 s(disregard the values in
comment 57, they refer to another case).
Since the file is a client simulation (confidential), could I send it to you by e-mail instead of posting it?
Original comment by mattia.f...@gmail.com
on 29 Oct 2008 at 3:45
Yes, send it. I meant, how much real clock time to reach the problem?
Original comment by mcgra...@gmail.com
on 29 Oct 2008 at 3:57
About 5843 min (97 hr) on an 8-core machine. The file has 36 meshes; if you have enough CPUs it will definitely be faster (10 hr?).
Note that it also stopped at the same point with the previous version (Compilation Date : Tue, 14 Oct 2008 / Version: 5.2.1 Parallel / SVN Revision No. : 2483).
Original comment by mattia.f...@gmail.com
on 29 Oct 2008 at 4:11
Before doing this, I want you to confirm that there has been a numerical
instability
by looking at the Plot3D files. This job would tie up a significant fraction of
my
computing resources. I also want confirmation that a 32 bit compilation of the
exact
same SVN number does work.
Original comment by mcgra...@gmail.com
on 29 Oct 2008 at 4:16
OK, I am now out of the office; I will have a look at the Plot3D files tomorrow.
I cannot confirm that the same 32-bit version works. I had the case simulated on an external cluster at the Seneca College of Applied Arts & Technology, Canada (Compilation Date : Wed, 30 Jul 2008 / Version: 5.2.0 Parallel / SVN Revision No. : 2087), and of course I cannot force them to upgrade.
Original comment by mattia.f...@gmail.com
on 29 Oct 2008 at 4:31
Interesting news:
We had a series of very similar cases to be run. I sent them to Seneca College, where they have the 32-bit cluster. They computed 2 cases without problems using version 5.2.0, and the last 2 using version 5.2.3. The last two never finished; they both showed numerical instability problems.
In the meantime, we modified one case to have a greater time step (more than 10 times greater, just by increasing the exhaust surface and thus reducing the outlet air velocity), and it is running -- faster -- without problems, also on our 64-bit version, passing over the problematic point.
So my conclusion is that the new version 5.2.3 has a problem with small time steps (possibly combined with a big model?), which leads to numerical instability.
Original comment by mattia.f...@gmail.com
on 30 Oct 2008 at 12:46
It is difficult to make those kinds of conclusions with just a single case.
There
are minor changes made with each version, and sometimes a minor change can
cause
problems with a very particular geometry. Have you been able to identify where
in
the domain the instability is occurring?
Original comment by mcgra...@gmail.com
on 30 Oct 2008 at 1:57
Unfortunately, I am still out of the office and cannot download the results. I should be there tomorrow. We will also make a test with a reduced case containing the original opening, to see whether the small time step in a small model also causes numerical instabilities. I will post again when we have new information.
Original comment by mattia.f...@gmail.com
on 30 Oct 2008 at 2:37
What is the current status of this issue? Does the problem still appear with the latest version?
Original comment by shost...@gmail.com
on 21 Nov 2008 at 1:41
Hello,
I am now testing the new FDS 5.2.4 version with our previous crash cases. The one which also crashes on Dave McGill's cluster (win32 FDS version 5.2.0 and also FDS 5.2.3) is still running; hopefully the new version has some of the problems fixed. I will test several cases during the next weeks and post my results again.
Mattia
PS: Sorry for the long silence, but there was some urgent work to be done and the machine was very busy.
Original comment by mattia.f...@gmail.com
on 11 Dec 2008 at 8:29
I posted the 32 bit Windows and Linux versions of FDS 5.2.5 last week. Simo
Hostikka
will build and post the 64 bit version.
Original comment by mcgra...@gmail.com
on 15 Dec 2008 at 9:25
Fire Dynamics Simulator
Compilation Date : Thu, 16 Oct 2008
Version : 5.2.3 Parallel
SVN Revision No. : 2514
(Note: My own 64-bit compile using the Intel compiler)
I had a set of seemingly similar experiences and thought I would post the
additional
information in case it would assist your analysis.
Time Step: 3300, Simulation Time: 309.74 s
p4_5248: p4_error: interrupt SIGSEGV: 11
The last diagnostic output was:
Time Step 3300 December 16, 2008 09:07:07
----------------------------------------------
Mesh 1, Cycle 3300
CPU/step: 3.982 s, Total CPU: 3.65 hr
Time step: 0.07202 s, Total time: 309.74 s
Max CFL number: 0.20E+00 at ( 81, 72, 55)
Max divergence: 0.29E-02 at ( 79, 71, 59)
Min divergence: -0.26E-02 at ( 79, 72, 59)
Radiation Loss to Boundaries: 0.105 kW
Mesh 2, Cycle 3300
CPU/step: 3.989 s, Total CPU: 3.68 hr
Time step: 0.07202 s, Total time: 309.74 s
Max CFL number: 0.36E+00 at ( 73, 22, 51)
Max divergence: 0.25E-02 at ( 13, 2, 59)
Min divergence: -0.39E-02 at ( 79, 34, 48)
Radiation Loss to Boundaries: 0.224 kW
Mesh 3, Cycle 3300
CPU/step: 4.006 s, Total CPU: 3.68 hr
Time step: 0.07202 s, Total time: 309.74 s
Max CFL number: 0.32E+00 at ( 28, 72, 54)
Max divergence: 0.78E-02 at ( 11, 72, 59)
Min divergence: -0.68E-02 at ( 52, 72, 58)
Radiation Loss to Boundaries: 0.059 kW
Mesh 4, Cycle 3300
CPU/step: 4.093 s, Total CPU: 3.73 hr
Time step: 0.07202 s, Total time: 309.74 s
Max CFL number: 0.88E+00 at ( 35, 16, 45)
Max divergence: 0.88E-01 at ( 33, 14, 20)
Min divergence: -0.37E-01 at ( 40, 20, 27)
Total Heat Release Rate: 210.645 kW
Radiation Loss to Boundaries: 73.284 kW
Mesh 5, Cycle 3300
CPU/step: 4.039 s, Total CPU: 3.73 hr
Time step: 0.07202 s, Total time: 309.74 s
Max CFL number: 0.23E+00 at ( 30, 72, 56)
Max divergence: 0.34E-02 at ( 31, 72, 56)
Min divergence: -0.44E-02 at ( 31, 71, 58)
Radiation Loss to Boundaries: 0.100 kW
Mesh 6, Cycle 3300
CPU/step: 4.114 s, Total CPU: 3.79 hr
Time step: 0.07202 s, Total time: 309.74 s
Max CFL number: 0.37E+00 at ( 0, 16, 52)
Max divergence: 0.48E-02 at ( 30, 2, 59)
Min divergence: -0.47E-02 at ( 31, 2, 58)
Radiation Loss to Boundaries: 0.157 kW
Mesh 7, Cycle 3300
CPU/step: 3.263 s, Total CPU: 2.93 hr
Time step: 0.07202 s, Total time: 309.74 s
Max CFL number: 0.12E+00 at ( 6, 12, 13)
Max divergence: 0.82E-05 at ( 70, 72, 8)
Min divergence: -0.82E-05 at ( 6, 12, 26)
Mesh 8, Cycle 3300
CPU/step: 3.299 s, Total CPU: 3.01 hr
Time step: 0.07202 s, Total time: 309.74 s
Max CFL number: 0.11E+00 at ( 69, 0, 17)
Max divergence: 0.79E-05 at ( 6, 61, 24)
Min divergence: -0.79E-05 at ( 6, 61, 26)
Mesh 9, Cycle 3300
CPU/step: 0.959 s, Total CPU: 51.23 min
Time step: 0.07202 s, Total time: 309.74 s
Max CFL number: 0.31E+00 at (152, 9, 17)
Max divergence: 0.64E-02 at (156, 9, 4)
Min divergence: -0.36E-02 at (155, 8, 3)
Radiation Loss to Boundaries: 0.061 kW
I had an 8-mesh model and wanted to add a small piece to the top, partly covering 6 of the 8 meshes, to visualize additional flow information about effluent leaving the domain. I reduced two of the 8 meshes to make "computational room" on my 8-CPU machine, to avoid greatly and adversely affecting my run time.
I used smokeview to determine where I wanted to start and stop the mesh using
the
grid number (IJK) and location (XYZ) to match up the grids. When I ran the job
it
would start and run for some time without difficulty until it died with a
similar
error. Smokeview treated the grids as if they were matched up.
So, I was wondering how the grid match-up logic works -- is it supposed to enforce an exact match? If the grids are extremely close but not exact (depending, of course, on how that logic works, could a grid pass that test but fail during some other calculation?), might it be possible for the code to generate an overflow or underflow, which, I believe, can sometimes lead to this SIGSEGV error?
I have not tried any of the stack-related fixes, because I did not see evidence of that problem while monitoring the run, but I could possibly do some work in that area. See also
http://software.intel.com/en-us/forums/intel-fortran-compiler-for-linux-and-mac-os-x/topic/57110.
I apologize in advance if this is not assistive, but that was the only change I
really made that caused this case to not run.
I have returned to my previous setup and, for now, skipped the additional
visualization I was seeking and have not run on the newer codes due to a desire
for
consistency with previous runs. After my run completes, I may try to do the
additional mesh making sure that I exactly match the existing grid (I can do
this,
but it will cause me to make a larger "capping" grid than I want to make at
this
time).
Christopher
Original comment by woodfire...@gmail.com
on 16 Dec 2008 at 9:51
PS: this is 64-bit under linux.
Original comment by woodfire...@gmail.com
on 16 Dec 2008 at 9:52
The test for mesh alignment occurs before the time stepping starts. Because
your
case ran for a long time, I don't think that this is related to the mesh
alignment.
Can you run this case successfully on a 32 bit machine?
Original comment by mcgra...@gmail.com
on 16 Dec 2008 at 10:10
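The kind of pre-run alignment test mentioned above can be sketched as a tolerance comparison of abutting mesh faces. This is an illustration of the general idea raised in the question -- values that are "extremely close but not exact" can pass such a test yet still differ in later floating-point arithmetic -- not the routine FDS actually uses; the tolerance value is invented:

```python
# Sketch of a one-time mesh abutment check -- illustrative, not FDS's code.
# Two meshes count as aligned if their shared face coordinates agree to
# within a small tolerance.

TOL = 1e-6  # assumed alignment tolerance (m); FDS's actual value may differ

def faces_aligned(xmax_a, xmin_b, tol=TOL):
    """True if mesh A's upper x-face matches mesh B's lower x-face."""
    return abs(xmax_a - xmin_b) < tol

# An exact match passes:
assert faces_aligned(3.0, 3.0)
# A value rounded by a viewer (e.g. 3.0000001, as when coordinates are
# read off Smokeview) also passes, although 3.0 != 3.0000001 later on:
assert faces_aligned(3.0, 3.0000001)
# A genuinely misaligned mesh fails the check up front:
assert not faces_aligned(3.0, 3.01)
print("alignment checks passed")
```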
Unfortunately, I do not have an appropriate platform upon which I might perform
such
a test.
Original comment by woodfire...@gmail.com
on 19 Dec 2008 at 7:10
Does the job fail at the same time in the simulation if you were to run it
again?
Original comment by mcgra...@gmail.com
on 19 Dec 2008 at 7:15
To the extent that I recall the previous runs, the run failed at the same time.
There are no changes occurring at that time. In other words, other than a growing t-squared fire, nothing in the domain opens, nothing closes, nothing turns on or off, and there are no forced changes in flow. It is possible that this would be the first time flow went into the new mesh to which I referred, but to be honest I have not examined that scenario. It is just a guess based upon the relationships of the meshes.
To be honest, I just had a gut feeling that there was a relationship between
the
failure in my problem and the sound of the discussion above. This sense, in
turn,
was based upon the extremely limited changes that I had made to the input file
from
the previously running file, my sense of how I selected certain mesh parameters
based upon smokeview output (because smokeview rounds the mesh location
numbers) and
my sense of the likelihood that the failure time was near the time at which
flow
would have moved from the previously existing mesh to the new (added) mesh. My
previous software experience in trying to sniff out intermittent or difficult
to
exactly reproduce problems also suggested that this information may be
assistive in
your search. So, I am not trying to waste your time with something that does
not
assist you in resolving another problem.
Eventually, I will probably retry my attempt to add the additional mesh through
a
more exacting grid generation process (on my side that is -- I'm not talking
about
FDS's process) and then see if I can get it to run that way. That might also
help
you if I am able to isolate the problem that way. Unfortunately, however, all
of my
current computational resources are tasked and I will probably not be able to
get a
free platform for a week or so to do a substantial rerun.
Original comment by woodfire...@gmail.com
on 19 Dec 2008 at 7:37
Retry the case with the source code for 5.2.5 (SVN 2828), or even with the
latest
version in the repository. If it fails again, we'll try it here.
Original comment by mcgra...@gmail.com
on 19 Dec 2008 at 7:47
Hello,
we finally managed to run all the cases which crashed under the previous 5.2.3 version (it took some weeks to run them all).
The newly compiled win64 FDS 5.2.4 version never crashed; all the simulations ran through to the end.
There was just an error message (ERROR 103) written every time before the first timestep line, apparently without effect on the computations.
It seems that the chosen compiler options are correct. I just noticed that a newer version was released (win64 FDS 5.2.5); was it compiled in the same manner? I would like to avoid retesting all the cases...
So, from our point of view, the problems at the basis of this issue have been solved. Thank you
Mattia
Original comment by mattia.f...@gmail.com
on 14 Jan 2009 at 7:15
Closing the issue. Thanks Mattia!
Original comment by shost...@gmail.com
on 9 Mar 2009 at 12:42
Original issue reported on code.google.com by
ikke...@gmail.com
on 24 Sep 2008 at 8:55