Yinan-Scott-Shi / fds-smv

Automatically exported from code.google.com/p/fds-smv

Periodic breaks in usage of processor #542

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Application Versions:
- FDS 5.2.4_2651, 32-bit, compiled 11/11/2008, OS: Windows XPpro v5.1 SP3, RAM 1 047 952 KB
- FDS 5.0.0_721, 32-bit, compiled 01/10/2007, OS: Windows XPpro v5.1 SP2, RAM 2 029 932 KB
- FDS 5.2.0_2102, 64-bit, compiled 01/08/2008, OS: Windows XPpro X64 v5.2 SP1, RAM 8 320 880 KB

Describe details of the issue below:
When launched, FDS runs well, using 100% of the CPU of the processor it is assigned to. Nevertheless, after a while, the CPU usage begins to show some "holes", as shown in the attached UCusage.gif picture. After a thorough assessment, done on the attached example (see the attached file container.fds), it appears that:
- when launched in a blank folder, the problem appears when the Time Step reaches 200
- when launched in a folder where the same calculation has previously been launched and stopped (note: without the restart option, which means the files are simply overwritten), the problem appears as early as Time Step 10, and recurs every 10 time steps (i.e. TS=20, TS=30, TS=40 and so on)

The severity seems to depend slightly on the priority assigned to the process: a process with a low priority has been found to show fewer "holes" in CPU usage, but this has not been strictly proven.

Removing some devices at the end of the example (there are 12 SLCF and 4 BNDF), it appears that the problem occurs once a minimum amount of output data has to be produced. It has been found to occur at about 7 SLCF, but it depends on which ones are requested. For example, in the given input file, with all 4 BNDF:
- the problem does not occur with the first 6 SLCF
- there is a glitch at TS = 200 with the first 7 SLCF
- the problem is fully present with the first 8 SLCF
- the problem does not occur at all with the last 8 SLCF

This assessment was done on the first configuration described above. I guess the thresholds would vary a little from one configuration to another. Nevertheless, cases have been found where the problem is severe: the CPU usage stays at 0% most of the time, except for occasional sudden rises to 100%, after which it falls back down to 0%.

I guess this may be related to memory usage, or something similar. The biggest issue is that even on workstations running 24 hours a day, computational throughput is very low due to the poor usage of the available CPU.

Original issue reported on code.google.com by franck.d...@lne.fr on 19 Nov 2008 at 4:15

Attachments:

GoogleCodeExporter commented 9 years ago
Sorry, the UCusage.gif file was not attached. Here it is.

Original comment by franck.d...@lne.fr on 19 Nov 2008 at 4:17

Attachments:

GoogleCodeExporter commented 9 years ago
Strange. I'll take a look at it.

Original comment by mcgra...@gmail.com on 19 Nov 2008 at 4:47

GoogleCodeExporter commented 9 years ago
10 time steps is important. Every 10 time steps, FDS closes and then reopens all the output files. This forces the operating system to flush the contents of the buffers in which data is stored before being written to disk. It seems that you are seeing significant performance deficiencies because of this. As a test, set FLUSH_FILE_BUFFERS=.FALSE. on the DUMP line (and remember to use only one DUMP line). On my dual-processor Windows XP laptop, the input file container.fds uses about 50% of the "CPU Usage" -- I assume because only one processor is working on the job. But I do not see the dramatic spikes that you do. Is the hard disk remote from the CPU?
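
For reference, the suggested DUMP line might look like this in the input file (a minimal sketch; FLUSH_FILE_BUFFERS is the parameter named above, and the namelist form follows the usual FDS input convention):

```fortran
! Sketch of an FDS 5 DUMP line (Fortran namelist format).
! FLUSH_FILE_BUFFERS=.FALSE. disables the periodic close/reopen of
! the output files, avoiding the forced buffer flush every 10 time steps.
&DUMP FLUSH_FILE_BUFFERS=.FALSE. /
```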

Original comment by mcgra...@gmail.com on 19 Nov 2008 at 5:02

GoogleCodeExporter commented 9 years ago
I've definitely noticed in the past that hard drive speed can make a big difference in this kind of case... as well as the amount of memory and network speed.

Original comment by jamie.l...@gmail.com on 19 Nov 2008 at 5:59

GoogleCodeExporter commented 9 years ago
OK -- the reason why we close and open files is to flush the data so that we can easily check the run in Smokeview. If you do not flush, the Smokeview rendering often falls significantly behind the FDS calculation itself. If you know that you're not going to need to view the case during the run (or even if you're willing to accept the delay), set FLUSH_FILE_BUFFERS=.FALSE.

It might also save on wear and tear of the disk.

Original comment by mcgra...@gmail.com on 19 Nov 2008 at 6:29

GoogleCodeExporter commented 9 years ago
Before going on, I'd like to clarify what I called "100% of the CPU usage for the CPU it is assigned to": on a dual-processor machine, the process is assigned to one of the two processors, leading to a global usage of 50%. On a quad-processor machine (as on our workstations), the CPU usage is consequently 25% for each launched FDS case. Therefore, the 50% usage on your dual-processor laptop sounds right to me.

I have tested the FLUSH_FILE_BUFFERS=.FALSE. option and it does solve the spikes problem. Nevertheless, it can only be used for cases that do not need any visualization during the run, which is seldom the case: we often watch the results for tendencies, confirmation of hypotheses, and so on. So this solution is useful, but has a drawback.
Do you have any estimate or rule of thumb for how often the buffers are flushed when the option is set to false (I suppose it is somehow linked to the quantity of available RAM)?

In all the tested situations (a single-processor laptop, a dual-processor PC, a quad-processor workstation), the hard drives were local. There was no need to send the data through any network. So I guess the performance may be affected by the model of hard drive (whether it is a fast one or not).

On my laptop the HD is a slow one (a Fujitsu 5400 RPM, see http://193.128.183.41/home/v3__product.asp?pid=447 for complete specifications) and the available RAM is about 1 GB. Nevertheless, this laptop did not encounter any kind of spike when running FDS 4. I'll run a performance test with FDS 4 to check that specific point, and let you know the result.

Original comment by franck.d...@lne.fr on 20 Nov 2008 at 10:49

GoogleCodeExporter commented 9 years ago
In FDS 4, we used a non-standard call, FLUSH, to flush the buffers. Because it is non-standard Fortran, we had compilation problems on various machines. So in FDS 5, we adopted the close-open approach. Is there another machine you could test your simple case on?
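
The close-open approach described above can be sketched roughly as follows (illustrative only, not the actual FDS source; the subroutine name, unit number, and file name are hypothetical):

```fortran
! Illustrative sketch of flushing output by closing and reopening a file.
! CLOSE forces the OS to commit the buffered data to disk; reopening with
! POSITION='APPEND' lets the program continue writing where it left off.
SUBROUTINE FLUSH_BY_REOPEN(LU,FN)
   INTEGER, INTENT(IN) :: LU        ! logical unit number (hypothetical)
   CHARACTER(*), INTENT(IN) :: FN   ! output file name (hypothetical)
   CLOSE(LU)
   OPEN(UNIT=LU,FILE=FN,STATUS='OLD',POSITION='APPEND')
END SUBROUTINE FLUSH_BY_REOPEN
```

A portable FLUSH statement only became standard with Fortran 2003, which is why the FDS 4 approach caused compilation problems on some compilers.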

Original comment by mcgra...@gmail.com on 20 Nov 2008 at 1:26

GoogleCodeExporter commented 9 years ago
Take a look at HD Tune, available at http://www.hdtune.com/download.html. It's software for benchmarking and measuring the "health" of your hard drives. I downloaded the free version.

Here is a summary of results when I run it on my Dell Precision M90 laptop.

Info Tab: all boxes ticked except for Power-up in Standby
   SATA II drive, 149.1 GB capacity, 8192 KB buffer
Benchmark Tab:
    transfer rate min=3.5 MB/sec, max=57.7 MB/sec, avg=31.9 MB/sec
    access time=14.7 ms, Burst Rate=69.6 MB/sec

If you can, let us know what the results are for your computer, i.e. whether it is a lot slower. These results are for a laptop hard drive, which I expect to be slower than a good hard drive for a desktop computer.

Original comment by gfor...@gmail.com on 20 Nov 2008 at 2:07

GoogleCodeExporter commented 9 years ago
There is something else you can look at with respect to your hard drive: how badly fragmented is it? If you have never de-fragmented it and your disk is relatively full, then it is a good bet that your drive is heavily fragmented. This could have a negative effect on performance.

Original comment by gfor...@gmail.com on 20 Nov 2008 at 2:12

GoogleCodeExporter commented 9 years ago
>Kevin on C7
The case has been tested on at least 3 completely different machines (see the initial post):
- my laptop (FDS 5.2.4_2651 32-bit compiled 11/11/2008 [Intel Pentium M 1.73 GHz, 1 GB RAM] Win XPpro v5.1 SP3)
- a desktop (FDS 5.0.0_721 32-bit compiled 01/10/2007 [AMD Athlon 64 X2 Dual Core 5000+ 2.61 GHz, 1.93 GB] Win XPpro v5.1 SP2)
- a DELL workstation (FDS 5.2.0_2102 64-bit compiled 01/08/2008 [Intel Xeon CPU 3.60 GHz, 7.93 GB RAM] Win XPpro X64 v5.2 SP1)
As you see, the configurations are very different. Attached is the CPU usage for the workstation, with 3 different cases running. Two cases run on CPU0 and one on CPU1. It looks as if the workstation just sleeps between some spikes of activity...

Other workstations (Intel Xeon quad-processor CPU 5160 @ 3.00 GHz, 16 GB RAM) have been tested, with roughly the same symptoms.

>Glenn on C9
My hard drive is only 52% full.
In order to test, I defragmented it, then launched the case again, without FLUSH_FILE_BUFFERS=.FALSE. of course. The problem is still the same: every 10 time steps, the CPU seems to freeze for a while.

Original comment by franck.d...@lne.fr on 20 Nov 2008 at 3:43

Attachments:

GoogleCodeExporter commented 9 years ago
Just tried running this on my Dell T3400 (Win XP, Intel Core 2 Quad). If I generate graphs for each core, I get an average utilization of 25%, but the process bounces from core to core in a somewhat random fashion (i.e. it looks like Workstation.gif, only with four plots rather than two). If I set the View option to a single graph, I get a rock-steady 25% utilization (i.e. 1/4 cores) that shows no periodic dips. I would say this is an issue with that particular machine. Some other process is running that periodically wakes up and uses CPU time. Do you have virus software on the machine scanning file writes for FDS? (Most of the packages let you exclude specific programs/directories.) Have you scanned the machine for malware?

Original comment by drjfloyd on 20 Nov 2008 at 4:58

GoogleCodeExporter commented 9 years ago
I believe that the problem is linked to the closing/opening of files, because the dips in CPU usage disappear when FLUSH_FILE_BUFFERS=.FALSE.

I think a good solution is to create DT_FLUSH and "flush" every DT_FLUSH seconds rather than every 10 time steps. That way, one could set the value to, say, 10 s, and have only a small impact on CPU time. What do you all think?

Original comment by mcgra...@gmail.com on 20 Nov 2008 at 5:07

GoogleCodeExporter commented 9 years ago
DT_FLUSH isn't a bad idea, though we should limit flushing to either DT_FLUSH 
or 
DT_DUMP.  That is if FDS is taking 100 s between output dumps and DT_FLUSH is 
10 s 
we shouldn't be wasting time opening and closing files 10 times when nothing 
new has 
been written to the buffer.

Original comment by drjfloyd on 20 Nov 2008 at 6:23

GoogleCodeExporter commented 9 years ago
A DT_FLUSH-like parameter was on my mind too. I think it would patch the problem nicely and easily. Nevertheless, I remain very interested in a good explanation of the slowness of the close-open operation. Maybe this could lead to an "Optimization of calculation" section in the User's Guide.

I don't believe an antivirus could hold files pending for that long, unless badly engineered. However, our antivirus is a corporate one, and I (fortunately?) don't have any access rights to change its settings.
Original comment by franck.d...@lne.fr on 21 Nov 2008 at 9:43

GoogleCodeExporter commented 9 years ago
Hi, I just ran the case twice, with an "empty" folder and with old outputs. I did not see any spikes in the CPU usage. My computer is a four-year-old DELL laptop with one processor. I ran on the C: hard drive, i.e. the drive inside the laptop. This hard drive is encrypted (the whole drive), and virus software (F-Secure) is also watching the disk.

Some details about my machine:

XP Professional, Version 2002, SP2
Intel Pentium M proc, 1.6 GHz, 1.0 GB RAM
Performance level set to "Adjust for best performance" (in "My Computer", Properties, ..., "Advanced"...)

Disk drive data (from "Device Manager"): ST94811A, and the "Policies" page has a tick at "Enable write caching on the disk".
Disk is 76% full.
Disk does not have a tick at "Compress drive to save disk space".
Disk does not have a tick at "Allow Indexing Service to index this drive...".

TimoK

BTW, I ran the case using a tcsh shell, i.e. under Cygwin. Next I will run the case with old files using the standard DOS "cmd" window. I like to use Cygwin because I have found that you can then make better use of your computer while an FDS job is running in the background (and with a nice value). In particular, the hard drive seems to work better that way than when using the DOS cmd window.

Original comment by tkorh...@gmail.com on 21 Nov 2008 at 11:25

GoogleCodeExporter commented 9 years ago
Thanks Timo.

It still might be useful to consider running the software described in Comment 8 to better quantify your disk performance. I'll add DT_FLUSH.

Original comment by mcgra...@gmail.com on 21 Nov 2008 at 1:09

GoogleCodeExporter commented 9 years ago
Sorry for the delay, I was waiting for my computer department to give me approval to use HD Tune on my computer. Here are the results:

HD Tune: FUJITSU MHV2040AH Information
Firmware version : 00000096
Serial number    :         NT26T633CTAM
Capacity         : 37.3 GB (~40.0 GB)
Buffer size      : 8192 KB
Standard         : ATA/ATAPI-0
Supported mode   : UDMA Mode 5 (Ultra ATA/100)
Current mode     : UDMA Mode 5 (Ultra ATA/100)

S.M.A.R.T                    : yes
48-bit Address               : no
Read Look-Ahead              : yes
Write Cache                  : yes
Host Protected Area          : yes
Device Configuration Overlay : yes
Automatic Acoustic Management: yes
Power Management             : yes
Advanced Power Management    : yes
Power-up in Standby          : yes
Security Mode                : yes
Firmware Upgradable          : yes

Partition     : 1
Drive letter  : 
Label         : 
Capacity      : 94 MB
Usage         : 0.00%
Type          : unknown (DEh)
Bootable      : No

Partition     : 2
Drive letter  : C:\
Label         : 
Capacity      : 38060 MB
Usage         : 49.16%
Type          : NTFS
Bootable      : Yes

HD Tune: FUJITSU MHV2040AH Benchmark
Transfer Rate Minimum : 3.8 MB/sec
Transfer Rate Maximum : 34.2 MB/sec
Transfer Rate Average : 27.6 MB/sec
Access Time           : 22.0 ms
Burst Rate            : 60.4 MB/sec
CPU Usage             : 1.7%

Hope this will help. I did not run HD Tune on our workstations, because all 3 of them are running simulations right now.

Original comment by franck.d...@lne.fr on 21 Nov 2008 at 5:00

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
It appears that your disk is slightly slower, but not by as much as I would have thought, or maybe I am looking at the wrong diagnostic. In any case, I have added and committed code to the repository that includes DT_FLUSH on the DUMP line. This parameter allows you to set the interval for the buffer flushing. The default value is (T_END-T_BEGIN)/NFRAMES, the same interval that governs most of the major output files.
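
As a sketch of how this would be used (the parameter names T_END, NFRAMES, and DT_FLUSH come from this thread; the numeric values are purely illustrative):

```fortran
! Illustrative FDS 5 input fragment. With T_BEGIN=0 s, T_END=1000 s and
! NFRAMES=1000, the default DT_FLUSH = (1000-0)/1000 = 1 s; the DUMP line
! below overrides it so the buffers are flushed only every 10 s.
&TIME T_END=1000. /
&DUMP NFRAMES=1000, DT_FLUSH=10. /
```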

I will mark the case as Fixed, but please let me know if something still seems unusual.

Original comment by mcgra...@gmail.com on 2 Dec 2008 at 2:15

GoogleCodeExporter commented 9 years ago
Please confirm that the new feature is working properly. Thanks.

Original comment by mcgra...@gmail.com on 15 Dec 2008 at 2:36

GoogleCodeExporter commented 9 years ago
Assume verified

Original comment by mcgra...@gmail.com on 18 Feb 2009 at 10:31

GoogleCodeExporter commented 9 years ago
Verified. Currently there is no more problem.
Thanks for the help.

Original comment by franck.d...@lne.fr on 20 Mar 2009 at 12:19