What do "Runtime(s3)" and "Runtime(filter) Output" mean in results.txt?

HDFGroup / datacontainer

Data Container Study

Issue #23, closed by hyoklee 8 years ago

hyoklee commented 8 years ago

How should I interpret the times in results.txt?

I'm asking because I get a bunch of 'No space left on device' errors when I run ncep_summary.sh on all files across 86 engines:

Invoked as: /home/ubuntu/s3cmd/s3cmd -c /home/ubuntu/config/s3_griffin.cfg get s3://hdfdata/ncep3/GSSTF_NCEP.3.2005.03.23.he5 /home/ubuntu/s3/hdfdata/ncep3/GSSTF_NCEP.3.2005.03.23.he5
Problem: IOError: [Errno 28] No space left on device
S3cmd:   1.6.0+
python:   2.7.10 |Anaconda 2.3.0 (64-bit)| (default, May 28 2015, 17:02:03)

jreadey commented 8 years ago

Did you have any other data loaded on the engines from earlier runs? With 86 engines there should be more than enough space.

ghost commented 8 years ago

You need to put "-c whatever" as a command-line argument to summary.py in that script.

hyoklee commented 8 years ago

@jreadey I don't think so, unless you or @aleksandar-thg ran jobs on 172.17.192.9 and did not clean up. @ajelenak-thg Can you update the documentation? I don't know what you're talking about, because jobs/ncep_summary.sh did not ask for any input.

hyoklee commented 8 years ago

Why does ncep_summary.sh download data onto the ipcontroller machine and fill up its disk?

(py34)ubuntu@ipcontroller:~/s3/hdfdata/ncep3$ ls -aslt
total 6358820
    0 -rw-rw-r-- 1 ubuntu ubuntu        0 Nov 16 19:14 GSSTF_NCEP.3.1996.02.16.he5
   96 drwxrwxr-x 2 ubuntu ubuntu    98304 Nov 16 19:14 .
    0 -rw-rw-r-- 1 ubuntu ubuntu        0 Nov 16 19:14 GSSTF_NCEP.3.1995.06.06.he5
    0 -rw-rw-r-- 1 ubuntu ubuntu        0 Nov 16 19:13 GSSTF_NCEP.3.1997.02.06.he5
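The listing shows .he5 files of size 0, which appear to be downloads that were created but never filled once the disk ran out of space. A minimal Python sketch for listing such zero-byte leftovers before re-running the job; the directory comes from the listing, while the helper name `find_empty_files` is hypothetical and the "zero bytes means failed download" rule is an assumption.

```python
import os
from pathlib import Path


def find_empty_files(directory, pattern="*.he5"):
    """Return sorted paths of zero-byte files matching pattern in directory.

    Zero-byte .he5 files are treated here as leftovers from downloads that
    failed partway (an assumption based on the listing, not a rule).
    """
    root = Path(directory)
    return sorted(
        p for p in root.glob(pattern)
        if p.is_file() and p.stat().st_size == 0
    )


if __name__ == "__main__":
    # Download directory taken from the listing above; adjust as needed.
    target = os.path.expanduser("~/s3/hdfdata/ncep3")
    for path in find_empty_files(target):
        print(path)
```

Each returned path could then be removed with `Path.unlink()` once the list has been checked by hand.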

jreadey commented 8 years ago

You only have ~5GB of free space and are trying to load a 122GB dataset on it.

You'll need to either use a larger instance type or use the -c option to spread the data across more nodes.
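The arithmetic behind this answer: 122 GB cannot fit in ~5 GB on one machine, but split evenly over 86 engines it is about 122 / 86 ≈ 1.42 GB per engine. A minimal pre-flight check using the standard library's `shutil.disk_usage`, assuming an even split across engines (the real job may not distribute files evenly); the function names and the 10% headroom are hypothetical.

```python
import shutil

GB = 10**9  # decimal gigabytes, matching the "GB" figures in the thread


def per_node_share(total_bytes, n_nodes):
    """Bytes each node must hold if the dataset is split evenly (rounded up)."""
    return -(-total_bytes // n_nodes)  # ceiling division on integers


def fits_on_node(total_bytes, n_nodes, path="/", headroom=0.1):
    """Check whether one node's share fits in the free space at path.

    headroom keeps a fraction of the free space (10% by default) in reserve;
    this margin is an assumption, not something stated in the thread.
    """
    free = shutil.disk_usage(path).free
    return per_node_share(total_bytes, n_nodes) <= free * (1 - headroom)


if __name__ == "__main__":
    dataset = 122 * GB
    for nodes in (1, 86):
        share = per_node_share(dataset, nodes)
        print(f"{nodes:>3} node(s): {share / GB:.2f} GB each, "
              f"fits here: {fits_on_node(dataset, nodes)}")
```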

hyoklee commented 8 years ago

So should I add -c 86 for 86 instances in the shell script, like below?

python summary.py --input ../jobs/ncep_files_h5py.txt --path /HDFEOS/GRIDS/NCEP/Data\ Fields/Tair_2m -c 86

jreadey commented 8 years ago

The -c is just a flag. "-c 1" works equally well. The number of engines is determined by the run_engine.sh script.

hyoklee commented 8 years ago

Is it '--c 1' (two dashes) or '-c 1' (one dash)?

jreadey commented 8 years ago

Either "-c 1" or "--cluster 1"
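For context on the one-dash/two-dash question: by common convention a single dash introduces the short form of an option and two dashes the long form. A minimal sketch of how such a pair is typically declared with Python's `argparse`; this is an assumption about style, not the actual code in summary.py, and only the `-c`/`--cluster`, `--input`, and `--path` names come from this thread (`build_parser` is hypothetical).

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the options mentioned in this thread."""
    parser = argparse.ArgumentParser(description="summary.py-style options (sketch)")
    parser.add_argument("--input", help="file listing the data files to process")
    parser.add_argument("--path", help="HDF5 path of the dataset to summarize")
    # Short form "-c" and long form "--cluster" set the same attribute, args.cluster.
    # Per the thread, only the presence of the flag matters, so any value works.
    parser.add_argument("-c", "--cluster", help="run on the cluster")
    return parser


if __name__ == "__main__":
    parser = build_parser()
    short = parser.parse_args(["-c", "1"])
    long_form = parser.parse_args(["--cluster", "1"])
    print(short.cluster == long_form.cluster)
```

A script built this way would check `args.cluster is not None` to decide whether the flag was given.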

hyoklee commented 8 years ago

How did you measure the time for cluster testing and get '7s'?

I used the 'time' command and got the following result for the h5py-style chunking algorithm:

real    0m16.839s
user    0m11.208s
sys     0m0.599s

hyoklee commented 8 years ago

I see @ajelenak-thg added code for printing the time in summary.py.
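The 'time' output above reports wall-clock time (real) for the whole process, including Python start-up and imports, plus the CPU time spent by the local process and any child processes it waited for (user, sys). Here user + sys ≈ 11.8 s against 16.8 s real, so roughly 5 s went to waiting, for example on I/O or on remote engines, whose work is not counted in the local CPU figures. A timer inside the script measures only the section it wraps, which is one possible reason an in-script figure can be smaller than the 'time' result. A minimal sketch using `time.perf_counter` (wall clock) and `time.process_time` (CPU time of the current process); the `timed` helper is hypothetical and not necessarily what summary.py does.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results=None):
    """Print wall-clock and CPU time for the wrapped block.

    If a dict is passed as results, the timings are also stored in it
    under label as (wall_seconds, cpu_seconds).
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    try:
        yield
    finally:
        wall = time.perf_counter() - wall_start
        cpu = time.process_time() - cpu_start
        if results is not None:
            results[label] = (wall, cpu)
        print(f"{label}: wall {wall:.3f}s, cpu {cpu:.3f}s")


if __name__ == "__main__":
    timings = {}
    with timed("sleep", timings):
        time.sleep(0.2)  # waiting: counts toward wall time, not CPU time
    with timed("busy", timings):
        sum(i * i for i in range(200_000))  # pure computation: uses CPU
```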