hyoklee closed this issue 8 years ago
Did you have any other data loaded on the engines from earlier runs? With 86 engines there should be more than enough space.
You need to pass "-c whatever" as a command-line argument to summary.py in that script.
@jreadey I don't think so, unless you or @aleksandar-thg ran jobs on 172.17.192.9 and did not clean up. @ajelenak-thg Can you update the documentation? I don't know what you're talking about, because jobs/ncep_summary.sh did not ask for any input.
Why does ncep_summary.sh download data onto the ipcontroller machine and fill up its disk?
(py34)ubuntu@ipcontroller:~/s3/hdfdata/ncep3$ ls -aslt
total 6358820
 0 -rw-rw-r-- 1 ubuntu ubuntu     0 Nov 16 19:14 GSSTF_NCEP.3.1996.02.16.he5
96 drwxrwxr-x 2 ubuntu ubuntu 98304 Nov 16 19:14 .
 0 -rw-rw-r-- 1 ubuntu ubuntu     0 Nov 16 19:14 GSSTF_NCEP.3.1995.06.06.he5
 0 -rw-rw-r-- 1 ubuntu ubuntu     0 Nov 16 19:13 GSSTF_NCEP.3.1997.02.06.he5
You only have ~5GB of free space and are trying to load a 122GB dataset onto it.
You'll need either a larger instance type or the -c option to spread the data across more nodes.
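As a quick sanity check before launching a load job, the free space on a node can be inspected programmatically. Here is a minimal Python sketch; the helper name and threshold are illustrative assumptions, not part of summary.py:

```python
import shutil

def has_room(path, needed_gb):
    """Return True if the filesystem holding `path` has at least
    `needed_gb` gigabytes free (illustrative helper, not in summary.py)."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= needed_gb * 1024**3

# Example: a 122 GB dataset will not fit on a node with only ~5 GB free.
print(has_room(".", 0))
```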
So should I add -c 86 for 86 instances like below in the shell script?
python summary.py --input ../jobs/ncep_files_h5py.txt --path /HDFEOS/GRIDS/NCEP/Data\ Fields/Tair_2m -c 86
The -c is just a flag. "-c 1" works equally well. The number of engines is determined by the run_engine.sh script.
Is it '--c 1' (two dashes) or '-c 1' (one dash)?
Either "-c 1" or "--cluster 1".
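For reference, a short option takes one dash and its long synonym takes two, and both map to the same value. A hedged sketch of how such a flag might be declared with argparse; the option names mirror this thread, but the actual definition in summary.py may differ:

```python
import argparse

parser = argparse.ArgumentParser(description="illustrative flag parsing only")
# Short form "-c" and long form "--cluster" store into the same destination.
parser.add_argument("-c", "--cluster", default=None,
                    help="run against a cluster (illustrative)")

args = parser.parse_args(["-c", "1"])
print(args.cluster)                                    # "1"
print(parser.parse_args(["--cluster", "1"]).cluster)   # "1"
```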
How did you measure time for cluster testing and got '7s'?
I used the 'time' command and got the following result for the h5py-style chunking algorithm:
real 0m16.839s
user 0m11.208s
sys 0m0.599s
I see @ajelenak-thg added code for printing time in summary.py.
How should I interpret time in results.txt?
I'm asking because I just get a bunch of 'No space left on device' errors when I run ncep_summary.sh for all files on 86 engines.
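On interpreting the timings: a common pattern is to record wall-clock time around the work and write it to the results file. A minimal sketch of that idea, assuming a simple wrapper; the actual code in summary.py may differ:

```python
import time

def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds).
    Illustrative wrapper, not the actual summary.py code."""
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed

result, elapsed = timed(sum, range(1_000_000))
print(f"elapsed: {elapsed:.3f}s")  # wall-clock seconds
```

The `real` line from the shell `time` command is the same wall-clock measure; `user` and `sys` are CPU time spent in user code and in the kernel, respectively.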