HDF5 file layout and performance

Here are two comparisons of the time taken to open a file on a POSIX file system with h5py and with pyfive:

python opening_speed.py 
File Opening Time Comparison
h5py:    0.015273
pyfive:  0.005531
Additional times:  0.000124,  0.003239

File Opening Time Comparison
h5py:    0.054081
pyfive:  0.387869
Additional times:  0.000317,  0.000853
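
The script that produced these numbers is not shown here. As a rough sketch of how such opening times could be collected, the helper below times any callable that opens a file. The name `time_open` and its parameters are made up for illustration, and it uses `time.perf_counter()` rather than whatever the real script uses.

```python
import time


def time_open(opener, path, repeats=3):
    """Time opener(path) several times and return the samples in seconds.

    `opener` is any callable that opens the file and returns a handle,
    for example a small wrapper around h5py.File or pyfive.File.
    """
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        handle = opener(path)
        samples.append(time.perf_counter() - start)
        # Close the handle outside the timed region, if it can be closed.
        close = getattr(handle, "close", None)
        if callable(close):
            close()
    return samples
```

It could be called as `time_open(lambda p: h5py.File(p, "r"), "data.nc")` and `time_open(pyfive.File, "data.nc")`, where `data.nc` is a placeholder file name. All samples are returned rather than an average because the first, uncached open is often the one of interest.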

This almost certainly has something to do with how the file is laid out, in particular where the indexes etc. are placed. The performance difference is heavily exacerbated when the file is on S3, so it would be good to have the capability to diagnose this sort of thing. Could we modify pyfive to provide a "layout diagnostic view"?
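
To make the request concrete, here is one possible shape for such a view. It is only a sketch: `layout_view`, `blocks_touched` and the `(label, offset, size)` records are hypothetical names, and how pyfive would collect those records is not addressed. The idea is to list the pieces of the file in byte order with the gap before each, and to count how many fixed-size blocks would have to be fetched to read them all, which matters most when each block is a separate request.

```python
from typing import Iterable, List, Tuple

# Hypothetical record: (label, byte offset in file, size in bytes)
Region = Tuple[str, int, int]


def layout_view(regions: Iterable[Region]) -> List[str]:
    """Format regions in file order, with the gap before each one."""
    lines = []
    previous_end = 0
    for label, offset, size in sorted(regions, key=lambda r: r[1]):
        gap = max(0, offset - previous_end)
        lines.append(f"{offset:>12d} {size:>10d} gap={gap:<10d} {label}")
        previous_end = max(previous_end, offset + size)
    return lines


def blocks_touched(regions: Iterable[Region], block_size: int = 4096) -> int:
    """Count the distinct fixed-size blocks needed to read every region."""
    blocks = set()
    for _label, offset, size in regions:
        if size <= 0:
            continue
        first = offset // block_size
        last = (offset + size - 1) // block_size
        blocks.update(range(first, last + 1))
    return len(blocks)


if __name__ == "__main__":
    # Made-up offsets and sizes, purely to show the output format.
    example = [
        ("b-tree node", 8192, 512),
        ("superblock", 0, 96),
        ("object header", 96, 400),
    ]
    print("\n".join(layout_view(example)))
    print("blocks touched:", blocks_touched(example))
```

Comparing this output for a file that opens quickly with one that opens slowly would show whether the slow file has its indexes scattered across many more blocks.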

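The additional times are h4-h3 and h5-h4 in the snippet below, where f2 is the open pyfive file instance:

import time

h3 = time.time()
v = f2['var']             # look up the variable in the open pyfive file
d = v._dataobjects        # pyfive's internal data objects for that variable
h4 = time.time()
d._get_chunk_addresses()  # timed as h5-h4
h5 = time.time()

Both are differences of time.time() values, so they are in seconds. It suggests the b-tree read itself is very fast.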
