This almost certainly has something to do with the way the file is laid out, in terms of where the indexes etc. go, but the performance difference is heavily exacerbated when the file is on S3 ... it would be good to have the capability to diagnose this sort of thing. Could we modify pyfive to provide a "layout diagnostic view"?
The additional times are h4-h3 and h5-h4, where f2 is the open pyfive file instance:
import time
h3 = time.time()
v = f2['var']
d = v._dataobjects
h4 = time.time()  # h4-h3: time to get the variable and its data objects
d._get_chunk_addresses()
h5 = time.time()  # h5-h4: time to get the chunk addresses
It suggests the b-tree read itself is very fast.
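As a rough illustration of what a "layout diagnostic view" might report, here is a minimal sketch. It is not pyfive code: `summarise_layout` and `LayoutSummary` are hypothetical names, and the input is assumed to be a plain list of (byte offset, size) pairs in logical chunk order, however those end up being obtained. The number of separate byte ranges is included because, on S3, each non-contiguous range is likely to need its own request.

```python
from dataclasses import dataclass


@dataclass
class LayoutSummary:
    n_chunks: int        # number of chunks examined
    first_offset: int    # lowest byte offset of any chunk
    last_end: int        # highest byte offset reached by any chunk
    total_bytes: int     # sum of the chunk sizes
    n_out_of_order: int  # chunks stored before their logical predecessor
    n_ranges: int        # separate byte ranges after coalescing


def summarise_layout(chunks, max_gap=0):
    """Summarise where chunks sit in a file.

    chunks: list of (byte_offset, size) pairs in logical chunk order
            (an assumed input format, not a pyfive structure).
    max_gap: gaps of up to this many bytes are treated as contiguous.
    """
    if not chunks:
        raise ValueError("no chunks to summarise")

    # A chunk is "out of order" if it starts before the previous logical chunk.
    n_out_of_order = sum(
        1 for (prev, _), (cur, _) in zip(chunks, chunks[1:]) if cur < prev
    )

    # Walk the chunks in file order and count the coalesced byte ranges.
    ordered = sorted(chunks)
    n_ranges = 1
    cur_end = ordered[0][0] + ordered[0][1]
    for offset, size in ordered[1:]:
        if offset - cur_end > max_gap:
            n_ranges += 1
        cur_end = max(cur_end, offset + size)

    return LayoutSummary(
        n_chunks=len(chunks),
        first_offset=ordered[0][0],
        last_end=cur_end,
        total_bytes=sum(size for _, size in chunks),
        n_out_of_order=n_out_of_order,
        n_ranges=n_ranges,
    )


# Made-up example: four 100-byte chunks, the last one stored near the file start.
example = [(1000, 100), (1100, 100), (5000, 100), (400, 100)]
summary = summarise_layout(example)
```

Comparing `total_bytes` with `last_end - first_offset`, alongside `n_ranges`, gives a quick feel for how scattered the chunks are; similar numbers for the index itself would need its offsets too, which this sketch does not attempt.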
Here are two comparisons of opening a file on a posix file system using h5py and pyfive: