NCAS-CMS / cf-python

A CF-compliant Earth Science data analysis library
http://ncas-cms.github.io/cf-python
MIT License

hdf5 chunk defaults and handling #781

Open bnlawrence opened 5 months ago

bnlawrence commented 5 months ago

The current behaviour when reading a chunked file is somewhat surprising (to me). If one reads this variable:

float UM_m01s02i205_vn1106(time, latitude, longitude) ;
        // attributes not relevant to this issue omitted
        UM_m01s02i205_vn1106:_Storage = "chunked" ;
        UM_m01s02i205_vn1106:_ChunkSizes = 1, 1920, 2560 ;

I see the following unexpected result:

In [30]: g = cf.read('double-chunking-testc.nc')[0]
In [31]: g.data.nc_hdf5_chunksizes()
Out[31]: ()

This is not a bug, insofar as it is the expected behaviour of the code: by construction, cf-python currently doesn't remember HDF5 chunk sizes from the read.

Should it? If so, it could be done, possibly with caveats on when that is a sensible thing to do, and the chunk sizes may well have to be forgotten when certain operations are applied (e.g. when aggregating files with different HDF5 chunks, when subspacing, when adding/removing/transposing dimensions, etc.).
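
To make the "remember, but forget on certain operations" idea concrete, here is a rough sketch in plain Python. It is not cf-python code: the function names and the operation labels are hypothetical, and the empty tuple standing for "no chunk sizes remembered" simply mirrors the `()` output shown above. Which operations should really discard the chunk sizes is exactly the open question, so the set below is an assumption taken from the examples listed in the previous paragraph.

```python
from typing import Sequence, Tuple

ChunkSizes = Tuple[int, ...]

# Assumption: () means "no chunk sizes remembered", as in the output above.
NO_CHUNKS: ChunkSizes = ()

# Hypothetical labels for operations that change the array's shape or axis
# order, after which the on-disk chunk shape may no longer make sense.
FORGETTING_OPERATIONS = {
    "subspace",
    "insert_dimension",
    "remove_dimension",
    "transpose",
}


def chunks_after_operation(chunks: ChunkSizes, operation: str) -> ChunkSizes:
    """Return the chunk sizes to keep after `operation`, or () to forget them."""
    if operation in FORGETTING_OPERATIONS:
        return NO_CHUNKS
    return chunks


def chunks_after_aggregation(all_chunks: Sequence[ChunkSizes]) -> ChunkSizes:
    """Keep the chunk sizes only if every aggregated file agrees on them."""
    if not all_chunks:
        return NO_CHUNKS
    first = all_chunks[0]
    if all(chunks == first for chunks in all_chunks):
        return first
    return NO_CHUNKS


# Chunk sizes of the variable in this issue (_ChunkSizes = 1, 1920, 2560).
on_disk: ChunkSizes = (1, 1920, 2560)

# "copy" is a hypothetical label for an operation that leaves the shape alone.
kept = chunks_after_operation(on_disk, "copy")
forgotten = chunks_after_operation(on_disk, "transpose")
same_files = chunks_after_aggregation([on_disk, on_disk])
mixed_files = chunks_after_aggregation([on_disk, (1, 960, 1280)])
```

Here `kept` and `same_files` retain `(1, 1920, 2560)`, while `forgotten` and `mixed_files` fall back to `()`. A real implementation could be less conservative, for example by permuting the chunk sizes on a transpose rather than dropping them.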

Another V4.0 issue!