davidhassell opened this issue 4 months ago
The current code will not call `nc_def_var_chunking` at all if `chunksizes=None` and `contiguous=False`, which I would think would result in the library default chunking strategy.
I think chunking is only used by default if there is an unlimited dimension. Try this:
```python
import netCDF4
import numpy as np

def write(**kwargs):
    nc = netCDF4.Dataset('chunk.nc', 'w')
    x = nc.createDimension('x', 8000)
    y = nc.createDimension('y', 400)
    z = nc.createDimension('z', None)
    tas = nc.createVariable('tas', 'f8', ('z', 'y', 'x'), **kwargs)
    tas[0:10, :, :] = np.random.random(32000000).reshape(10, 400, 8000)
    print(tas.chunking())
    nc.close()

write()
```

which prints:

```
[1, 200, 4000]
```
so even if you specify `contiguous=False` you won't get chunking by default unless there is an unlimited dimension. If there is no unlimited dimension, you have to specify `chunksizes` to get chunking.
I can see how this can be confusing, since the default for the `contiguous` kwarg is `False`, yet the effective library default is contiguous storage unless there is an unlimited dimension. The netcdf4-python docs do say this, though: "Fixed size variables (with no unlimited dimension) with no compression filters are contiguous by default."
As near as I can tell, when a variable is created, default chunksizes are computed for it automatically. Then, if `nc_def_var_chunking` is later called, those default sizes get overwritten.
Thanks for the background, @jswhit and @DennisHeimbigner - it's very useful.
So, not a bug then, but maybe a feature request! Could it be possible to get netCDF4-python to write a variable that has no unlimited dimensions with the default chunking strategy? I guess that you don't want to change the existing API, so perhaps that could be controlled by a new keyword to `createVariable`?
Thanks, David
@davidhassell it is as already reported - variables with no unlimited dimension are not chunked by default (they are contiguous).
Hi @jswhit, I see that what I wrote was ambiguous - sorry! I'll try again:
I would like to create chunked variables that have no unlimited dimensions, chunked with the netCDF default chunk sizes. As far as I can tell this is not currently possible, but would you be open to adding this option?
@davidhassell thanks for clarifying, I understand now. Since the python interface doesn't have access to the default chunking algorithm in the C library, I don't know how this would be done. I'm open to suggestions though.
A potential workaround that doesn't require having an unlimited dimension is to turn on compression (`zlib=True, complevel=1`) or the Fletcher checksum algorithm (`fletcher32=True`).
Hello,
I have found it impossible (at v1.6.5) to get netCDF4 to write out a file with the default chunking strategy - it either writes contiguous variables, or variables with explicitly set chunksizes, but never ones with the default chunks.
To test this I used the following function:
and ran it as follows:
Surely it's the case that if `contiguous=False, chunksizes=None` then the netCDF default chunking strategy should be used? I found that if I changed line https://github.com/Unidata/netcdf4-python/blob/v1.6.5rel/src/netCDF4/_netCDF4.pyx#L4307 to read:
then I could get the default chunking to work as expected:
However, this might not be the best way to do things - what do you think?
Many thanks, David