NetCDF3 Variables with no _FillValue don't return masked arrays

Unidata / netcdf4-python

netcdf4-python: python/numpy interface to the netCDF C library

http://unidata.github.io/netcdf4-python

MIT License

754 stars 262 forks source link

NetCDF3 Variables with no _FillValue don't return masked arrays #725

Open djkirkham opened 7 years ago

djkirkham commented 7 years ago

Accessing data from a variable with no explicit _FillValue in a netCDF3 Dataset returns a filled ndarray rather than a masked array:

>>> import numpy as np
>>> import netCDF4
>>> print netCDF4.__version__
1.3.0
>>> ds = netCDF4.Dataset('x.nc', 'w', format='NETCDF3_CLASSIC')
>>> d = ds.createDimension('x', 3)
>>> v = ds.createVariable('x', 'i', 'x')
>>> m = np.ma.masked_equal([1,2,3], 2)
>>> v[:] = m
>>> print v[:]
[          1 -2147483647           3]

Is this intended behaviour? Calling ncdump on the dataset shows the points as masked.

djkirkham commented 7 years ago

After searching further I found this comment https://github.com/Unidata/netcdf4-python/issues/468#issuecomment-145873369, which suggests that variables can have a property to prevent the data being masked if the data matches the default fill value. I've discovered this line in libnetcdf which appears to indicate that in netCDF3 format this property is always enabled. Is this correct?

jswhit commented 7 years ago

It does look like nc_inq_var_fill always returns no_fill=1 for NETCDF3_CLASSIC and NETCD3_64BIT files in version 4.5.1. I'm pretty sure it wasn't always that way. Maybe @WardF or @DennisHeimbigner can shed some light. In your example,

>>> print v
<type 'netCDF4._netCDF4.Variable'>
int32 x(x)
unlimited dimensions:
current shape = (3,)
filling off

Using ds.set_fill_on() doesn't seem to help - the no_fill flag is still set. For NETCDF4 and NETCDF4_CLASSIC files, the no_fill flag is not set.

jswhit commented 7 years ago

Reading the docs more closely it seems that nc_inq_var_fill only works for NETCDF4 and NETCDF4_CLASSIC files. Although it is possible to turn fillling on and off using nc_set_fill for NETCDF3 files, I don't yet see how to determine whether filling is enabled. RIght now the python interface is incorrectly assuming that filling is always off for NETCDF3 files since nc_inq_var_fill always returns no_fill=1.

jswhit commented 7 years ago

See https://github.com/Unidata/netcdf-c/issues/499

jswhit commented 6 years ago

Pull request #726 implements a workaround - it assumes all NETCDF3 files have filling turned on when deciding whether to use _FillValue to return masked arrays. A true fix for this should be in netcdf-c 4.5.1 - right now the workaround is wrapped with a version < 4.5.1 if test.

jswhit commented 6 years ago

@djkirkham, if this is an acceptable short-term fix I'll go ahead and merge.

djkirkham commented 6 years ago

@jswhit Thanks for responding so quickly to this, but personally I'd prefer to wait for the true fix in netcdf-c rather than merging this. As a bit of background, I'm working on a library which allows users to read and analyse files in netCDF format (as well as others). I discovered this issue due to failures in some of our automated tests which were writing and then reading a netCDF3 classic file, and failing because of the lack of a mask. We've now patched those tests to set the fill value explicitly as a workaround. So this issue isn't really blocking us.

Although I suspect our user base is more likely to use netCDF3 files without _NoFill than those with, I'm wary about merging a change which essentially shifts the bug elsewhere. Additionally, currently it's possible to get around the issue by setting the fill value explicitly on write. If this change is merged there will be no equivalent workaround for writing variables with _NoFill if they are to be read with netCDF4-python.

jswhit commented 6 years ago

I see your point. I guess it's a matter of whether the bug we are swapping in is less onerous than the one that exists.