Different numpy array type returned depending on the values in the ncdf variable.

GoogleCodeExporter commented 8 years ago


When I read a NCDF variable that contains an element that has the same value as 
the _FillValue attribute, the result will be a numpy maskedarray and the 
element will be masked. However, when the array doesn't contain an element with 
the same value as the _FillValue attribute, the resulting array will be a 
(regular) numpy ndarray. So, the return type differs depending on the 
contents of the variable.

This is very inconvenient because masked arrays and regular ndarrays don't have 
the same interface; some ndarray functions don't exist (or work differently) 
for masked arrays.
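
One way to cope with the mixed return types on the caller's side is to 
normalise whatever comes back before computing on it. The sketch below uses 
plain numpy only and assumes floating-point data. The helper names as_masked 
and as_nan_filled are made up for this illustration and are not part of 
netcdf4-python.

```python
import numpy as np


def as_masked(data):
    """Return data as a MaskedArray, whether it was masked or a plain ndarray."""
    # np.ma.asarray wraps an ndarray without masking anything and passes an
    # existing MaskedArray through with its mask intact.
    return np.ma.asarray(data)


def as_nan_filled(data):
    """Return a plain ndarray in which masked elements are replaced by NaN.

    Assumes a floating-point dtype, since NaN cannot be stored in integers.
    """
    # np.ma.filled returns a plain ndarray unchanged and fills a MaskedArray.
    return np.ma.filled(data, np.nan)


# Example data standing in for the two halves read from the file.
first_half = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
second_half = np.ma.array([9.0, 6.0, 7.0, 8.0, 5.0],
                          mask=[True, False, False, False, False])

# Either route gives one code path for both halves.
max_first = as_masked(first_half).max()
max_second = as_masked(second_half).max()
nanmax_second = np.nanmax(as_nan_filled(second_half))
```

With as_masked, the masked array's own max() method skips masked elements; 
with as_nan_filled, functions such as nanmax see an ordinary ndarray.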

I've attached a program that demonstrates the issue. It writes an array of 10 
elements to a ncdf file and puts a fill value in the 6th element. It then reads 
the first half of the array and calls the nanmax function; this works fine. The 
second half, however, is returned as a masked array, for which the nanmax 
function doesn't work properly.

Output:
    first half type     : <type 'numpy.ndarray'>
    first half maximum  : 4.0
    second half type    : <class 'numpy.ma.core.MaskedArray'>
    second half maximum : <numpy.ma.core.MaskedIterator object at 0x7f8d74b61d50>

Is there a way to remedy this?

Best regards, Pepijn.

Original issue reported on code.google.com by titus...@gmail.com on 19 Apr 2013 at 5:03

Attachments:

show_issue.py

GoogleCodeExporter commented 8 years ago

If you want, you can turn this off using the set_auto_maskandscale method of 
the netcdf variable:

http://netcdf4-python.googlecode.com/svn/trunk/docs/netCDF4.Variable-class.html#set_auto_maskandscale
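
A minimal sketch of that switch, assuming `variable` is a netCDF4 Variable 
(or any object with the same set_auto_maskandscale method and indexing). The 
helper name read_raw and the file and variable names in the usage comment are 
hypothetical. Note that with the flag off, fill values stay in the data as 
ordinary numbers, and scale_factor/add_offset are not applied either.

```python
import numpy as np


def read_raw(variable, index=slice(None)):
    """Read from a netCDF4 Variable with automatic masking and scaling off.

    The result is always a plain ndarray; elements equal to _FillValue are
    returned as-is rather than masked.
    """
    variable.set_auto_maskandscale(False)
    return np.asarray(variable[index])


# Hypothetical usage (file and variable names are placeholders):
#
#   import netCDF4
#   dataset = netCDF4.Dataset("example.nc")
#   data = read_raw(dataset.variables["temperature"], slice(5, 10))
#   dataset.close()
```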

It is unfortunate that masked arrays are not complete drop-in replacements for 
regular numpy arrays.  However, I think the benefits of returning a masked 
array by default outweigh the potential pitfalls.  In fact, the default 
setting for set_auto_maskandscale was initially False (it had to be turned on 
by the user), but so many users complained that I changed the default to True.

Original comment by whitaker.jeffrey@gmail.com on 22 Apr 2013 at 10:49

GoogleCodeExporter commented 8 years ago

The numpy developers are working on a replacement for masked arrays, which will 
hopefully address the issues you see:

http://www.compsci.wm.edu/SciClone/documentation/software/math/NumPy/html1.7/reference/arrays.maskna.html

Perhaps once this is fully implemented netcdf4-python can just always return 
numpy.NA objects.

Original comment by whitaker.jeffrey@gmail.com on 22 Apr 2013 at 10:56

GoogleCodeExporter commented 8 years ago

Thanks, I missed the set_auto_maskandscale option somehow. I can see why the 
users wanted this value to be True by default; masked arrays are a good way to 
cope with the fill values. 

IMHO, the best implementation would be that, when the set_auto_maskandscale 
flag is True, the resulting array is always a masked array if the _FillValue 
attribute is present, regardless of whether the variable contains actual fill 
values.
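
The proposed behaviour can be sketched outside the library with plain numpy. 
This is only an illustration of the idea, not how netcdf4-python implements 
it; the function name mask_if_fill_defined is made up, and `fill_value` stands 
for the variable's _FillValue attribute (None when the attribute is absent).

```python
import numpy as np


def mask_if_fill_defined(raw, fill_value=None):
    """Return a MaskedArray whenever a fill value is defined.

    The return type depends only on whether fill_value is given, never on
    whether raw happens to contain it.
    """
    raw = np.asarray(raw)
    if fill_value is None:
        # No _FillValue attribute: hand back the plain ndarray.
        return raw
    # masked_equal always returns a MaskedArray, even if nothing matches.
    return np.ma.masked_equal(raw, fill_value)


fill = 9.96921e36  # example fill value, chosen for this sketch
without_fill = mask_if_fill_defined([0.0, 1.0, 2.0, 3.0, 4.0], fill)
with_fill = mask_if_fill_defined([fill, 6.0, 7.0, 8.0, 9.0], fill)
```

Both results are masked arrays, so the same code path (for example the max() 
method) works whether or not a fill value was actually read.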

However, simply implementing this would probably break a lot of people's 
existing code. What do you think?

Original comment by titus...@gmail.com on 25 Apr 2013 at 1:08

GoogleCodeExporter commented 8 years ago

Original comment by whitaker.jeffrey@gmail.com on 26 Feb 2014 at 2:04

Changed state: Fixed

cas3ymau3 / netcdf4-python

Different numpy array type returned depending on the values in the ncdf variable. #173