mean of an arraystack of masked arrays

SciTools / biggus

[DEPRECATED] Virtual large arrays and lazy evaluation.

http://biggus.readthedocs.io/

GNU Lesser General Public License v3.0


mean of an arraystack of masked arrays #90

Closed. guziy closed this issue 10 years ago

guziy commented 10 years ago

Hi:

I was trying to compute a time mean of masked arrays using biggus, as follows:

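    import numpy as np
    import biggus

    # season_name_to_folder_list, DataForDay, varname and result are defined elsewhere in the script.
    for sname, folders in season_name_to_folder_list.items():
        # One lazily loaded field per day, stacked along a new leading (time) axis.
        arr_stack = biggus.ArrayStack(
            np.array([biggus.NumpyArrayAdapter(DataForDay(var_name=varname, day=fname)) for fname in folders])
        )
        print(arr_stack.shape)
        # Time mean over axis 0, realised as a masked array and flipped vertically.
        result[sname] = np.flipud(biggus.mean(arr_stack, axis=0).masked_array())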

The result seems to treat "masked value + unmasked value" as unmasked: when I average over longer time intervals, the masked region shrinks.

__getitem__ of DataForDay returns a masked array. But maybe I am spoiling everything by wrapping the list of NumpyArrayAdapter objects in np.array()? Is there a proper way to do this?

Here is the definition of the DataForDay class: https://github.com/guziy/ShortPythonScripts/blob/master/modis_download/mcd43c3_seasonal_mean.py#L29

guziy commented 10 years ago

Here is a weird example: I take the mean along axis 0 of a stack of masked arrays, and it works:

In [22]: x = np.random.randn(10, 10)
In [23]: y = np.ma.masked_where(x < 0, x)
In [26]: arrst = biggus.ArrayStack(np.array([biggus.NumpyArrayAdapter(y) for i in range(10)]))
zmean = biggus.mean(arrst, axis=0).masked_array()
In [32]: zmean.mask == y.mask
Out[32]: 
array([[ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True],
       [ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True],
       [ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True],
       [ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True],
       [ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True],
       [ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True],
       [ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True],
       [ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True],
       [ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True],
       [ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True]], dtype=bool)
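
Editorial sketch: the session above stacks ten copies of the same array, so every layer shares one mask and the test cannot show what happens when masks differ. The example below uses plain np.ma rather than biggus (an assumption made so it runs without the deprecated package), with small made-up arrays, to contrast the two cases.

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0])

# Case 1: every layer shares one mask, as in the session above.
y = np.ma.masked_array(data, mask=[True, False, True, False])
same_mean = np.ma.stack([y] * 3).mean(axis=0)
print(np.ma.getmaskarray(same_mean))   # same mask as y

# Case 2: the masks differ between layers.
layers = [
    np.ma.masked_array(data, mask=[True, True, False, False]),
    np.ma.masked_array(data, mask=[True, False, True, False]),
    np.ma.masked_array(data, mask=[True, False, False, False]),
]
mixed_mean = np.ma.stack(layers).mean(axis=0)
# Only index 0 stays masked, because it is masked in every layer.
print(np.ma.getmaskarray(mixed_mean))
```

With identical masks, "masked in all layers" and "masked in any layer" coincide, which is why the check against y.mask comes out True everywhere.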

guziy commented 10 years ago

I think the mask of the resulting array is not calculated correctly... Please see the gist:

https://gist.github.com/guziy/fdbe65efa40728f3b7a8#file-test_array_stack_mean-py

Thanks

rhattersley commented 10 years ago

Hi @guziy - thanks for the gist. I'll investigate...

rhattersley commented 10 years ago

Everything seems OK.

Biggus handles masked values just like np.ma does. So masked values are ignored when calculating a mean:

>>> a = np.ma.masked_array([1,2,3,4], mask=[0, 1, 0, 1])
>>> a
masked_array(data = [1 -- 3 --],
             mask = [False  True False  True],
       fill_value = 999999)
>>> np.ma.mean(a)
2.0
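
Editorial sketch: the same rule applied along an axis, which is the situation in the original question. It uses np.ma only (not biggus) and made-up values; the count shows how many unmasked entries contributed to each mean.

```python
import numpy as np

# Two "time steps" (rows) of a three-cell field; True marks a masked cell.
stack = np.ma.masked_array(
    [[1.0, 2.0, 3.0],
     [5.0, 6.0, 7.0]],
    mask=[[False, True, True],
          [False, False, True]],
)

mean = stack.mean(axis=0)     # averages only the unmasked entries in each column
count = stack.count(axis=0)   # number of unmasked entries per column

print(mean.filled(-1.0))      # [ 3.  6. -1.]; the last cell has no valid data
print(count)                  # [2 1 0]
```

A cell ends up masked in the mean only when its count is zero, so adding more time steps can only shrink the masked region.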

guziy commented 10 years ago

Thanks, got it))

But then, is there a way not to ignore missing values?

Thank you

rhattersley commented 10 years ago

Could you modify __getitem__ to return an ndarray with NaN in place of the masked values? Then you could use the non-masked code path: biggus.mean(arr_stack, axis=0).ndarray(). NaN propagates through an ordinary mean:

>>> b = np.arange(4, dtype='f')
>>> b[1] = np.nan
>>> np.mean(b)
nan
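
Editorial sketch of the conversion suggested above, under stated assumptions: to_nan_filled is a hypothetical helper (not part of biggus or the linked script) that a __getitem__ could call before returning, and a plain NumPy mean stands in for biggus.mean(...).ndarray(). The final step, re-masking the NaN cells, is optional.

```python
import numpy as np

def to_nan_filled(masked):
    """Hypothetical helper: float ndarray with NaN wherever `masked` is masked."""
    return np.ma.filled(np.ma.asarray(masked).astype(float), np.nan)

layers = [
    np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False]),
    np.ma.masked_array([3.0, 4.0, 5.0], mask=[False, False, False]),
]

# Stack the NaN-filled layers along a new leading axis and average over it.
stacked = np.stack([to_nan_filled(a) for a in layers])
strict_mean = stacked.mean(axis=0)   # NaN wherever any layer was masked

# Optionally turn the NaN cells back into a mask for plotting or further work.
remasked = np.ma.masked_invalid(strict_mean)
print(np.ma.getmaskarray(remasked))  # [False  True False]
```

The cast to float matters for integer data, since NaN cannot be stored in an integer array.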

guziy commented 10 years ago

Ah yes, thanks, I'll go this way.

Cheers