Closed guziy closed 10 years ago
Here is a weird example: I take the mean along axis 0 of a stack of masked arrays, and it works:
In [22]: x = np.random.randn(10, 10)
In [23]: y = np.ma.masked_where(x < 0, x)
In [26]: arrst = biggus.ArrayStack(np.array([biggus.NumpyArrayAdapter(y) for i in range(10)]))
zmean = biggus.mean(arrst, axis=0).masked_array()
In [32]: zmean.mask == y.mask
Out[32]:
array([[ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True],
       [ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True],
       [ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True],
       [ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True],
       [ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True],
       [ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True],
       [ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True],
       [ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True],
       [ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True],
       [ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True]], dtype=bool)
I think the mask of the resulting array is not calculated correctly... Please see the gist:
https://gist.github.com/guziy/fdbe65efa40728f3b7a8#file-test_array_stack_mean-py
Thanks
Hi @guziy - thanks for the gist. I'll investigate...
Everything seems OK. Biggus handles masked values just like np.ma does, so masked values are ignored when calculating a mean:
>>> a = np.ma.masked_array([1,2,3,4], mask=[0, 1, 0, 1])
>>> a
masked_array(data = [1 -- 3 --],
             mask = [False  True False  True],
       fill_value = 999999)
>>> np.ma.mean(a)
2.0
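The same rule along axis 0 of a stack would explain a shrinking masked region: a cell of the mean stays masked only if it is masked in every layer. Here is a minimal sketch using plain NumPy rather than biggus; the array values and the names layers, stack_mean and all_masked are made up for illustration:

```python
import numpy as np

# Three hypothetical "time steps" over the same four cells.
layers = np.ma.masked_array(
    [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0],
     [9.0, 10.0, 11.0, 12.0]],
    mask=[[0, 1, 0, 1],
          [0, 1, 1, 0],
          [0, 1, 0, 0]],
)

# Masked values are skipped, so each cell is averaged over its valid layers only.
stack_mean = np.ma.mean(layers, axis=0)

# A cell of the result is masked only when it is masked in all layers.
all_masked = np.ma.getmaskarray(layers).all(axis=0)

print(stack_mean)
print(all_masked)
```

Only the second cell is masked in all three layers, so it is the only masked cell in the result. Adding more layers can only shrink that set.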
Thanks, got it))
But then is there a way to not ignore missing values?
Thank you
Could you modify __getitem__ to return an ndarray with NaN values instead? Then you could use the non-masked code instead: biggus.mean(arr_stack, axis=0).ndarray()
>>> b = np.arange(4, dtype='f')
>>> b[1] = np.nan
>>> np.mean(b)
nan
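One possible way to do that conversion is np.ma.filled. This is a sketch in plain NumPy only, with made-up data; y_nan, stack and nan_mean are illustrative names, not part of biggus or of the DataForDay class:

```python
import numpy as np

x = np.array([[1.0, -2.0], [3.0, 4.0]])
y = np.ma.masked_where(x < 0, x)  # masks the -2.0 cell

# Replace masked cells with NaN; the data must be floating point to hold NaN.
y_nan = np.ma.filled(y.astype(float), np.nan)

# Two hypothetical time steps: the first has a gap, the second is complete.
stack = np.stack([y_nan, np.array([[1.0, 2.0], [3.0, 4.0]])])

# NaN propagates through the plain mean, so a gap in any layer marks the result.
nan_mean = np.mean(stack, axis=0)

print(nan_mean)
```

A __getitem__ that currently returns a masked array could return the filled array instead, assuming the data is (or can be cast to) a float dtype.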
Ah yes, thanks, I'll go this way.
Cheers
Hi:
I was trying to get a time mean of a masked array using biggus as follows
It seems that this operation treats masked value + non-masked value as a non-masked value, since the masked region shrinks when I use longer time intervals.
__getitem__ of DataForDay returns a masked array. But maybe I spoil everything by wrapping the list of NumpyArrayAdapter objects with np.array()? Is there a way to do it properly?
Here is the definition of the DataForDay class: https://github.com/guziy/ShortPythonScripts/blob/master/modis_download/mcd43c3_seasonal_mean.py#L29