djkirkham closed this pull request 8 years ago
This looks like a useful and well-coded change.
Does anyone have objections to me merging this? @bjlittle @pelson @pp-mo
thanks @marqh
I don't know enough to be sure about this, but with an eye to future additions I'm wondering if a two-step solution would be more flexible. That is, provide a two-argument constructor that combines a data array with a mask array, treating them both as independent biggus calculations. Then we could define the required function as something like:
```python
def masked_equal(array, value):
    return masked_array(array, equal(array, value))
```
where this `masked_array` is a biggus equivalent of `numpy.ma.array(data, mask=mask)`.
It is probably a new array class, something like `DataAndMask(_Elementwise)`, similar to the `_ufunc_wrapper` derivations.
The actual implementation would be basically much as already given in @djkirkham's existing proposal. However, I'm suggesting the calculation is purely elementwise, so to do what we want here (compare with a constant) it uses broadcasting -- but I think that comes for free with the existing implementation.
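As a rough sketch of the intended two-step semantics, written with plain NumPy stand-ins (the biggus-level `masked_array` and `equal` helpers here are hypothetical, not existing API):

```python
import numpy as np

def masked_equal(array, value):
    # Two-step composition: build an elementwise mask, then combine it
    # with the data. The scalar `value` broadcasts against `array` for
    # free, just as it would in an elementwise biggus operation.
    return np.ma.array(array, mask=np.equal(array, value))

a = np.arange(5)
print(masked_equal(a, 2))  # [0 1 -- 3 4]
```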
The benefit is, it's then dead simple to provide similar new operators such as `masked_less`.
Assuming we get around to needing such additions, this way could create a lot less clutter.
In direct usage, it can also handle much more complex cases, like `masked_array(a, mask=(b < c * 0.5))`.
Does anyone think that is a good idea?
It seems to me that one problem with a general masked array class is that if the mask is based on the underlying array, then when the data is loaded it must be loaded twice: once for the data and once for the mask. Is there a way around it other than implementing special cases like this one? Is it even a serious problem?
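To make the double-load concern concrete, here is a toy illustration in plain Python and NumPy (not biggus code; `CountingSource` is an invented stand-in for a lazy data source): composing `masked_array(data, equal(data, value))` from two independent calculations reads the source twice, whereas a fused special case like this PR's reads it once.

```python
import numpy as np

class CountingSource:
    """Toy lazy data source that counts how many times it is read."""
    def __init__(self, data):
        self._data = data
        self.loads = 0

    def load(self):
        self.loads += 1
        return self._data

# Two independent calculations over the same source: two loads.
src = CountingSource(np.arange(10) % 2)
data = src.load()                 # load once for the data
mask = np.equal(src.load(), 1)    # load again for the mask
composed = np.ma.array(data, mask=mask)
print(src.loads)  # 2

# A fused masked_equal touches the source only once.
src = CountingSource(np.arange(10) % 2)
fused = np.ma.masked_equal(src.load(), 1)
print(src.loads)  # 1
```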
@djkirkham:
> when the data is loaded it must be loaded twice

Ooh, good point!
I tend to forget that biggus is not like a language with lazy evaluation: those tend to avoid re-calculation. I think it was originally thought we would eventually address that in the 'engines' code, but it has never seemed much of a priority.
I do know we haven't bothered to avoid some duplicate evaluations in the StaGE configurations, as it was thought a minor problem. But I think that is mostly for orographies, which are smaller (lower-dimensional) than the other data they are combined with.
I'm not sure what I think about the practicality of this now - it depends on intended usage.
I'm looking to cut a biggus release today.
On that basis, this either goes in as is, or it waits for further implementation.
My view is that this is useful, and that if a more flexible implementation arrives later, then this can become a call to that whilst maintaining the API call. On that basis, my vote is for merge.
@pp-mo @djkirkham please may you indicate whether you agree or disagree with me on this? Thank you, mark
@marqh:
> agree or disagree

!agree! :+1:
@marqh Sounds sensible, I agree.
I'm concerned about the results of this change:

```python
In [3]: import biggus

In [4]: import numpy as np

In [5]: a = np.arange(10) % 2

In [6]: print(a)
[0 1 0 1 0 1 0 1 0 1]

In [7]: n_m = np.ma.masked_equal(a, 1)

In [8]: b_m = biggus.masked_equal(a, 1)

In [9]: print(n_m.mean(), biggus.mean(b_m, axis=0).ndarray())
(0.0, array(0.5))
```
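For reference, the NumPy behaviour being compared against: masking out the ones leaves only the five zeros, so a mask-aware mean is 0.0, while a mean that ignores the mask averages all ten values to 0.5, which is what the biggus result above shows.

```python
import numpy as np

a = np.arange(10) % 2          # [0 1 0 1 0 1 0 1 0 1]
m = np.ma.masked_equal(a, 1)   # the ones are masked out

print(m.mean())   # 0.0 -- averages only the unmasked zeros
print(a.mean())   # 0.5 -- mask ignored, averages all ten values
```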
Is there a reason you didn't implement `ndarray`, @djkirkham?
Good spot, but I don't think it's a bug introduced by this change. Rather, it's a problem with `biggus.mean()`. The result is the same if line 8 is replaced with `b_m = biggus.NumpyArrayAdapter(n_m)`.
This addition is designed to mimic the NumPy function `numpy.ma.masked_equal()`.
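For comparison, the NumPy function being mimicked: `numpy.ma.masked_equal` masks every element equal to the given value, and also sets the result's `fill_value` to that value.

```python
import numpy as np

a = np.array([1, 2, 1, 3])
m = np.ma.masked_equal(a, 1)

print(m)             # [-- 2 -- 3]
print(m.mask)        # [ True False  True False]
print(m.fill_value)  # 1
```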