add masked statistical reductions

leifdenby commented 2 years ago

This ready for you to have a look @martinjanssens if you have time :)

martinjanssens commented 2 years ago

Fantastic! This looks great. I have two comments:

I think we'd like to offer the possibility of computing statistical moments of masked scalars (as this PR currently does), as well as of the entire field. E.g. perhaps I'd just like to calculate the variance of vertically integrated water vapour, for which no mask is needed. Hence, I wonder if we might want to make the specification of a mask to these functions optional?
Especially if we do 1., we may have to add a bit more documentation to the use of each function for the user's convenience.

If you agree with both points, I don't mind suggesting something concrete (though you're more than welcome to give it a crack of course!)

leifdenby commented 2 years ago

I think we'd like to offer the possibility of computing statistical moments of masked scalars (as this PR currently does), as well as of the entire field. E.g. perhaps I'd just like to calculate the variance of vertically integrated water vapour, for which no mask is needed. Hence, I wonder if we might want to make the specification of a mask to these functions optional?

Especially if we do 1., we may have to add a bit more documentation to the use of each function for the user's convenience.

If you agree with both points, I don't mind suggesting something concrete (though you're more than welcome to give it a crack of course!)

Good idea! So we make the mask optional and use it if it is available. You're very welcome to have a go!

This could maybe also be a flexible way to optionally provide object_labels at some point. I'm thinking we'd add a method kwarg that could names like per_object or entire_mask. Not a fully formed idea yet... But that's an idea for later :)

leifdenby commented 2 years ago

By the way, working on this, I began to wonder whether we're fully consistent in how we're treating masks. Here, it is very explicitly asked to be an array of bools, while I think the other functions that operate on masks are happy to accept floats. I've dealt with it here by adding an explicit conversion to bool for the provided masks in the functions, but I don't know if we should be stricter.

Yes, it's a bit of an oddity with numpy, or rather with python, that bool(1.0) == True and float(True) == 1.0. That's because True/False internally in python is represented by 1 and 0, and those can both be typecast from floats 1.0 and 0.0. To be consistent with how numpy works I think we should just leave it, numpy will complain if it can't use the provided values to mask with :) (and the error from numpy will be more instructive than anything I'd come up with)

Alright, I've made some suggestions in this branch:

* Made computing masked statistics optional

* Added more details to the docstrings for how to compute the masked/full field statistics

Thanks! I'll merge that into my branch and get this merged into master

cloudsci / cloudmetrics

add masked statistical reductions #46