Supporting nans in input fields

We should support input fields with missing data. I'd argue that by default, if a nan is encountered, cloudmetrics should exclude that pixel from the metric calculation as far as possible (treating it as neither cloud nor clear sky) and still return a metric whenever that is a reasonable thing to do. The question, of course, is what counts as reasonable.

We need to do this because currently most (but not all) metrics break upon encountering a nan in an input field. If all of them broke, that would at least be consistent, but right now it's really easy to zero-fill nans and just compute the metrics anyway. This will work for some metrics, but some will return garbage, and some should not work at all. Choosing which metrics should handle nans, and how, is going to be a bit subjective of course, but we should at a minimum be explicit about what we support.

I'd be in favour of the following, but am happy to debate:

- Object metrics can be computed by prefilling the nans with zeros, and they should roughly work as expected (maybe with the exception of cloud objects with nans inside them). We can add this functionality to our object labeling function.
- cloud_fraction cannot be fixed this way, and should by default exclude nans from its calculation.
- fractal_dimension is open for interpretation: it could break on coarse-graining if a block contains a nan, or simply ignore the nan and be slightly imprecise. I vote for the latter.
- open_sky: I don't quite know what to do with this. When searching for the next cloudy pixel along a direction, ignoring a nan essentially means treating it as open sky, while stopping the search treats it as cloud, so the decision is binary. Encountering a nan could instead lead to a continue statement, i.e. exclusion of all pixels that lie on a row/column with a nan in it, but this may end up excluding the whole image if there is, e.g., a column of nans at a domain edge. Short term, I think we should make open_sky break if there are nans, to be consistent with the rest of the metrics and to prevent it from returning garbage.
- Spectral and wavelet metrics could have gap filling, but I'm in favour of just making them break, as gap filling will affect the spectra and it's hard to say a priori by how much. A user can always make these operational again by filling the nans themselves.
- stats should by default ignore nans, that is, we should calculate statistical moments of scalar fields with the nans excluded from the calculation.
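
To make a few of these concrete, here is a rough sketch of the behaviours proposed above. The function names are hypothetical and are not part of the current cloudmetrics API; the sketch assumes only numpy and a cloud mask stored as floats so that it can hold nans.

```python
import numpy as np


def cloud_fraction_excluding_nans(cloud_mask):
    """Fraction of cloudy pixels among the valid (non-nan) pixels."""
    cloud_mask = np.asarray(cloud_mask, dtype=float)
    valid = ~np.isnan(cloud_mask)
    if not valid.any():
        return np.nan  # no valid pixels, so no meaningful fraction
    return float(cloud_mask[valid].sum() / valid.sum())


def zero_fill_nans(cloud_mask):
    """Treat missing pixels as clear sky, e.g. before object labelling."""
    return np.nan_to_num(np.asarray(cloud_mask, dtype=float), nan=0.0)


def require_no_nans(field, metric_name):
    """Fail loudly for metrics that should not run on incomplete fields."""
    if np.isnan(np.asarray(field, dtype=float)).any():
        raise ValueError(f"{metric_name} does not support nans in the input field")


def moments_excluding_nans(field):
    """Mean and variance of a scalar field over the valid pixels only."""
    return float(np.nanmean(field)), float(np.nanvar(field))


# A 2x3 mask with a column of nans at the domain edge.
mask = np.array([[1.0, 0.0, np.nan], [1.0, 0.0, np.nan]])
print(cloud_fraction_excluding_nans(mask))  # 0.5: two cloudy out of four valid pixels
print(zero_fill_nans(mask).mean())  # about 0.33: zero-filling biases the fraction low
```

The small example at the end shows why zero-filling is not a fix for cloud_fraction: the nans get counted as clear sky and the fraction is biased low. Something like `require_no_nans` is what I'd call at the top of open_sky and the spectral and wavelet metrics in the short term.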

I'm happy to have a go at this, if we largely agree :)

cloudsci / cloudmetrics

Supporting nans in input fields #83