Open martinjanssens opened 11 months ago
Hi @martinjanssens,
Thank you for bringing up this question. Providing a universal treatment for NaN values can indeed be challenging. I like your suggestions, and I agree with all of them.
Additionally, we might consider adding two input parameters to the metric calls: one indicating whether NaN values are allowed in the images, and another setting a maximum threshold for the fraction of NaNs.
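To make the two-parameter idea concrete, here is a minimal sketch of a shared check that each metric could run on its input. The helper name `validate_nans` and the parameter names `allow_nans` and `max_nan_fraction` are hypothetical and are not part of the current cloudmetrics API; the default values are only placeholders.

```python
import numpy as np


def validate_nans(field, allow_nans=False, max_nan_fraction=0.1):
    """Hypothetical helper: return a boolean mask of valid pixels, or raise.

    allow_nans       -- whether NaNs are tolerated at all (assumed name)
    max_nan_fraction -- largest tolerated fraction of NaN pixels (assumed name)
    """
    field = np.asarray(field, dtype=float)
    nan_fraction = float(np.isnan(field).mean())

    # Reject any NaN if the caller did not opt in.
    if nan_fraction > 0 and not allow_nans:
        raise ValueError("input contains NaNs but allow_nans=False")

    # Reject fields where too much of the data is missing.
    if nan_fraction > max_nan_fraction:
        raise ValueError(
            f"NaN fraction {nan_fraction:.2f} exceeds "
            f"max_nan_fraction={max_nan_fraction:.2f}"
        )

    # True where the pixel holds usable data.
    return ~np.isnan(field)
```

A metric could then compute on `field[valid]` (or pass `valid` along as a mask) instead of deciding on its own how to treat missing pixels.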
We should support input fields with missing data. I'd argue that by default, if a nan is encountered, cloudmetrics should exclude that pixel from the metric calculations as much as possible, that is, it's neither cloud, nor clear sky, but still try to return a metric when this is a reasonable thing to do (of course, the question is what is reasonable).
We need to do this, because currently most (but not all) metrics break upon encountering a nan in an input field. If all of them broke, that would at least be consistent, but right now, it's really easy to zero-fill nans and just compute the metrics anyway. This will work for some metrics, but some will return crap, and some should not work. Choosing which metrics should handle nans and how is going to be a bit subjective of course, but we should at a minimum be explicit in what we support.
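As a small illustration of why zero-filling can quietly give the wrong answer, consider a toy cloud mask where one column is missing. This uses plain NumPy only, not any cloudmetrics function, and the mask values are made up for the example.

```python
import numpy as np

# Toy 4x4 cloud mask: 1 = cloud, 0 = clear sky, NaN = missing data.
mask = np.array([
    [1.0, 1.0, 0.0, np.nan],
    [1.0, 0.0, 0.0, np.nan],
    [1.0, 1.0, 0.0, np.nan],
    [1.0, 0.0, 0.0, np.nan],
])

# Zero-filling silently counts the missing pixels as clear sky.
zero_filled = np.nan_to_num(mask, nan=0.0).mean()

# Excluding NaNs averages over the 12 valid pixels only.
excluded = np.nanmean(mask)

print(zero_filled, excluded)  # 0.375 0.5
```

Six of the twelve valid pixels are cloudy, so excluding the NaNs gives 0.5, while zero-filling dilutes the result to 6/16 = 0.375.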
I'd be in favour of the following, but am happy to debate:
- `cloud_fraction` cannot be fixed this way, and should by default exclude nans from its calculation
- `fractal_dimension` is open for interpretation - it could break on coarse-graining if a block contains a nan, or simply ignore the nan and be slightly imprecise - I vote for the latter
- `open_sky` - I don't quite know what to do with this. In the search for the next cloudy pixel along a direction, ignoring a nan essentially means treating it as open sky area, while stopping the search treats it as cloud - this decision is binary. Encountering a nan could lead to a `continue` statement - i.e. exclusion of all pixels that lie on a row/column with a nan in it, but this may just exclude the whole image, if there e.g. is a column of nans at a domain edge. Short term, I think we should make `open_sky` break if there are nans, to be consistent with the rest of the metrics, and to prevent it from returning crap.
- `stats` should by default ignore the nan - that is, we should calculate statistical moments of scalar fields with the nans excluded from the calculation.

I'm happy to have a go at this, if we largely agree :)
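A rough sketch of what the three kinds of default behaviour in the list above (exclude, ignore during coarse-graining, break) might look like. All function names here are hypothetical, and the coarse-graining step is a generic box-counting style reduction, not the actual cloudmetrics implementation of `fractal_dimension`.

```python
import numpy as np


def scalar_stats(field):
    """Moments of a scalar field computed over valid (non-NaN) pixels only."""
    return {
        "mean": float(np.nanmean(field)),
        "var": float(np.nanvar(field)),
    }


def require_no_nans(field, metric_name):
    """Guard for metrics that should break on NaNs (e.g. open_sky, short term)."""
    if np.isnan(field).any():
        raise ValueError(f"{metric_name} does not support NaNs in the input")


def coarse_grain_any_cloud(mask, block):
    """Mark a block as cloudy if any valid pixel in it is cloudy.

    NaNs are ignored, so a block consisting only of NaNs counts as not
    cloudy. This is the 'slightly imprecise' option.
    """
    ny, nx = mask.shape
    if ny % block or nx % block:
        raise ValueError("mask shape must be divisible by the block size")
    # Split into (block row, row in block, block column, column in block).
    blocks = mask.reshape(ny // block, block, nx // block, block)
    # NaN never counts as cloud, so it cannot switch a block on.
    cloudy = np.nan_to_num(blocks, nan=0.0) > 0
    return cloudy.any(axis=(1, 3))
```

For example, `require_no_nans(mask, "open_sky")` at the top of `open_sky` would give the short-term "break" behaviour, while `scalar_stats` shows the "ignore by default" behaviour suggested for `stats`.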