calculate_water hypervariables need research

The original magic numbers used for this variable don't work well here at all, because they were meant to capture all cloud, even the thinnest, in a boolean mask. We are interested in a gradation instead.

The algorithm's threshold pairs, {(0.01, 0.11), (0.1, 0.05)}, therefore tell us where the low end of our logistic function needs to be. Ideally, a separate, matching study would give us numbers for the upper end. Those should be more straightforward to obtain, since all we need to do is examine a variety of clouds (across different climates and seasons) that are thick enough to completely obscure the land or water beneath.
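For contrast with the graded approach, here is a minimal sketch of the original boolean test. It assumes each pair is an (NDVI upper limit, NIR upper limit) threshold, as in the Fmask water test, and that a pixel is flagged if it passes either pair. The names `WATER_THRESHOLDS` and `is_water_boolean` are hypothetical, not PyFmask's API.

```python
# Assumption: each pair is (ndvi_max, nir_max), as in the Fmask water test.
WATER_THRESHOLDS = [(0.01, 0.11), (0.1, 0.05)]


def is_water_boolean(ndvi: float, nir: float) -> bool:
    """Hard mask: True if the pixel falls under either threshold pair."""
    return any(
        ndvi < ndvi_max and nir < nir_max
        for ndvi_max, nir_max in WATER_THRESHOLDS
    )


print(is_water_boolean(0.005, 0.08))  # passes the first pair
print(is_water_boolean(0.05, 0.08))   # NDVI too high for pair 1, NIR too high for pair 2
print(is_water_boolean(0.05, 0.03))   # passes the second pair
```

Every pixel lands on one side of a hard edge, which is why these thresholds are only useful as the low end of a graded curve.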

The current numbers work well for the Eastern Indian seaboard scene I've been using; I picked them simply through trial and error. A median value of 0.2 distinguishes land from water well, and a width_factor of 9 puts the expected cloud influence at roughly 12% when ndvi=0.01 (before accounting for near-infrared), as can be seen here.
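A minimal sketch of evaluating the curve with the current parameters. It assumes width_factor acts as the steepness k in 1 / (1 + exp(-k * (x - median))), which may not match how PyFmask actually applies it. Under this assumption the curve gives about 0.15 at ndvi=0.01, in the same ballpark as the ~12% above, so the real function may scale width_factor somewhat differently.

```python
import math


def logistic(x: float, median: float, width_factor: float) -> float:
    """Logistic curve centred on `median`.

    Assumption: `width_factor` is the steepness k in
    1 / (1 + exp(-k * (x - median))).
    """
    return 1.0 / (1.0 + math.exp(-width_factor * (x - median)))


# Parameters currently used for the Eastern Indian seaboard scene.
MEDIAN = 0.2
WIDTH_FACTOR = 9.0

for ndvi in (0.01, 0.1, 0.2, 0.4):
    # The curve is exactly 0.5 at the median and rises with ndvi.
    print(f"ndvi={ndvi:.2f} -> {logistic(ndvi, MEDIAN, WIDTH_FACTOR):.3f}")
```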

This works reasonably well. However, I think the logistic function approaches 1.0 too early, indicating that some clouds obscure the surface almost 100% when I can see in the original picture that they do not. We therefore need that upper bound, so that we can adjust the median value and width_factor to frame the curve correctly.
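Once both bounds are known, the median and width_factor can be solved for directly rather than by trial and error, because logit(p) = width_factor * (x - median) is linear in x. This is a sketch under the same steepness assumption as before; `fit_logistic` is a hypothetical helper, and the upper anchor in the usage example is a placeholder, since the upper-end study does not exist yet.

```python
import math


def logit(p: float) -> float:
    """Inverse of the standard logistic function, for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def logistic(x: float, median: float, width_factor: float) -> float:
    """Assumed form: 1 / (1 + exp(-width_factor * (x - median)))."""
    return 1.0 / (1.0 + math.exp(-width_factor * (x - median)))


def fit_logistic(x_low: float, p_low: float, x_high: float, p_high: float):
    """Return (median, width_factor) so the curve passes through both anchors."""
    if not (0.0 < p_low < 1.0 and 0.0 < p_high < 1.0):
        raise ValueError("anchor values must lie strictly between 0 and 1")
    if x_low == x_high or p_low == p_high:
        raise ValueError("anchors must differ in both x and p")
    # Two points on the line logit(p) = k * (x - median) determine k and median.
    width_factor = (logit(p_high) - logit(p_low)) / (x_high - x_low)
    median = x_low - logit(p_low) / width_factor
    return median, width_factor


# Low anchor from the existing threshold; high anchor is a PLACEHOLDER.
median, width_factor = fit_logistic(0.01, 0.05, 0.5, 0.95)
print(median, width_factor)
```

With symmetric anchor values such as 0.05 and 0.95, the fitted median falls at the midpoint of the two x values; moving the upper anchor further out flattens the curve so it no longer approaches 1.0 as early.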

It does, however, seem to pick up a lot of false positives on land, though that seems to be mostly due to the second part of the formula.

Note that we can probably throw out the second half of the algorithm's formula. It is designed to pick up thin clouds over water, but the new test already accounts for those because it is not a boolean mask: gradations of thin cloud over water are captured as shades of gray.

akalenda / PyFmask

calculate_water hypervariables need research #8