Clay-foundation / model

The Clay Foundation Model (in development)
https://clay-foundation.github.io/model/
Apache License 2.0

Check for undersampling of certain geographic regions due to cloud cover filters #93

Closed: weiji14 closed this issue 3 months ago

weiji14 commented 8 months ago

In #28 and #80, we developed a WorldCover-based geographic sampling scheme that is supposed to sample a diverse set of regions across land cover types.

However, in #60/#68, we applied a NoData filter that removes some of those sampled regions. We'll need to double-check whether those filters undersample geographic regions that have high cloud cover, or areas where high surface reflectance can lead to false-positive cloud cover values (e.g. over polar regions).

For example, we should have 40+ MGRS tiles over Greenland with the sampling procedure from #81:

[image: sampled MGRS tiles over Greenland]

But I looked at the S3 bucket, searching over 20X-26X, 21W-26W, and 22V-24V, and couldn't find a single tile, even in the coastal areas that are not pure white!

So we'll need to check whether there are data gaps, and potentially increase the cloud cover threshold or relax the filter in some other way.
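
A minimal sketch of how such a gap check could be scripted, rather than browsing the bucket by hand. It expands the grid zone ranges quoted above (e.g. `20X-26X`) and counts, per grid zone, how many expected tiles are actually present. The function names and the example tile IDs are hypothetical, and it assumes tile IDs are five-character MGRS codes whose first three characters are the grid zone; how the expected and present lists are obtained (sampling output, bucket listing) is left out.

```python
from collections import Counter


def expand_mgrs_range(spec: str) -> list[str]:
    """Expand a grid zone range such as '20X-26X' into ['20X', ..., '26X']."""
    start, end = spec.strip().split("-")
    start_zone, start_band = int(start[:-1]), start[-1]
    end_zone, end_band = int(end[:-1]), end[-1]
    if start_band != end_band:
        raise ValueError(f"range must stay within one latitude band: {spec}")
    return [f"{zone:02d}{start_band}" for zone in range(start_zone, end_zone + 1)]


def coverage_by_grid_zone(expected_tiles, present_tiles, grid_zones):
    """Return {grid_zone: (n_present, n_expected)} for the given grid zones.

    Assumes 5-character MGRS tile IDs (e.g. '24WWU'), so tile[:3] is the zone.
    """
    zones = set(grid_zones)
    expected = {t for t in expected_tiles if t[:3] in zones}
    present = set(present_tiles)
    n_expected = Counter(t[:3] for t in expected)
    n_present = Counter(t[:3] for t in expected & present)
    return {z: (n_present[z], n_expected[z]) for z in sorted(n_expected)}


# Grid zones searched over Greenland in this issue.
greenland_zones = [
    zone
    for spec in ("20X-26X", "21W-26W", "22V-24V")
    for zone in expand_mgrs_range(spec)
]

# Hypothetical tile IDs, for illustration only.
expected = ["24WWU", "22WEB", "27XVA"]  # from the sampling step
present = ["22WEB"]                     # found in the bucket
print(coverage_by_grid_zone(expected, present, greenland_zones))
```

Any grid zone where the first number is far below the second (or zero) is a candidate for undersampling by the filters.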

yellowcap commented 3 months ago

We addressed this by using the least cloudy image for each season in an area. This increases the chances of getting enough imagery in each region. But the bias might still be there, since in very cloudy areas even that approach might yield fewer samples.

We can revisit this at a later stage.
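
For illustration, the "least cloudy image for each season" selection described above could look roughly like the sketch below. This is not the project's actual pipeline code: the scene records (dicts with `tile`, `date`, and `cloud_cover` keys) and the function names are hypothetical, and the season definition (three-month meteorological quarters) is an assumption.

```python
from datetime import date


def season_of(d: date) -> str:
    """Map a date to a three-month season label.

    Assumption: meteorological quarters (Dec-Feb, Mar-May, Jun-Aug, Sep-Nov).
    """
    return ("DJF", "MAM", "JJA", "SON")[(d.month % 12) // 3]


def least_cloudy_per_season(scenes):
    """Keep the scene with the lowest cloud cover for each (tile, season)."""
    best = {}
    for scene in scenes:
        key = (scene["tile"], season_of(scene["date"]))
        if key not in best or scene["cloud_cover"] < best[key]["cloud_cover"]:
            best[key] = scene
    return best


# Hypothetical scene metadata, for illustration only.
scenes = [
    {"tile": "22WEB", "date": date(2021, 7, 3), "cloud_cover": 62.0},
    {"tile": "22WEB", "date": date(2021, 8, 14), "cloud_cover": 18.5},
    {"tile": "22WEB", "date": date(2021, 1, 9), "cloud_cover": 91.0},
]
for (tile, season), scene in sorted(least_cloudy_per_season(scenes).items()):
    print(tile, season, scene["date"], scene["cloud_cover"])
```

Unlike a fixed cloud cover threshold, this always returns a scene for any season that has at least one acquisition, but the scene it picks in a very cloudy area can still be mostly cloud, which is where the remaining bias would come from.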