CosmiQ / solaris

CosmiQ Works Geospatial Machine Learning Analysis Toolkit
https://solaris.readthedocs.io
Apache License 2.0

[FEATURE]: Handling nodata values generally, should we toss, fill, set to 0? #328

Closed rbavery closed 4 years ago

rbavery commented 4 years ago

Is your feature request related to a problem? Please describe.

Some image sources (like Landsat) have considerable portions of the image filled with nodata values beyond the boundaries of the satellite overpass, or have nodata gaps once a cloud mask is applied.

Describe the solution you'd like

In these cases, I think the nodata values in the image could be filled with a value (such as the mean of the image used for tiling, or of the entire training dataset) and the corresponding label pixels could be set to background. Where we don't have the original image values at nodata pixel locations, we might still want to keep the rest of the image and label pixels in our training pool, but not keep labels for a filled value or a nodata value like -9999.
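To make the proposal concrete, here is a minimal sketch of the fill step using only NumPy. The function name `fill_nodata_and_mask_labels`, the `(bands, height, width)` array layout, and the per-band mean as fill value are all assumptions for illustration; none of this is part of the solaris API.

```python
import numpy as np


def fill_nodata_and_mask_labels(image, label, nodata=-9999, background=0):
    """Hypothetical helper: fill nodata pixels and reset their labels.

    image: array of shape (bands, height, width)
    label: integer array of shape (height, width)
    """
    image = image.astype(float)  # astype returns a copy, input is untouched
    label = label.copy()

    # Assumption: a pixel counts as nodata if any band holds the nodata value.
    nodata_mask = (image == nodata).any(axis=0)

    for band in image:  # each band is a view, so assignment edits `image`
        valid = band[~nodata_mask]
        # Fill with the mean of the valid pixels; fall back to 0 for an empty tile.
        band[nodata_mask] = valid.mean() if valid.size else 0.0

    # Filled pixels must not carry a label.
    label[nodata_mask] = background
    return image, label, nodata_mask
```

Returning the mask alongside the filled arrays means later steps can reuse it instead of re-deriving it from fill values, which would no longer be distinguishable from real data.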

If we did handle nodata like this, I think the data filling and label masking step would come after vector tiling and after running a label mask function on the vector tiles. The mask for each image tile would be updated to mask out filled areas and areas restricted by the AOI, and this mask would then be used to remove label pixels in the label tiles so that fill values are not labeled.
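The mask update described above could look roughly like this. The name `mask_label_tile` and the convention that `aoi_mask` is `True` inside the area of interest are assumptions for this sketch, not existing solaris functions.

```python
import numpy as np


def mask_label_tile(label_tile, nodata_mask, aoi_mask=None, background=0):
    """Hypothetical helper: drop labels on filled pixels and outside the AOI.

    label_tile:  integer array of shape (height, width)
    nodata_mask: boolean array, True where the image tile was filled
    aoi_mask:    optional boolean array, True inside the area of interest
    """
    # Start from the filled pixels, then add everything outside the AOI.
    exclude = np.asarray(nodata_mask, dtype=bool).copy()
    if aoi_mask is not None:
        exclude |= ~np.asarray(aoi_mask, dtype=bool)

    masked = label_tile.copy()
    masked[exclude] = background
    return masked, exclude
```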

This would be a bit of a chore and would probably also require a function to go from label masks to COCO format (rather than from vector tiles to COCO format, as is currently supported), but it would preserve more training data, which is always scarce in geospatial ML.

The alternative to the filling approach would be to toss out image tiles whose share of nodata pixels exceeds a certain threshold. I have a PR that currently implements this, but I am curious whether there is interest in supporting the fill-based method above, similar to what we want to do with restrict_by_aoi but more generally for all nodata values in an image tile.
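For comparison, the threshold-based alternative can be sketched in a few lines. This is only an illustration of the idea, not the code from the PR; the names `nodata_fraction` and `keep_tile`, the `max_nodata_fraction` parameter, and the `(bands, height, width)` layout are assumptions.

```python
import numpy as np


def nodata_fraction(tile, nodata=-9999):
    """Fraction of pixels in a (bands, height, width) tile that are nodata."""
    # Assumption: a pixel is nodata if any band holds the nodata value.
    return float((tile == nodata).any(axis=0).mean())


def keep_tile(tile, nodata=-9999, max_nodata_fraction=0.5):
    """Keep the tile only if its nodata share is at or below the threshold."""
    return nodata_fraction(tile, nodata) <= max_nodata_fraction
```

A tile that passes this check can still contain some nodata pixels, so the two approaches are not mutually exclusive: the threshold could discard mostly empty tiles while the fill step handles the remainder.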

rbavery commented 4 years ago

Solved by https://github.com/CosmiQ/solaris/pull/331