Open-EO / openeo-geotrellis-extensions

Java/Scala extensions for Geotrellis, for use with OpenEO GeoPySpark backend.
Apache License 2.0
5 stars 3 forks source link

mask_scl_dilation: refactor to create the full mask first, for consistency #121

Closed jdries closed 1 year ago

jdries commented 1 year ago

mask_scl_dilation often has issues on tile edges, because clouds in neighbouring products are not visible. This was solved by the large overlap in Sentinel-2 products, but returns as an issue when loading with large tile sizes.

I propose to first build the entire mask, then apply it using the regular masking procedure. This has most chances of ensuring consistency.

JeroenVerstraelen commented 1 year ago

@jdries When we construct the entire mask this will also have to be done in tiles. How is this different from the current implementation? It seems like the same issues will arise.

JeroenVerstraelen commented 1 year ago

I did several tests where with a tilesize of 2048 and the new approach solves the issues at the tile borders:

Current Image New Image

In the top right of the first image you can see that the mask is clearly cut off by the tile borders. In the second image this border is no longer present and the dilated clouds of the neighbouring tiles have removed much of that mask.

jdries commented 1 year ago

@JeroenVerstraelen what's the setting to enable this? We should also make an assessment of performance impact. Can be done by looking at cpu use, but also spark-UI inspection.

jdries commented 1 year ago

I get an issue on the output edge still, this is my code for the mask:


scl = connection.load_collection(
            S2_collection,
            bands=["SCL"],
            spatial_extent=bbox,
            temporal_extent=[start, end],
            max_cloud_cover=95
        ).resample_spatial(projection=target_crs, resolution=10.0)
        mask = scl.process(
            "to_scl_dilation_mask",
            data=scl,
            kernel1_size=17, kernel2_size=77,
            mask1_values=[2, 4, 5, 6, 7],
            mask2_values=[3, 8, 9, 10, 11],
            erosion_kernel_size=3)
        bands = bands.mask(mask)

Image

JeroenVerstraelen commented 1 year ago

@jdries is it possible to get the full code with bbox and temporal extent? I used what I think is your process graph from the CDSE logs but the result appears to be correct.

This is the code I'm currently using:

bbox = {'west': 3740000.0, 'south': 3020000.0, 'east': 3740150.0, 'north': 3020150.0, 'crs': 'EPSG:3035'}
bands = ["B08", "SCL"]
start = "2021-03-01"
end = "2021-10-31"
S2_collection = "TERRASCOPE_S2_TOC_V2"
connection = openeo.connect("https://openeo-dev.vito.be/openeo/1.0").authenticate_oidc()
target_crs = "EPSG:3035"

s2 = connection.load_collection("TERRASCOPE_S2_TOC_V2", spatial_extent = bbox, temporal_extent = [start, end],
    bands = bands).resample_spatial(projection = target_crs, resolution = 10.0)

scl = connection.load_collection(
            S2_collection,
            bands=["SCL"],
            spatial_extent=bbox,
            temporal_extent=[start, end],
            max_cloud_cover=95
        ).resample_spatial(projection=target_crs, resolution=10.0)
mask = scl.process(
    "to_scl_dilation_mask",
    data=scl,
    kernel1_size=17, kernel2_size=77,
    mask1_values=[2, 4, 5, 6, 7],
    mask2_values=[3, 8, 9, 10, 11],
    erosion_kernel_size=3)
masked = s2.mask(mask)
masked.download("mask_scl_dilation.nc")
JeroenVerstraelen commented 1 year ago

When using a large area I get these results: mask_scl_dilation (current): Screenshot from 2023-06-22 15-51-41

to_scl_dilation_mask (new): Screenshot from 2023-06-22 15-51-53

Overlayed: Screenshot from 2023-06-22 16-00-05

So it looks like:

JeroenVerstraelen commented 1 year ago

For a far larger extent they appear to be almost identical:

current: Screenshot from 2023-06-22 16-27-15

new: Screenshot from 2023-06-22 16-26-52

overlay (current in red): Screenshot from 2023-06-22 16-28-42

There is only a slight difference at the edge where the current process (red) masked away more than the new one (black): Screenshot from 2023-06-22 16-29-47

jdries commented 1 year ago

@JeroenVerstraelen the problem is also visible in your code, but note that QGis does not show these nodata pixels at the edge. When I enlarge the bbox of the mask, the problem goes away. I think we need to do something like this: https://github.com/Open-EO/openeo-python-driver/blob/779b690183932f5f4acb84925167bb6306f8f44f/openeo_driver/dry_run.py#L617 To ensure that load_collection is aware that a pixel buffer is needed.

jdries commented 1 year ago

added a commit to get fix going. The buffer is normally applied here: https://github.com/Open-EO/openeo-geotrellis-extensions/blob/7e8521ce39b6f82f7526cbdce4b49f48842cb410/openeo-geotrellis/src/main/scala/org/openeo/geotrellis/layers/FileLayerProvider.scala#L621

jdries commented 1 year ago

works better now, I only still see a one pixel difference on the mask boundary when working on a very small (15x15 pixel) area.