DHI-GRAS / worldwater-toolbox

WorldWater Toolbox
GNU General Public License v3.0

[Performance] apply sar data mask using band math #3

Closed: jdries closed this issue 1 year ago

jdries commented 1 year ago

In the current version, the SAR data mask is applied with the `mask` method. The following band-math alternative is more efficient in this case:

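```python
from openeo.processes import if_

# Compute terrain-corrected gamma0 backscatter and request the data mask band
s1_cube = s1_cube.sar_backscatter(coefficient="gamma0-terrain", mask=True, elevation_model="COPERNICUS_30")
# Give the four resulting bands explicit names
s1_cube = s1_cube.rename_labels("bands", ["VH", "VV", "mask", "incidence_angle"])

def apply_mask(bands):
    # Keep the band values where the mask band (index 2) is not 2; otherwise return null
    return if_(bands.array_element(2) != 2, bands)

# Evaluate the mask per pixel along the "bands" dimension, within the same cube
s1_cube = s1_cube.apply_dimension(process=apply_mask, dimension="bands")
```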
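To spell out what `apply_mask` does for a single pixel, here is a plain-Python model of the `if_(... != 2, bands)` callback. It is not openEO code: the names `MASK_INDEX`, `MASKED_VALUE` and `apply_mask_pixel` and the sample values are hypothetical. It assumes the openEO semantics that `neq` returns null when an operand is null and that `if` returns its (default null) `reject` value for anything other than true, so a pixel with a null mask is also dropped.

```python
from typing import List, Optional

# Band order follows the rename_labels call: VH, VV, mask, incidence_angle
MASK_INDEX = 2      # position of the "mask" band (hypothetical constant name)
MASKED_VALUE = 2    # mask value the snippet treats as invalid

def apply_mask_pixel(bands: List[Optional[float]]) -> Optional[List[Optional[float]]]:
    """Model of if_(bands[2] != 2, bands): keep the band array, else null (None)."""
    mask = bands[MASK_INDEX]
    # neq() yields null for a null operand, and if_ only accepts on true
    if mask is not None and mask != MASKED_VALUE:
        return bands
    # if_ without a reject argument returns null
    return None

pixels = [
    [0.012, 0.085, 1, 38.5],
    [0.020, 0.110, 2, 41.0],
]
masked = [apply_mask_pixel(p) for p in pixels]
print(masked)  # [[0.012, 0.085, 1, 38.5], None]
```

Because the whole band array is returned or nulled in one expression, the mask band never has to leave the cube it belongs to.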

Explanation

Using the `mask` method splits the Sentinel-1 cube into two separate cubes and then joins them again to perform the masking. This results in a more complex process graph than `apply_dimension`, which acts on a single data cube.
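The structural difference can be illustrated with a toy plain-Python model. This is not openEO code and not how the backend executes process graphs; the cube values and the function names `split_then_join` and `single_pass` are hypothetical. The mask-style path first derives a separate flag structure from the mask band and then joins it back with the data, while the band-math path applies one function per pixel. Both give the same result for these samples (the toy ignores null mask values), but only the first builds and re-joins an intermediate structure.

```python
from typing import List, Optional

# Toy cube: one list of band values per pixel, ordered VH, VV, mask, incidence_angle
Pixel = List[Optional[float]]
cube: List[Pixel] = [
    [0.012, 0.085, 1, 38.5],
    [0.020, 0.110, 2, 41.0],
    [0.031, 0.092, 0, 35.2],
]

def split_then_join(pixels: List[Pixel]) -> List[Pixel]:
    # Step 1: derive a separate boolean "mask cube" from the mask band
    flags = [p[2] == 2 for p in pixels]
    # Step 2: join the flags back with the data, nulling every flagged pixel
    return [[None] * len(p) if flagged else list(p) for p, flagged in zip(pixels, flags)]

def single_pass(pixels: List[Pixel]) -> List[Pixel]:
    # One per-pixel function over the band values, no intermediate structure
    return [list(p) if p[2] != 2 else [None] * len(p) for p in pixels]

assert split_then_join(cube) == single_pass(cube)
```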

In this case, the more complex process graph also appears to cause more data to be loaded from Sentinel Hub. The backend could potentially optimize this, but the proposed solution does not depend on the backend being able to optimize the more complex graph.

Results

This benchmark was run on a small area, but it already shows a large difference:

| Metric | Before | After |
|---|---|---|
| CPU usage | 5,328 cpu-seconds | 2,387 cpu-seconds |
| Wall time | 238 seconds | 206 seconds |
| Memory usage | 13,057,281 MB-seconds | 5,647,079 MB-seconds |
| Sentinel Hub | 240 processing units | 80 processing units |
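To put the benchmark in relative terms, the short script below computes the percentage reduction and the before/after ratio for each metric. The figures are copied from the results above; the names `BENCHMARK` and `reduction_pct` are only illustrative.

```python
# Benchmark figures from the results above: (before, after)
BENCHMARK = {
    "CPU usage (cpu-seconds)": (5_328, 2_387),
    "Wall time (seconds)": (238, 206),
    "Memory usage (MB-seconds)": (13_057_281, 5_647_079),
    "Sentinel Hub processing units": (240, 80),
}

def reduction_pct(before: float, after: float) -> float:
    """Relative reduction from before to after, in percent."""
    return 100.0 * (before - after) / before

for metric, (before, after) in BENCHMARK.items():
    # Percentage saved, plus how many times larger the "before" value was
    print(f"{metric}: {reduction_pct(before, after):.1f}% lower, {before / after:.2f}x")
```

Run as-is, this reports reductions of about 55% in CPU usage, 13% in wall time, 57% in memory usage and 67% in Sentinel Hub processing units. Most of the gain is therefore in resource consumption rather than wall time.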

sulova commented 1 year ago

Super! We implemented it, and it improved the processing time.