Open-EO / openeo-geopyspark-driver

OpenEO driver for GeoPySpark (Geotrellis)
Apache License 2.0
26 stars 4 forks source link

forum473: complex masking issue #227

Closed soxofaan closed 1 year ago

soxofaan commented 2 years ago

originally reported here https://discuss.eodc.eu/t/masking-does-not-work/473

a relatively complex openeo algorithm fails with

java.lang.IllegalArgumentException: Invalid band index 5, only 4 bands available.

But this only happens in batch job mode, it does not happen in sync mode

soxofaan commented 2 years ago

I managed to trim down the process graph to this minimal reproduction use case:

import openeo
connection = openeo.connect("openeo.vito.be").authenticate_oidc()

temporal_extent = ['2021-06-01', "2021-06-12"]
spatial_extent = {'west': -74.05, 'east': -74, 'south': 4.7, 'north': 4.72, 'crs': 'epsg:4326'} 

s2_cube = connection.load_collection(
    'SENTINEL2_L2A_SENTINELHUB',
    spatial_extent = spatial_extent,
    temporal_extent = temporal_extent,
    bands = ['B02', 'B03', 'B04', "B06", 'B08', 'CLP', 'SCL' , 'sunAzimuthAngles', 'sunZenithAngles'] 
)

s1_cube = connection.load_collection(
    'SENTINEL1_GRD', 
     spatial_extent = spatial_extent, 
     temporal_extent = temporal_extent, 
     bands = ['VH','VV'],
)

s2_cube = s2_cube.mask(
    s2_cube.band("SCL") == 3
)

s1_cube = s1_cube.ard_normalized_radar_backscatter(elevation_model="COPERNICUS_30")

exclusion_mask = (
    (s1_cube.resample_cube_spatial(s2_cube) > 0.5) 
    & (s2_cube < 0.33)
)

s1_s2_cube = (
    s1_cube.resample_cube_spatial(s2_cube)
    .merge_cubes(s2_cube)
)

masked = (s1_s2_cube).mask(exclusion_mask.resample_cube_spatial(s1_s2_cube))

res = masked.save_result(format='netCDF')

Executing this a batch job fails like this:

  File "/opt/spark3_2_0/python/lib/py4j-0.10.9.2-src.zip/py4j/protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o1064.datacube_seq.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 40.0 failed 4 times, most recent failure: Lost task 2.3 in stage 40.0 (TID 93) (epod047.vgt.vito.be executor 5): java.lang.IllegalArgumentException: Invalid band index 6, only 4 bands available.
    at org.openeo.geotrellis.OpenEOProcessScriptBuilder.$anonfun$arrayElementFunction$2(OpenEOProcessScriptBuilder.scala:1137)
    at org.openeo.geotrellis.OpenEOProcessScriptBuilder$.org$openeo$geotrellis$OpenEOProcessScriptBuilder$$evaluateToTiles(OpenEOProcessScriptBuilder.scala:200)
    at org.openeo.geotrellis.OpenEOProcessScriptBuilder.$anonfun$xyFunction$2(OpenEOProcessScriptBuilder.scala:506)
    at org.openeo.geotrellis.OpenEOProcessScriptBuilder$.$anonfun$wrapProcessWithDefaultContext$1(OpenEOProcessScriptBuilder.scala:46)
    at org.openeo.geotrellis.OpenEOProcesses.$anonfun$mapBandsGeneric$2(OpenEOProcesses.scala:398)
    at org.apache.spark.rdd.PairRDDFunctions.$anonfun$mapValues$3(PairRDDFunctions.scala:751)
    at scala.collection.parallel.IterableSplitter$Mapped.next(RemainsIterator.scala:475)

Executing it synchronously works

soxofaan commented 2 years ago

Working from the use case above, these changes make the batch job finish without error:

soxofaan commented 2 years ago

A likely source for the bug is the "data_mask" optimization feature (a mask raster is pushed to load_collection time to optimize data loading): due multiple mask/merge_cubes operations there is something going wrong there.

I added more logging and a feature flag for the "data_mask" optimization feature, e.g.:

cube.create_job(
    ...
    job_options={
         "data_mask_optimization": False,
    }

When disabling the "data_mask" optimization feature, the next error is

  File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 1494, in apply_process
    return process_function(args=args, env=env)
  File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 1094, in mask
    return cube.mask(mask=mask, replacement=replacement)
  File "/opt/venv/lib64/python3.8/site-packages/openeogeotrellis/geopysparkdatacube.py", line 932, in mask
    mask_pyramid_levels = {
  File "/opt/venv/lib64/python3.8/site-packages/openeogeotrellis/geopysparkdatacube.py", line 933, in <dictcomp>
    k: l.tile_to_layout(layout=self.pyramid.levels[k])
  File "/opt/venv/lib64/python3.8/site-packages/geopyspark/geotrellis/layer.py", line 1819, in tile_to_layout
    raise ValueError("The layout needs to have the same crs as the TiledRasterLayer")
ValueError: The layout needs to have the same crs as the TiledRasterLayer
soxofaan commented 2 years ago

with the added logging of 1b2673f I now see this mismatch:

Failed to retile mask: {0: '+proj=longlat +datum=WGS84 +no_defs '} != {0: '+proj=utm +zone=18 +datum=WGS84 +units=m +no_defs '}
jdries commented 2 years ago

The use of ard_normalized_radar_backscatter result in a datacube in epsg:4326, because that's what sentinelhub produces. Also on our side, the methods to load the resulting geotiffs are still older versions: PyramidFactory.from_disk and PyramidFactory.from_s3 now have newer version that result in datacubes with better layouts, so we probably want to update those methods.

jdries commented 2 years ago

I just had a successfull run, only by using 'regular' backscatter instead of card4l: s1_cube = s1_cube.sar_backscatter(coefficient="gamma0-terrain", mask=True, elevation_model="COPERNICUS_30") This also has the performance and accuracy advantage of not going via lat lon backscatter.

jdries commented 1 year ago

worldwater case: improved further and works