[Closed] soxofaan closed this 1 year ago
I managed to trim down the process graph to this minimal reproduction use case:
import openeo

connection = openeo.connect("openeo.vito.be").authenticate_oidc()

temporal_extent = ["2021-06-01", "2021-06-12"]
spatial_extent = {"west": -74.05, "east": -74, "south": 4.7, "north": 4.72, "crs": "epsg:4326"}

s2_cube = connection.load_collection(
    "SENTINEL2_L2A_SENTINELHUB",
    spatial_extent=spatial_extent,
    temporal_extent=temporal_extent,
    bands=["B02", "B03", "B04", "B06", "B08", "CLP", "SCL", "sunAzimuthAngles", "sunZenithAngles"],
)
s1_cube = connection.load_collection(
    "SENTINEL1_GRD",
    spatial_extent=spatial_extent,
    temporal_extent=temporal_extent,
    bands=["VH", "VV"],
)
s2_cube = s2_cube.mask(s2_cube.band("SCL") == 3)
s1_cube = s1_cube.ard_normalized_radar_backscatter(elevation_model="COPERNICUS_30")
exclusion_mask = (s1_cube.resample_cube_spatial(s2_cube) > 0.5) & (s2_cube < 0.33)
s1_s2_cube = s1_cube.resample_cube_spatial(s2_cube).merge_cubes(s2_cube)
masked = s1_s2_cube.mask(exclusion_mask.resample_cube_spatial(s1_s2_cube))
res = masked.save_result(format="netCDF")
Executing this as a batch job fails like this:
File "/opt/spark3_2_0/python/lib/py4j-0.10.9.2-src.zip/py4j/protocol.py", line 326, in get_return_value
raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o1064.datacube_seq.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 40.0 failed 4 times, most recent failure: Lost task 2.3 in stage 40.0 (TID 93) (epod047.vgt.vito.be executor 5): java.lang.IllegalArgumentException: Invalid band index 6, only 4 bands available.
at org.openeo.geotrellis.OpenEOProcessScriptBuilder.$anonfun$arrayElementFunction$2(OpenEOProcessScriptBuilder.scala:1137)
at org.openeo.geotrellis.OpenEOProcessScriptBuilder$.org$openeo$geotrellis$OpenEOProcessScriptBuilder$$evaluateToTiles(OpenEOProcessScriptBuilder.scala:200)
at org.openeo.geotrellis.OpenEOProcessScriptBuilder.$anonfun$xyFunction$2(OpenEOProcessScriptBuilder.scala:506)
at org.openeo.geotrellis.OpenEOProcessScriptBuilder$.$anonfun$wrapProcessWithDefaultContext$1(OpenEOProcessScriptBuilder.scala:46)
at org.openeo.geotrellis.OpenEOProcesses.$anonfun$mapBandsGeneric$2(OpenEOProcesses.scala:398)
at org.apache.spark.rdd.PairRDDFunctions.$anonfun$mapValues$3(PairRDDFunctions.scala:751)
at scala.collection.parallel.IterableSplitter$Mapped.next(RemainsIterator.scala:475)
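A quick band-bookkeeping check (plain Python, nothing backend-specific) hints at where an index 6 could plausibly come from: "SCL", which the mask expression refers to, sits at index 6 in the S2 band list, so the error smells like that band lookup being evaluated against a cube that does not carry the S2 bands.

```python
# Band lists copied from the reproduction script above.
s2_bands = ["B02", "B03", "B04", "B06", "B08", "CLP", "SCL",
            "sunAzimuthAngles", "sunZenithAngles"]
s1_bands = ["VH", "VV"]

# "SCL" (used in the s2_cube.mask(...) expression) is band index 6 ...
print(s2_bands.index("SCL"))  # 6

# ... matching the "Invalid band index 6" in the error, while the merged
# s1_s2_cube should carry all S1 + S2 bands:
print(len(s1_bands + s2_bands))  # 11
```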
Executing it synchronously does work, however.
Working from the use case above, these changes make the batch job finish without error:
s1_cube = s1_cube.ard_normalized_radar_backscatter(elevation_model="COPERNICUS_30")
A likely source of the bug is the "data_mask" optimization feature (a mask raster is pushed down to load_collection to optimize data loading): with multiple mask/merge_cubes operations in play, something goes wrong there.
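As an illustration of the kind of rewrite such an optimization performs, here is a hypothetical sketch operating on a plain process-graph dict. The helper name and the "data_mask" argument are made up for illustration; this is not the actual backend implementation.

```python
import copy

def push_down_data_mask(graph: dict) -> dict:
    """Hypothetical sketch: move a `mask` node that directly follows a
    `load_collection` into that load_collection node, so the backend could
    avoid loading pixels that are masked out anyway."""
    graph = copy.deepcopy(graph)
    for node_id, node in list(graph.items()):
        if node["process_id"] != "mask":
            continue
        data_arg = node["arguments"].get("data")
        source = data_arg.get("from_node") if isinstance(data_arg, dict) else None
        if source and graph[source]["process_id"] == "load_collection":
            # Attach the mask to load_collection ("data_mask" is a made-up name).
            graph[source]["arguments"]["data_mask"] = node["arguments"]["mask"]
            # Re-point consumers of the removed mask node to load_collection.
            for other in graph.values():
                for arg in other["arguments"].values():
                    if isinstance(arg, dict) and arg.get("from_node") == node_id:
                        arg["from_node"] = source
            del graph[node_id]
    return graph
```

With several mask and merge_cubes nodes in the graph, as in the reproduction above, a rewrite like this has many more cases to get right, which is consistent with something going wrong there.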
I added more logging and a feature flag for the "data_mask" optimization feature, e.g.:
cube.create_job(
    ...,
    job_options={
        "data_mask_optimization": False,
    },
)
With the "data_mask" optimization feature disabled, the next error is:
File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 1494, in apply_process
return process_function(args=args, env=env)
File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 1094, in mask
return cube.mask(mask=mask, replacement=replacement)
File "/opt/venv/lib64/python3.8/site-packages/openeogeotrellis/geopysparkdatacube.py", line 932, in mask
mask_pyramid_levels = {
File "/opt/venv/lib64/python3.8/site-packages/openeogeotrellis/geopysparkdatacube.py", line 933, in <dictcomp>
k: l.tile_to_layout(layout=self.pyramid.levels[k])
File "/opt/venv/lib64/python3.8/site-packages/geopyspark/geotrellis/layer.py", line 1819, in tile_to_layout
raise ValueError("The layout needs to have the same crs as the TiledRasterLayer")
ValueError: The layout needs to have the same crs as the TiledRasterLayer
With the added logging of 1b2673f I now see this mismatch:
Failed to retile mask: {0: '+proj=longlat +datum=WGS84 +no_defs '} != {0: '+proj=utm +zone=18 +datum=WGS84 +units=m +no_defs '}
The use of ard_normalized_radar_backscatter results in a datacube in epsg:4326, because that's what sentinelhub produces. Also on our side, the methods to load the resulting geotiffs are still older versions: PyramidFactory.from_disk and PyramidFactory.from_s3 now have newer versions that produce datacubes with better layouts, so we probably want to update those methods.
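The two layouts in that log line can be compared directly; here is a minimal, pure-Python illustration (the tiny proj-string parser exists only for this comparison, it is not backend code):

```python
def parse_proj(proj: str) -> dict:
    """Parse a "+key=value" proj string into a dict (bare "+flag" -> True)."""
    out = {}
    for token in proj.split():
        key, _, value = token.lstrip("+").partition("=")
        out[key] = value if value else True
    return out

# The two layouts from the "Failed to retile mask" log line:
mask_crs = parse_proj("+proj=longlat +datum=WGS84 +no_defs ")
cube_crs = parse_proj("+proj=utm +zone=18 +datum=WGS84 +units=m +no_defs ")

print(mask_crs["proj"], "vs", cube_crs["proj"])  # longlat vs utm
print(mask_crs == cube_crs)  # False: hence tile_to_layout refuses to retile
```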
I just had a successful run, simply by using "regular" backscatter instead of CARD4L:
s1_cube = s1_cube.sar_backscatter(coefficient="gamma0-terrain", mask=True, elevation_model="COPERNICUS_30")
This also has the performance and accuracy advantage of not going via lat/lon backscatter.
worldwater case: improved further and now works
Originally reported at https://discuss.eodc.eu/t/masking-does-not-work/473: a relatively complex openEO algorithm fails, but only in batch job mode; it does not happen in sync mode.