Open-EO / openeo-geopyspark-driver

OpenEO driver for GeoPySpark (Geotrellis)
Apache License 2.0

max_cloud_cover parameter affects the size of downloaded image #633

Open maurosyl opened 9 months ago

maurosyl commented 9 months ago

I am trying to download L1C images using the load_collection function with a temporal filter, a spatial filter (a bounding box) and a max_cloud_cover parameter. When I set max_cloud_cover=100 and download one or more assets returned by load_collection, I see the expected behaviour: the downloaded image covers the bounding box that I passed to load_collection. However, if I set max_cloud_cover < 100 (e.g. 20), the downloaded images appear cropped to a smaller area where the cloud cover requirement is met.

Is this behaviour expected or is it a bug? I find it quite confusing that setting a max_cloud_cover affects the bounding box of the downloaded image, acting as a spatial filter somewhat arbitrarily.

soxofaan commented 9 months ago

What openEO backend URL are you using? And can you share some more concrete example snippets?

(note that this is probably not an issue with the python client itself, but a back-end-specific question, hence my questions)

maurosyl commented 9 months ago

The URL is: openeo.dataspace.copernicus.eu

Here is an example of what I am doing:

import openeo

connection = openeo.connect(url="openeo.dataspace.copernicus.eu")
connection.authenticate_oidc()
s2_cube = connection.load_collection(
    "SENTINEL2_L1C",
    temporal_extent=('2022-05-01', '2022-05-30'),
    spatial_extent={
        "west": 387993.2067,
        "south": 4984598.774900001,
        "east": 403452.1182000004,
        "north": 4999394.857100001,
        "crs": "EPSG:32632",  # the CRS must be passed as a string
    },
    bands=['B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B10', 'B11', 'B12'],
    max_cloud_cover=10,
)
# Get job results
batch_job = s2_cube.create_job(out_format='GTiff')
batch_job.start_and_wait()
results = batch_job.get_results()
# Download the first asset (output_path is a local file path defined elsewhere)
results.get_assets()[0].download(target=output_path)

This screenshot illustrates the issue. The image in the background was downloaded with max_cloud_cover=100; the "darker" image in the foreground was downloaded with exactly the same query, except with max_cloud_cover=10. You should be able to reproduce this.

[Screenshot: max_cloud_cover_bug]

soxofaan commented 9 months ago

> Is this behaviour expected or is it a bug? I find it quite confusing that setting a max_cloud_cover affects the bounding box of the downloaded image, acting as a spatial filter somewhat arbitrarily.

To me, it doesn't look like a bug that a cloud cover constraint influences the effective spatial and/or temporal extent of your result. Note that the cloud cover constraint is used to filter out low-level (sub)tiles of the data collection. The cloud cover constraint is not a filter applied to the whole spatial extent provided in your load_collection. I hope this clarifies the behavior you see.
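As a rough illustration of that point, here is a toy sketch in plain Python. It is not the back-end's actual implementation: the `Tile` class, the tile footprints and the cloud cover values are all made up. It only shows how a threshold checked per tile, rather than over the requested bounding box as a whole, can shrink the area that ends up in the result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    """Hypothetical tile record: a footprint plus a tile-level cloud cover value."""
    name: str
    west: float
    south: float
    east: float
    north: float
    cloud_cover: float  # percent, measured per tile, not per requested bounding box


def filter_tiles(tiles, bbox, max_cloud_cover):
    """Keep tiles that intersect bbox and pass the per-tile cloud cover threshold."""
    west, south, east, north = bbox
    return [
        t for t in tiles
        if t.west < east and t.east > west and t.south < north and t.north > south
        and t.cloud_cover <= max_cloud_cover
    ]


def covered_extent(tiles, bbox):
    """Part of the requested bbox spanned by the selected tiles (None if no tiles)."""
    if not tiles:
        return None
    west, south, east, north = bbox
    return (
        max(west, min(t.west for t in tiles)),
        max(south, min(t.south for t in tiles)),
        min(east, max(t.east for t in tiles)),
        min(north, max(t.north for t in tiles)),
    )


# Made-up data: two adjacent tiles that together cover the requested bbox.
tiles = [
    Tile("tile_a", 0, 0, 10, 10, cloud_cover=5.0),
    Tile("tile_b", 10, 0, 20, 10, cloud_cover=60.0),
]
bbox = (5, 2, 15, 8)

full = covered_extent(filter_tiles(tiles, bbox, max_cloud_cover=100), bbox)
strict = covered_extent(filter_tiles(tiles, bbox, max_cloud_cover=10), bbox)
print(full)    # (5, 2, 15, 8)
print(strict)  # (5, 2, 10, 8)
```

With the threshold at 100 both tiles contribute and the whole bounding box is covered. With the threshold at 10 the cloudy tile is dropped, so only the part of the bounding box under `tile_a` has data.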