Open-EO / openeo-geopyspark-driver

OpenEO driver for GeoPySpark (Geotrellis)
Apache License 2.0

load_stac goes over all bands even when one is selected. #695

Open EmileSonneveld opened 7 months ago

EmileSonneveld commented 7 months ago

The run time of this snippet differs greatly depending on which STAC URL is used:

load_collection = connection.load_stac(
    # Slow, 40min: `j-24022777057a4c6fbc0b821719aeafcb`
    url="/data/users/Public/victor.verhaert/ANINStac/ERA5-Land-monthly-averaged-data-v2/collection.json",

    # Fast, 5min: `j-2402273626504d288a3cedbe88dd2c0c`
    # (commented out because `url` can only be passed once; swap it in to compare)
    # url="/data/users/Public/victor.verhaert/ANINStac/ERA5-TOTAL-PRECIPITATION/v0.1/collection.json",

    temporal_extent=temporal_extent,
    spatial_extent=spatial_extent,
    bands=['total_precipitation'],
)

When running with ERA5-Land-monthly-averaged-data-v2, the log contains entries like:

asset with band name total_precipitation not found in feature 2022-10-01_ssrd; inserting NODATA band instead
asset with band name total_precipitation not found in feature 2022-10-01_u10; inserting NODATA band instead
asset with band name total_precipitation not found in feature 2022-11-01_d2m; inserting NODATA band instead

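Items that do not contain the requested band are probably not filtered out of the catalog early enough, so each of them is still processed and filled with a NODATA band.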
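
To illustrate what "filtering early" could look like, here is a minimal sketch on plain STAC item dicts. It is not the driver's actual code: the helper names `asset_band_names` and `items_with_any_band` are hypothetical, and it assumes a band can be recognised either by the asset key or by a `name` entry in the asset's `eo:bands` list. The sample items are made up; two reuse feature ids from the log above, and `2022-10-01_tp` is an invented id for a matching item.

```python
from typing import Iterable, Iterator, Set


def asset_band_names(asset_key: str, asset: dict) -> Set[str]:
    """Band names one asset provides: its key plus any `eo:bands` names."""
    names = {asset_key}
    for band in asset.get("eo:bands", []):
        if "name" in band:
            names.add(band["name"])
    return names


def items_with_any_band(items: Iterable[dict], requested: Iterable[str]) -> Iterator[dict]:
    """Yield only the items that provide at least one requested band."""
    wanted = set(requested)
    for item in items:
        provided: Set[str] = set()
        for key, asset in item.get("assets", {}).items():
            provided |= asset_band_names(key, asset)
        if provided & wanted:
            yield item


# Made-up items for illustration; the real catalog layout may differ.
sample_items = [
    {"id": "2022-10-01_ssrd", "assets": {"ssrd": {"href": "ssrd.tif"}}},
    {"id": "2022-10-01_u10", "assets": {"u10": {"href": "u10.tif"}}},
    {
        "id": "2022-10-01_tp",
        "assets": {"data": {"href": "tp.tif", "eo:bands": [{"name": "total_precipitation"}]}},
    },
]

kept_ids = [item["id"] for item in items_with_any_band(sample_items, ["total_precipitation"])]
print(kept_ids)
```

With a filter like this applied before any raster is read, items for other variables would be skipped instead of being loaded and replaced by NODATA bands. Whether the driver can apply it at that stage is an open question for this issue.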