Open-EO / openeo-geopyspark-driver

OpenEO driver for GeoPySpark (Geotrellis)
Apache License 2.0

raster_to_vector output can not be downloaded: The requested URL was not found on the server #888

Open JeroenVerstraelen opened 10 hours ago

JeroenVerstraelen commented 10 hours ago

Minimal example:

import openeo

connection = openeo.connect("openeo-dev.vito.be").authenticate_oidc()
bbox = [5.0, 51.2, 5.1, 51.3]
temp_ext = ["2023-01-01", "2023-01-20"]

s2_bands = connection.load_collection(
    "SENTINEL2_L2A",
    spatial_extent=dict(zip(["west", "south", "east", "north"], bbox)),
    temporal_extent=temp_ext,
    bands=["SCL"],
)

scl_band = s2_bands.band("SCL")
# Binary mask: 1.0 where SCL == 1, 0.0 elsewhere
s2_cloudmask = (scl_band == 1) * 1.0

s2_cloudmask_vector = s2_cloudmask.raster_to_vector()
# Synchronous variant, left commented out:
# s2_cloudmask_vector.download("s2_cloudmask_vector_vito_prod.geojson", format="geojson")
# Batch job variant: downloading its result is what fails with the 404
s2_cloudmask_vector.execute_batch("s2_cloudmask_vector_vito_prod.geojson", format="geojson")

Job directory contents:

[verstraj-local@epod156 j-2410011d99d94effb8522c45d3a9c689]$ ls -R
.:
collection.json  job_metadata.json  out  vectorcube.geojson.json

./out:
vectorcube.geojson

job_metadata.json refers to out/vectorcube.geojson.

"assets": {"vectorcube.geojson": {"href": "/data/projects/OpenEO/j-2410011d99d94effb8522c45d3a9c689/out/vectorcube.geojson", "title": "Vector cube", "type": "application/geo+json", "roles": ["data"]}}

But download_results() requests the following URL:

https://openeo-dev.vito.be/openeo/1.2/jobs/j-2410011d99d94effb8522c45d3a9c689/results/assets/N2Y5YjFhNjY0ZjdiNGI3MDJkNmYyNWM5ZGQ4NTVmYTgyZjQ2MzY0ZDI2NmI4MTQ0MjZiOTM2ODhmOWI5YzFkNkBlZ2kuZXU=/e0cc103305ae62fb56dbd42d1ebeb83f/vectorcube.geojson

And so the user receives this error:

openeo.rest.OpenEoApiError: [404] NotFound: 404 Not Found: The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again. (ref: r-241001380e294d0d943be61ef4c10bca)

JeroenVerstraelen commented 9 hours ago

These are the contents of j-2410011d99d94effb8522c45d3a9c689/vectorcube.geojson.json

{"type": "Feature", "stac_version": "1.0.0", "id": "vectorcube.geojson", "geometry": null, "bbox": null, "properties": {"datetime": null}, "links": [], "assets": {"vectorcube.geojson": {"href": "./vectorcube.geojson", "roles": ["data"], "type": "application/geo+json"}}}

j-2410011d99d94effb8522c45d3a9c689/out/vectorcube.geojson does contain the correct FeatureCollection.
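A minimal sketch of the mismatch, assuming the relative href in the STAC item is resolved against the directory that holds vectorcube.geojson.json (whether the backend resolves it exactly this way is an assumption, not something confirmed here). Only the paths come from the listing above; the variable names are illustrative.

```python
from pathlib import PurePosixPath

# Paths taken from the job directory listing in this issue
job_dir = PurePosixPath("/data/projects/OpenEO/j-2410011d99d94effb8522c45d3a9c689")
item_path = job_dir / "vectorcube.geojson.json"          # STAC item file
actual_asset = job_dir / "out" / "vectorcube.geojson"    # where the data really is

# href as written in the STAC item
asset_href = "./vectorcube.geojson"

# Assumption: a relative href is resolved against the item's own directory
resolved_asset = item_path.parent / asset_href

print("resolved:", resolved_asset)
print("actual:  ", actual_asset)
print("match:   ", resolved_asset == actual_asset)
```

Under that assumption the href points at {job_dir}/vectorcube.geojson, while the file lives in {job_dir}/out/, which would be consistent with the 404.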

JeroenVerstraelen commented 9 hours ago

import openeo
connection = openeo.connect("openeo-dev.vito.be").authenticate_oidc()
job = connection.job("j-2410011d99d94effb8522c45d3a9c689")
print(job.get_results().get_assets())
> Authenticated using refresh token.
> [<ResultAsset 'vectorcube.geojson' (type application/geo+json) at 'https://openeo-dev.vito.be/openeo/1.2/jobs/j-2410011d99d94effb8522c45d3a9c689/results/assets/OTkxMzg5ZjE4ZGE0MmY0YzgwODdmNmMxZDVhOWVkMzE1NmY1YzA0MzQxYzQ4ODdhYjNjZjkyMzQxMjIwNzcyZkBlZ2kuZXU=/3ecb9e4d3f733e2f158ee5affdf29ba3/vectorcube.geojson?expires=1728378094'>]

JeroenVerstraelen commented 9 hours ago

Output of job.get_results().get_metadata(): https://gist.github.com/JeroenVerstraelen/0682af223b3ae36f57473100a05bc479

JeroenVerstraelen commented 3 hours ago

Note that geotiff job results are written to the job directory and not to {job_dir}/out/:

[verstraj-local@epod156 j-241001152a1c488bbb60553a17ba1f2c]$ ls -R
.:
collection.json                 openEO_2022-10-04Z.tif.json     openEO_2022-10-09Z.tif          openEO_2022-10-16Z.tif.aux.xml  openEO_2022-10-26Z.tif.json     openEO_2023-02-08Z.tif          openEO_2023-02-26Z.tif.aux.xml  openEO_2023-02-28Z.tif.json
job_metadata.json               openEO_2022-10-06Z.tif          openEO_2022-10-09Z.tif.aux.xml  openEO_2022-10-16Z.tif.json     openEO_2022-11-05Z.tif          openEO_2023-02-08Z.tif.aux.xml  openEO_2023-02-26Z.tif.json
openEO_2022-10-04Z.tif          openEO_2022-10-06Z.tif.aux.xml  openEO_2022-10-09Z.tif.json     openEO_2022-10-26Z.tif          openEO_2022-11-05Z.tif.aux.xml  openEO_2023-02-08Z.tif.json     openEO_2023-02-28Z.tif
openEO_2022-10-04Z.tif.aux.xml  openEO_2022-10-06Z.tif.json     openEO_2022-10-16Z.tif          openEO_2022-10-26Z.tif.aux.xml  openEO_2022-11-05Z.tif.json     openEO_2023-02-26Z.tif          openEO_2023-02-28Z.tif.aux.xml

So we might be wrong in writing the vectorcube results to an out dir.

GeopysparkDataCube.write_assets() uses directory = str(pathlib.Path(filename).parent), while DriverVectorCube.write_assets() uses filename itself as the directory.
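To illustrate the difference, here is a small sketch. It assumes the batch job passes {job_dir}/out as filename to both implementations, which matches the directory listings above but is not verified against the driver code. The two helper functions are hypothetical stand-ins for the directory logic only, not the real write_assets() methods.

```python
from pathlib import PurePosixPath


def raster_style_directory(filename: str) -> PurePosixPath:
    """Hypothetical stand-in: take the parent of filename, as described
    for GeopysparkDataCube.write_assets()."""
    return PurePosixPath(filename).parent


def vector_style_directory(filename: str) -> PurePosixPath:
    """Hypothetical stand-in: treat filename itself as the directory, as
    described for DriverVectorCube.write_assets()."""
    return PurePosixPath(filename)


# Assumption: the batch job hands "{job_dir}/out" to write_assets()
filename = "/data/projects/OpenEO/j-2410011d99d94effb8522c45d3a9c689/out"

raster_dir = raster_style_directory(filename)
vector_dir = vector_style_directory(filename)

print("raster assets go to:", raster_dir)   # the job directory itself
print("vector assets go to:", vector_dir)   # the out/ subdirectory
```

With the same input, the raster-style logic writes into the job directory and the vector-style logic writes into out/, which is the layout difference seen in the two listings.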

@soxofaan Is it okay for me to change the DriverVectorCube.write_assets() so that it has a similar implementation to GeopysparkDataCube.write_assets()?