Open-EO / openeo-geopyspark-driver

OpenEO driver for GeoPySpark (Geotrellis)
Apache License 2.0

Support multiple CRS in one STAC collection/catalog for `load_stac` #827

Open VincentVerelst opened 1 month ago

VincentVerelst commented 1 month ago

Currently, calling `load_stac` on a collection/catalog whose items have different CRS throws `java.lang.IllegalArgumentException: All items in a collection must have the same CRS`.

Can we support multiple CRS in `load_stac`, similar to how `load_collection` handles data spanning multiple UTM zones?

bossie commented 1 month ago

Context: j-24071748981946cab735e012f7fa95df on Terrascope.

OpenEO batch job failed: java.lang.IllegalArgumentException: All items in a collection must have the same CRS

Traceback (most recent call last):
  File "batch_job.py", line 490, in <module>
    main(sys.argv)
  File "batch_job.py", line 181, in main
    run_driver()
  File "batch_job.py", line 152, in run_driver
    run_job(
  File "/opt/venv/lib64/python3.8/site-packages/openeogeotrellis/utils.py", line 56, in memory_logging_wrapper
    return function(*args, **kwargs)
  File "batch_job.py", line 251, in run_job
    result = ProcessGraphDeserializer.evaluate(process_graph, env=env, do_dry_run=tracer)
  File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 377, in evaluate
    result = convert_node(result_node, env=env)
  File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 402, in convert_node
    process_result = apply_process(process_id=process_id, args=processGraph.get('arguments', {}),
  File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 1589, in apply_process
    args = {name: convert_node(expr, env=env) for (name, expr) in sorted(args.items())}
  File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 1589, in <dictcomp>
    args = {name: convert_node(expr, env=env) for (name, expr) in sorted(args.items())}
  File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 416, in convert_node
    return convert_node(processGraph['node'], env=env)
  File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 402, in convert_node
    process_result = apply_process(process_id=process_id, args=processGraph.get('arguments', {}),
  File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 1621, in apply_process
    return process_function(args=ProcessArgs(args, process_id=process_id), env=env)
  File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 2255, in load_stac
    return env.backend_implementation.load_stac(url=url, load_params=load_params, env=env)
  File "/opt/venv/lib64/python3.8/site-packages/openeogeotrellis/backend.py", line 745, in load_stac
    return load_stac.load_stac(url, load_params, env, layer_properties={}, batch_jobs=self.batch_jobs)
  File "/opt/venv/lib64/python3.8/site-packages/openeogeotrellis/load_stac.py", line 468, in load_stac
    pyramid = pyramid_factory.datacube_seq(projected_polygons, from_date.isoformat(), to_date.isoformat(),
  File "/opt/spark3_4_0/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
    return_value = get_return_value(
  File "/opt/spark3_4_0/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.openeo.geotrellis.layers.NetCDFCollection.datacube_seq.
: java.lang.IllegalArgumentException: All items in a collection must have the same CRS
    at org.openeo.geotrellis.layers.NetCDFCollection$.loadCollection(NetCDFCollection.scala:43)
    at org.openeo.geotrellis.layers.NetCDFCollection$.datacube_seq(NetCDFCollection.scala:27)
    at org.openeo.geotrellis.layers.NetCDFCollection.datacube_seq(NetCDFCollection.scala)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
    at java.base/java.lang.Thread.run(Thread.java:829)

jdries commented 1 month ago

@VincentVerelst would it also be an option to load these items per UTM zone, and then combine the results afterwards with `merge_cubes`?
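A minimal sketch of that workaround, assuming the items have already been split into one STAC collection per UTM zone, each reachable at its own URL. The helper name `load_per_zone_and_merge` and the `zone_urls` argument are hypothetical. The `connection` is expected to behave like an openEO Python client connection (`load_stac` returning a cube that has `merge_cubes`). Whether the back-end needs an explicit resampling step before merging cubes in different CRS is not covered here.

```python
from functools import reduce


def load_per_zone_and_merge(connection, zone_urls, **load_kwargs):
    """Load one cube per single-CRS STAC collection and merge them.

    `zone_urls` is a hypothetical list of STAC URLs, one per UTM zone.
    Extra keyword arguments (for example temporal_extent or bands) are
    passed through unchanged to every `load_stac` call.
    """
    if not zone_urls:
        raise ValueError("at least one STAC URL is required")

    # One load_stac call per zone, so each call only sees a single CRS.
    cubes = [connection.load_stac(url, **load_kwargs) for url in zone_urls]

    # Fold the per-zone cubes into one cube, left to right.
    return reduce(lambda merged, cube: merged.merge_cubes(cube), cubes)
```

With a real connection this would be called as `load_per_zone_and_merge(connection, [url_zone_a, url_zone_b], temporal_extent=[...])`, where the URLs are placeholders for the per-zone collections.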

bossie commented 1 month ago

As discussed, prioritizing https://github.com/eu-cdse/openeo-cdse-infra/issues/196 over this issue.

VincentVerelst commented 1 month ago

We are currently creating some utility functions in GFMap to automatically split STAC collections per UTM zone. This will allow projects to move forward, which makes this issue more of a quality-of-life (QoL) improvement than a blocker.
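To illustrate the idea of splitting per UTM zone, here is a rough sketch that groups STAC items by CRS. It is not the GFMap implementation: the function name `split_items_by_epsg` is hypothetical, items are assumed to be plain STAC Item dictionaries, and the CRS is assumed to be stored in the `proj:epsg` property of the STAC projection extension.

```python
from collections import defaultdict


def split_items_by_epsg(items):
    """Group STAC item dicts by their `proj:epsg` property.

    Items without that property end up under the key None, so the
    caller can decide how to handle them.
    """
    groups = defaultdict(list)
    for item in items:
        epsg = item.get("properties", {}).get("proj:epsg")
        groups[epsg].append(item)
    return dict(groups)


# Illustrative items only: two in UTM zone 31N, one in UTM zone 32N.
example_items = [
    {"id": "tile-a", "properties": {"proj:epsg": 32631}},
    {"id": "tile-b", "properties": {"proj:epsg": 32632}},
    {"id": "tile-c", "properties": {"proj:epsg": 32631}},
]

per_zone = split_items_by_epsg(example_items)
```

Each resulting group contains a single CRS, so it could be published as its own collection and loaded with a separate `load_stac` call.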