automataIA opened this issue 4 months ago
Hi, the collection you are using seems to be badly configured, and it's for sure not a STAC collection. We have an example of correct configurations here: https://github.com/Open-EO/openeo-geotrellis-kubernetes/blob/master/docker/creo_layercatalog.json
What you are looking for, however, is to load data via the load_stac process rather than load_collection. This will work best if you work with a proper STAC catalog that serves public data.
For STAC catalogs that do require authentication, it gets more difficult, as there are various authentication mechanisms out there. We also don't have active development there ourselves, as our current deployments usually have direct access without authentication.
So you could give it a try, but you will probably run into authentication issues whose solution may require new development.
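For reference, loading a public STAC collection through the Python client typically looks like the sketch below. The backend URL and parameter values are placeholders taken from later in this thread, and the call needs a running backend, so this is an illustration rather than a standalone script:

```python
import openeo

# Connect to a backend (URL is a placeholder for your own local deployment).
connection = openeo.connect("http://localhost:8080/openeo/1.2/")

# load_stac reads directly from a STAC catalog/collection URL,
# bypassing the backend's own layer catalog.
cube = connection.load_stac(
    url="https://earth-search.aws.element84.com/v1/collections/sentinel-2-l2a",
    spatial_extent={"west": 8.89, "south": 40.12, "east": 9.01, "north": 40.21},
    temporal_extent=["2022-05-01", "2022-05-30"],
    bands=["red", "green", "blue", "nir"],
)

# Download the result as NetCDF.
cube.download("result.nc")
```
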
This is a duplicate of https://discuss.eodc.eu/t/question-about-using-local-backend-with-multiple-stac-collections/681; not sure where to continue this discussion.
Hi, leave this one on GitHub; remove the other one instead.
ERROR 1
Hi, the collection you are using seems to be badly configured, and it's for sure not a STAC collection. We have an example of correct configurations here: https://github.com/Open-EO/openeo-geotrellis-kubernetes/blob/master/docker/creo_layercatalog.json
Hi. I replaced my layercatalog.json with the creo_layercatalog.json file you linked (renaming it accordingly), but when I then start the backend (python openeogeotrellis/deploy/local.py) it gives this error:
....
{"message": "_get_layer_catalog: catalog_files=['layercatalog.json']", "levelname": "INFO", "name": "openeogeotrellis.layercatalog", "created": 1707381501.9347553, "filename": "layercatalog.py", "lineno": 791, "process": 8498, "req_id": "no-request", "user_id": null}
{"message": "_get_layer_catalog: reading layercatalog.json", "levelname": "INFO", "name": "openeogeotrellis.layercatalog", "created": 1707381501.9347937, "filename": "layercatalog.py", "lineno": 793, "process": 8498, "req_id": "no-request", "user_id": null}
{"message": "Unhandled TypeError exception: TypeError('string indices must be integers')", "levelname": "ERROR", "name": "openeo_driver.util.logging", "created": 1707381501.9381285, "filename": "logging.py", "lineno": 231, "process": 8498, "exc_info": "Traceback (most recent call last):\n File \"openeogeotrellis/deploy/local.py\", line 92, in <module>\n backend_implementation = GeoPySparkBackendImplementation(\n File \"/home/dio/openeo-geopyspark-driver/openeogeotrellis/backend.py\", line 328, in __init__\n catalog = get_layer_catalog(vault)\n File \"/home/dio/openeo-geopyspark-driver/openeogeotrellis/layercatalog.py\", line 884, in get_layer_catalog\n metadata = _get_layer_catalog(opensearch_enrich=opensearch_enrich)\n File \"/home/dio/openeo-geopyspark-driver/openeogeotrellis/layercatalog.py\", line 794, in _get_layer_catalog\n metadata = dict_merge_recursive(metadata, read_catalog_file(path), overwrite=True)\n File \"/home/dio/openeo-geopyspark-driver/openeogeotrellis/layercatalog.py\", line 789, in read_catalog_file\n return {coll[\"id\"]: coll for coll in read_json(catalog_file)}\n File \"/home/dio/openeo-geopyspark-driver/openeogeotrellis/layercatalog.py\", line 789, in <dictcomp>\n return {coll[\"id\"]: coll for coll in read_json(catalog_file)}\nTypeError: string indices must be integers", "req_id": "no-request", "user_id": null}
The error occurs when a string is accessed as if it were a dictionary. Specifically, it happens in the dict comprehension of read_catalog_file (reached via _get_layer_catalog) in layercatalog.py, line 789. The relevant part of the traceback is:
File "/home/dio/openeo-geopyspark-driver/openeogeotrellis/layercatalog.py", line 789, in <dictcomp>
    return {coll["id"]: coll for coll in read_json(catalog_file)}
TypeError: string indices must be integers
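For what it's worth, this TypeError is consistent with the catalog file containing a single JSON object instead of a JSON array of layer entries: iterating over a dict yields its string keys, and indexing a string with "id" raises exactly this error. A minimal sketch reproducing both cases (file names are made up for the demo):

```python
import json

def read_catalog(path):
    # Mirrors the backend's dict comprehension: it assumes the file
    # holds a JSON *array* of collection objects, each with an "id".
    with open(path) as f:
        catalog = json.load(f)
    return {coll["id"]: coll for coll in catalog}

# Correct shape: a JSON array of layer entries.
with open("good_catalog.json", "w") as f:
    json.dump([{"id": "SENTINEL2_L2A", "description": "..."}], f)
print(read_catalog("good_catalog.json").keys())  # dict_keys(['SENTINEL2_L2A'])

# Wrong shape: a single STAC Collection document (a JSON object).
# Iterating over it yields string keys, so coll["id"] blows up.
with open("bad_catalog.json", "w") as f:
    json.dump({"id": "sentinel-2-l2a", "type": "Collection"}, f)
try:
    read_catalog("bad_catalog.json")
except TypeError as e:
    print(e)  # e.g. "string indices must be integers" (message varies by Python version)
```
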
ERROR 2
Leaving the layercatalog.json file unchanged so that the backend starts, I connect to the backend to use the processes:
connection = openeo.connect("http://localhost:8080/openeo/1.2/")
and load the STAC collection separately:
url = "https://earth-search.aws.element84.com/v1/collections/sentinel-2-l2a"
s2_cube = connection.load_stac(
url=url,
.....
It gives me this Python error:
OpenEoApiError: [500] Internal: Server error: Exception during Spark execution: java.lang.ClassNotFoundException: geopyspark.geotools.kryo.ExpandedKryoRegistrator (ref: r-240208b2f1cf43aba717820f806590af)
and these errors in the terminal (backend):
....
{"message": "Using process 'load_stac' from namespace 'backend'.", "levelname": "INFO", "name": "openeo_driver.ProcessGraphDeserializer", "created": 1707385426.816809, "filename": "ProcessGraphDeserializer.py", "lineno": 1585, "process": 14766, "req_id": "r-24020870aa914599af5ce7c5ac6befb9", "user_id": "22615bb01984f64a614f19cdfe73de14250148c287c72deed89ed1ec73040149@egi.eu"}
{"message": "load_stac from url 'https://earth-search.aws.element84.com/v1/collections/sentinel-2-l2a' with load params {'temporal_extent': ('2022-05-01', '2022-05-30'), 'spatial_extent': {'west': 8.891876254269395, 'south': 40.12025131034667, 'east': 9.009559726641157, 'north': 40.210341400436754, 'crs': 'EPSG:4326'}, 'global_extent': {'west': 8.891876254269395, 'south': 40.12025131034667, 'east': 9.009559726641157, 'north': 40.210341400436754, 'crs': 'EPSG:4326'}, 'bands': ['red', 'green', 'blue', 'nir'], 'properties': {}, 'aggregate_spatial_geometries': None, 'sar_backscatter': None, 'process_types': {<ProcessType.FOCAL_SPACE: 6>, <ProcessType.GLOBAL_TIME: 4>}, 'custom_mask': {}, 'data_mask': None, 'target_crs': None, 'target_resolution': None, 'resample_method': 'near', 'pixel_buffer': None}", "levelname": "INFO", "name": "openeogeotrellis.backend", "created": 1707385426.816958, "filename": "backend.py", "lineno": 765, "process": 14766, "req_id": "r-24020870aa914599af5ce7c5ac6befb9", "user_id": "22615bb01984f64a614f19cdfe73de14250148c287c72deed89ed1ec73040149@egi.eu"}
{"message": "STAC API request: GET https://earth-search.aws.element84.com/v1/search?limit=20&bbox=8.891876254269395%2C40.12025131034667%2C9.009559726641157%2C40.210341400436754&datetime=2022-05-01T00%3A00%3A00%2B00%3A00%2F2022-05-29T23%3A59%3A59.999000%2B00%3A00&collections=sentinel-2-l2a", "levelname": "INFO", "name": "openeogeotrellis.backend", "created": 1707385428.090134, "filename": "backend.py", "lineno": 916, "process": 14766, "req_id": "r-24020870aa914599af5ce7c5ac6befb9", "user_id": "22615bb01984f64a614f19cdfe73de14250148c287c72deed89ed1ec73040149@egi.eu"}
{"message": "exception chain classes: org.apache.spark.SparkException caused by java.lang.ClassNotFoundException", "levelname": "DEBUG", "name": "openeogeotrellis.backend", "created": 1707385452.3093026, "filename": "backend.py", "lineno": 1326, "process": 14766, "req_id": "r-24020870aa914599af5ce7c5ac6befb9", "user_id": "22615bb01984f64a614f19cdfe73de14250148c287c72deed89ed1ec73040149@egi.eu"}
{"message": "Py4JJavaError('An error occurred while calling o2343.datacube_seq.\\n', JavaObject id=o2350)", "levelname": "ERROR", "name": "openeo_driver.views.error", "created": 1707385452.3095894, "filename": "views.py", "lineno": 278, "process": 14766, "exc_info": "Traceback (most recent call last):\n File \"/home/dio/openeo-geopyspark-driver/venv/lib/python3.8/site-packages/flask/app.py\", line 1516, in full_dispatch_request\n rv = self.dispatch_request()\n File \"/home/dio/openeo-geopyspark-driver/venv/lib/python3.8/site-packages/flask/app.py\", line 1502, in dispatch_request\n return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)\n File \"/home/dio/openeo-python-driver/openeo_driver/users/auth.py\", line 88, in decorated\n return f(*args, **kwargs)\n File \"/home/dio/openeo-python-driver/openeo_driver/views.py\", line 655, in result\n result = backend_implementation.processing.evaluate(process_graph=process_graph, env=env)\n File \"/home/dio/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py\", line 301, in evaluate\n return evaluate(process_graph=process_graph, env=env)\n File \"/home/dio/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py\", line 373, in evaluate\n result = convert_node(result_node, env=env)\n File \"/home/dio/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py\", line 398, in convert_node\n process_result = apply_process(process_id=process_id, args=processGraph.get('arguments', {}),\n File \"/home/dio/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py\", line 1558, in apply_process\n args = {name: convert_node(expr, env=env) for (name, expr) in sorted(args.items())}\n File \"/home/dio/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py\", line 1558, in <dictcomp>\n args = {name: convert_node(expr, env=env) for (name, expr) in sorted(args.items())}\n File \"/home/dio/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py\", line 412, in convert_node\n 
return convert_node(processGraph['node'], env=env)\n File \"/home/dio/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py\", line 398, in convert_node\n process_result = apply_process(process_id=process_id, args=processGraph.get('arguments', {}),\n File \"/home/dio/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py\", line 1558, in apply_process\n args = {name: convert_node(expr, env=env) for (name, expr) in sorted(args.items())}\n File \"/home/dio/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py\", line 1558, in <dictcomp>\n args = {name: convert_node(expr, env=env) for (name, expr) in sorted(args.items())}\n File \"/home/dio/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py\", line 412, in convert_node\n return convert_node(processGraph['node'], env=env)\n File \"/home/dio/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py\", line 398, in convert_node\n process_result = apply_process(process_id=process_id, args=processGraph.get('arguments', {}),\n File \"/home/dio/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py\", line 1539, in apply_process\n the_mask = convert_node(mask_node, env=env)\n File \"/home/dio/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py\", line 412, in convert_node\n return convert_node(processGraph['node'], env=env)\n File \"/home/dio/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py\", line 398, in convert_node\n process_result = apply_process(process_id=process_id, args=processGraph.get('arguments', {}),\n File \"/home/dio/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py\", line 1558, in apply_process\n args = {name: convert_node(expr, env=env) for (name, expr) in sorted(args.items())}\n File \"/home/dio/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py\", line 1558, in <dictcomp>\n args = {name: convert_node(expr, env=env) for (name, expr) in sorted(args.items())}\n File 
\"/home/dio/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py\", line 412, in convert_node\n return convert_node(processGraph['node'], env=env)\n File \"/home/dio/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py\", line 398, in convert_node\n process_result = apply_process(process_id=process_id, args=processGraph.get('arguments', {}),\n File \"/home/dio/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py\", line 1558, in apply_process\n args = {name: convert_node(expr, env=env) for (name, expr) in sorted(args.items())}\n File \"/home/dio/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py\", line 1558, in <dictcomp>\n args = {name: convert_node(expr, env=env) for (name, expr) in sorted(args.items())}\n File \"/home/dio/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py\", line 412, in convert_node\n return convert_node(processGraph['node'], env=env)\n File \"/home/dio/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py\", line 398, in convert_node\n process_result = apply_process(process_id=process_id, args=processGraph.get('arguments', {}),\n File \"/home/dio/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py\", line 1590, in apply_process\n return process_function(args=ProcessArgs(args, process_id=process_id), env=env)\n File \"/home/dio/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py\", line 2199, in load_stac\n return env.backend_implementation.load_stac(url=url, load_params=load_params, env=env)\n File \"/home/dio/openeo-geopyspark-driver/openeogeotrellis/backend.py\", line 1090, in load_stac\n pyramid = pyramid_factory.datacube_seq(projected_polygons, from_date.isoformat(), to_date.isoformat(),\n File \"/home/dio/openeo-geopyspark-driver/venv/lib/python3.8/site-packages/py4j/java_gateway.py\", line 1321, in __call__\n return_value = get_return_value(\n File \"/home/dio/openeo-geopyspark-driver/venv/lib/python3.8/site-packages/py4j/protocol.py\", line 326, in 
get_return_value\n raise Py4JJavaError(\npy4j.protocol.Py4JJavaError: An error occurred while calling o2343.datacube_seq.\n: org.apache.spark.SparkException: Failed to register classes with Kryo\n\tat org.apache.spark.serializer.KryoSerializer.$anonfun$newKryo$5(KryoSerializer.scala:183)\n\tat scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)\n\tat org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:233)\n\tat org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:171)\n\tat org.apache.spark.serializer.KryoSerializer$$anon$1.create(KryoSerializer.scala:102)\n\tat com.esotericsoftware.kryo.pool.KryoPoolQueueImpl.borrow(KryoPoolQueueImpl.java:48)\n\tat org.apache.spark.serializer.KryoSerializer$PoolWrapper.borrow(KryoSerializer.scala:109)\n\tat org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:346)\n\tat org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:385)\n\tat org.apache.spark.util.Utils$.clone(Utils.scala:1783)\n\tat org.apache.spark.rdd.RDD.$anonfun$aggregate$1(RDD.scala:1195)\n\tat org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)\n\tat org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)\n\tat org.apache.spark.rdd.RDD.withScope(RDD.scala:406)\n\tat org.apache.spark.rdd.RDD.aggregate(RDD.scala:1193)\n\tat org.apache.spark.rdd.RDD.$anonfun$countApproxDistinct$1(RDD.scala:1369)\n\tat scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)\n\tat org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)\n\tat org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)\n\tat org.apache.spark.rdd.RDD.withScope(RDD.scala:406)\n\tat org.apache.spark.rdd.RDD.countApproxDistinct(RDD.scala:1364)\n\tat org.apache.spark.rdd.RDD.$anonfun$countApproxDistinct$7(RDD.scala:1393)\n\tat 
scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)\n\tat org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)\n\tat org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)\n\tat org.apache.spark.rdd.RDD.withScope(RDD.scala:406)\n\tat org.apache.spark.rdd.RDD.countApproxDistinct(RDD.scala:1390)\n\tat org.openeo.geotrelliscommon.DatacubeSupport$.createPartitioner(DatacubeSupport.scala:154)\n\tat org.openeo.geotrellis.layers.FileLayerProvider.readMultibandTileLayer(FileLayerProvider.scala:971)\n\tat org.openeo.geotrellis.file.PyramidFactory.datacube(PyramidFactory.scala:128)\n\tat org.openeo.geotrellis.file.PyramidFactory.datacube_seq(PyramidFactory.scala:91)\n\tat java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.base/java.lang.reflect.Method.invoke(Method.java:566)\n\tat py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)\n\tat py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)\n\tat py4j.Gateway.invoke(Gateway.java:282)\n\tat py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)\n\tat py4j.commands.CallCommand.execute(CallCommand.java:79)\n\tat py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)\n\tat py4j.ClientServerConnection.run(ClientServerConnection.java:106)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\nCaused by: java.lang.ClassNotFoundException: geopyspark.geotools.kryo.ExpandedKryoRegistrator\n\tat java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)\n\tat java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:594)\n\tat java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:527)\n\tat java.base/java.lang.Class.forName0(Native Method)\n\tat 
java.base/java.lang.Class.forName(Class.java:398)\n\tat org.apache.spark.util.Utils$.classForName(Utils.scala:220)\n\tat org.apache.spark.serializer.KryoSerializer.$anonfun$newKryo$7(KryoSerializer.scala:178)\n\tat scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)\n\tat scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)\n\tat scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)\n\tat scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)\n\tat scala.collection.TraversableLike.map(TraversableLike.scala:286)\n\tat scala.collection.TraversableLike.map$(TraversableLike.scala:279)\n\tat scala.collection.AbstractTraversable.map(Traversable.scala:108)\n\tat org.apache.spark.serializer.KryoSerializer.$anonfun$newKryo$5(KryoSerializer.scala:178)\n\t... 42 more\n", "req_id": "r-24020870aa914599af5ce7c5ac6befb9", "user_id": "22615bb01984f64a614f19cdfe73de14250148c287c72deed89ed1ec73040149@egi.eu"}
{"message": "127.0.0.1 - - [08/Feb/2024:10:44:12 +0100] \"POST /openeo/1.2/result HTTP/1.1\" 500 220 \"-\" \"openeo-python-client/0.27.0 cpython/3.10.13 linux\"", "levelname": "INFO", "name": "gunicorn.access", "created": 1707385452.315534, "filename": "glogging.py", "lineno": 363, "process": 14766, "req_id": "no-request", "user_id": null}
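The ClassNotFoundException means Spark is configured to use a Kryo registrator class that is not on the JVM classpath. In a geopyspark/geotrellis setup that registrator is shipped in the geotrellis backend assembly jar, so one thing to check (the jar name and path below are assumptions, not verified against this install) is that the Spark session is created with configuration along these lines:

```python
from pyspark import SparkConf

conf = SparkConf()
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
# The registrator named in the stack trace; it must be resolvable on the
# driver *and* executor classpath.
conf.set("spark.kryo.registrator", "geopyspark.geotools.kryo.ExpandedKryoRegistrator")
# Hypothetical path: point spark.jars at the assembly jar that contains it.
conf.set("spark.jars", "/path/to/geotrellis-backend-assembly.jar")
```

If the jar is missing or the path is wrong, Kryo fails to register its classes exactly as in the trace above.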
Questions
Hi,
before diving into the details, can you give a bit more context about your use case? In particular, why do you want to run an openeo-geopyspark-driver based openEO backend locally and also query it locally? This is quite a convoluted setup that normal openEO users should not be confronted with. There might be easier ways to achieve what you want; e.g. the Python client also has an experimental "local processing" feature.
OK, I'll try to explain myself better. My goal is:
In this sense, the openEO backend must interface with the openEO client, and therefore carry out all operations such as processes, etc.
I hope I have explained myself well. If anything is unclear, please ask.
@automataIA have a look at the local processing documentation and let us know if it helps: https://open-eo.github.io/openeo-python-client/cookbook/localprocessing.html
Thanks, but as I wrote, I need the following:
At Eurac Research we use these two components:
Java openEO driver for the REST API: https://github.com/Open-EO/openeo-spring-driver
Python computing engine (easily deployable with Docker, using the same implementation as the local processing): https://github.com/SARScripts/openeo_odc_driver/tree/dask_processes
The Java driver sends the process graph to the Python component, which processes it and returns the result. If you would like some help setting that up, please get in touch with us and let us know your plans and some information about yourself.
Interesting. So, if I understand correctly, your backend is made up of just these two components? (I hope so)
We also have Rasdaman and Open Datacube running for data management, but if you plan to use load_stac only with public catalogs, they won't be necessary. Maybe open an issue here to continue the discussion: https://github.com/Open-EO/openeo-spring-driver/issues
Do you know of a valid Python alternative to openeo-spring-driver?
My goal is to load data via STAC collections, "process" it locally via this backend, and then download/save it in .nc format. I'm using the openeo 0.27.0 Python client from PyPI.
My code is:
And it gives me this error:
How do I solve this problem? And how do I carry out the following correctly (without these errors) via this backend:
INFO: Win11, VS Code, WSL2, Ubuntu