Open-EO / openeo-geopyspark-driver

OpenEO driver for GeoPySpark (Geotrellis)
Apache License 2.0

[500] Internal: Server error: java.io.IOException when running a download command. #680

Open automataIA opened 4 months ago

automataIA commented 4 months ago

My goal is to download data via STAC collections and, via this backend, "process" it locally, then write the result out in .nc format. I'm using the openeo 0.27.0 Python client from PyPI.

My code is:

import openeo  # pip install openeo
import matplotlib.pyplot as plt  # pip install ipympl
import xarray  # pip install "xarray[complete]" (or: xarray netCDF4 h5netcdf)
import pprint  # standard library, no install needed
import numpy as np

connection = openeo.connect("http://localhost:8080/openeo/1.2/")
connection.authenticate_oidc_device()

def calculate_square_coordinates(lat_centro, lon_centro, lato_km):
    # Constants
    RAGGIO_TERRA_KM = 6371  # approximate Earth radius in km
    LAT_KM = 111  # distance in km per degree of latitude

    # Latitude/longitude deltas for half the square's side
    delta_lat = (lato_km / 2) / LAT_KM
    delta_lon = (lato_km / 2) / (RAGGIO_TERRA_KM * np.cos(np.radians(lat_centro)) * np.pi / 180)

    # Bounding box of the square: west, south, east, north
    return lon_centro - delta_lon, lat_centro - delta_lat, lon_centro + delta_lon, lat_centro + delta_lat

# Example usage
lat_centro, lon_centro = 40.16529635539171, 8.950717990455276
lato_km = 10.0

west, south, east, north = calculate_square_coordinates(lat_centro, lon_centro, lato_km)

s2_cube = connection.load_collection(
    "SENTINEL2_L2A_CREO",
    temporal_extent=("2022-05-01", "2022-05-30"),
    spatial_extent={
        "west": west ,  # longitudine minimaa
        "south": south , # latitudine minima
        "east": east,  # longitudine massimama
        "north":  north, # latitudine massimama
        #"crs": "EPSG:4326",
    },
    bands=["B04", "B03", "B02", "SCL"],
    max_cloud_cover=100,
)

scl_band = s2_cube.band("SCL")
cloud_mask = (scl_band == 3) | (scl_band == 7) | (scl_band == 8) | (scl_band == 9) | (scl_band == 10)

cloud_mask = cloud_mask.resample_cube_spatial(s2_cube)
cube_masked = s2_cube.mask(cloud_mask)
composite_masked = cube_masked.min_time()
composite_masked.download("composite-masked.nc")

ds = xarray.load_dataset("composite-masked.nc")
# Convert xarray DataSet to a (bands, x, y) DataArray
data = ds[["B04", "B03", "B02"]].to_array(dim="bands")

fig, ax = plt.subplots(ncols=1, figsize=(4, 4), dpi=90)
data.plot.imshow(vmin=0, vmax=2000, ax=ax);

And it gives me this error:

{
    "name": "OpenEoApiError",
    "message": "[500] Internal: Server error: java.io.IOException: Exception while evaluating catalog request https://finder.creodias.eu/api/collections/Sentinel2/search.json?box=8.891653321417749%2C40.12016403425433%2C9.009636046744415%2C40.21039972384912&sortParam=startDate&sortOrder=ascending&page=1&maxRecords=100&status=ONLINE&dataset=ESA-DATASET&cloudCover=%5B0%2C100%5D&startDate=2022-05-01T00%3A00%3A00Z&completionDate=2022-05-30T00%3A00%3A00Z:  (ref: r-2402069dc62a478a811db7770dea3da9)",
    "stack": "---------------------------------------------------------------------------
OpenEoApiError                            Traceback (most recent call last)
Cell In[20], line 50
     46 cube_masked = s2_cube.mask(cloud_mask)
     48 composite_masked = cube_masked.min_time()
---> 50 composite_masked.download('composite-masked.nc')
     52 ds = xarray.load_dataset( 'composite-masked.nc')
     53 # Convert xarray DataSet to a (bands, x, y) DataArray

File ~/.pyenv/versions/3.10.13/envs/venv/lib/python3.10/site-packages/openeo/rest/datacube.py:2102, in DataCube.download(self, outputfile, format, options, validate)
   2100     format = guess_format(outputfile)
   2101 cube = self._ensure_save_result(format=format, options=options)
-> 2102 return self._connection.download(cube.flat_graph(), outputfile, validate=validate)

File ~/.pyenv/versions/3.10.13/envs/venv/lib/python3.10/site-packages/openeo/rest/connection.py:1559, in Connection.download(self, graph, outputfile, timeout, validate)
   1557 pg_with_metadata = self._build_request_with_process_graph(process_graph=graph)
   1558 self._preflight_validation(pg_with_metadata=pg_with_metadata, validate=validate)
-> 1559 response = self.post(
   1560     path=\"/result\",
   1561     json=pg_with_metadata,
   1562     expected_status=200,
   1563     stream=True,
   1564     timeout=timeout or DEFAULT_TIMEOUT_SYNCHRONOUS_EXECUTE,
   1565 )
   1567 if outputfile is not None:
   1568     with Path(outputfile).open(mode=\"wb\") as f:

File ~/.pyenv/versions/3.10.13/envs/venv/lib/python3.10/site-packages/openeo/rest/connection.py:230, in RestApiConnection.post(self, path, json, **kwargs)
    222 def post(self, path: str, json: Optional[dict] = None, **kwargs) -> Response:
    223     \"\"\"
    224     Do POST request to REST API.
    225 
   (...)
    228     :return: response: Response
    229     \"\"\"
--> 230     return self.request(\"post\", path=path, json=json, allow_redirects=False, **kwargs)

File ~/.pyenv/versions/3.10.13/envs/venv/lib/python3.10/site-packages/openeo/rest/connection.py:769, in Connection.request(self, method, path, headers, auth, check_error, expected_status, **kwargs)
    762     return super(Connection, self).request(
    763         method=method, path=path, headers=headers, auth=auth,
    764         check_error=check_error, expected_status=expected_status, **kwargs,
    765     )
    767 try:
    768     # Initial request attempt
--> 769     return _request()
    770 except OpenEoApiError as api_exc:
    771     if api_exc.http_status_code in {401, 403} and api_exc.code == \"TokenInvalid\":
    772         # Auth token expired: can we refresh?

File ~/.pyenv/versions/3.10.13/envs/venv/lib/python3.10/site-packages/openeo/rest/connection.py:762, in Connection.request.<locals>._request()
    761 def _request():
--> 762     return super(Connection, self).request(
    763         method=method, path=path, headers=headers, auth=auth,
    764         check_error=check_error, expected_status=expected_status, **kwargs,
    765     )

File ~/.pyenv/versions/3.10.13/envs/venv/lib/python3.10/site-packages/openeo/rest/connection.py:168, in RestApiConnection.request(self, method, path, headers, auth, check_error, expected_status, **kwargs)
    166 expected_status = ensure_list(expected_status) if expected_status else []
    167 if check_error and status >= 400 and status not in expected_status:
--> 168     self._raise_api_error(resp)
    169 if expected_status and status not in expected_status:
    170     raise OpenEoRestError(\"Got status code {s!r} for `{m} {p}` (expected {e!r}) with body {body}\".format(
    171         m=method.upper(), p=path, s=status, e=expected_status, body=resp.text)
    172     )

File ~/.pyenv/versions/3.10.13/envs/venv/lib/python3.10/site-packages/openeo/rest/connection.py:188, in RestApiConnection._raise_api_error(self, response)
    186     error_message = info.get(\"message\")
    187     if error_code and isinstance(error_code, str) and error_message and isinstance(error_message, str):
--> 188         raise OpenEoApiError(
    189             http_status_code=status_code,
    190             code=error_code,
    191             message=error_message,
    192             id=info.get(\"id\"),
    193             url=info.get(\"url\"),
    194         )
    196 # Failed to parse it as a compliant openEO API error: show body as-is in the exception.
    197 text = response.text

OpenEoApiError: [500] Internal: Server error: java.io.IOException: Exception while evaluating catalog request https://finder.creodias.eu/api/collections/Sentinel2/search.json?box=8.891653321417749%2C40.12016403425433%2C9.009636046744415%2C40.21039972384912&sortParam=startDate&sortOrder=ascending&page=1&maxRecords=100&status=ONLINE&dataset=ESA-DATASET&cloudCover=%5B0%2C100%5D&startDate=2022-05-01T00%3A00%3A00Z&completionDate=2022-05-30T00%3A00%3A00Z:  (ref: r-2402069dc62a478a811db7770dea3da9)"
}

How do I solve this problem? And, more generally, how do I do the following correctly (without these errors) via this backend:

  1. Authenticate to a specific STAC collection (e.g. Copernicus), as specified in my code
  2. Download raw images from the STAC collection
    1. "Process" them locally and then create the processed file in .nc format

INFO: Win11, VS Code, WSL2, Ubuntu

jdries commented 4 months ago

Hi, the collection you are using seems to be badly configured, and it's certainly not a STAC collection. We have an example of a correct configuration here: https://github.com/Open-EO/openeo-geotrellis-kubernetes/blob/master/docker/creo_layercatalog.json

What you are looking for, however, is to load data via the load_stac process rather than load_collection. This works better with a proper STAC catalog that has public data.
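
A minimal sketch of what that could look like with the Python client (the STAC URL, bounding box and band names here are just taken from examples elsewhere in this thread; adapt them to your catalog):

import openeo

connection = openeo.connect("http://localhost:8080/openeo/1.2/")
connection.authenticate_oidc_device()

# Load directly from a public STAC collection instead of a backend-configured layer
cube = connection.load_stac(
    url="https://earth-search.aws.element84.com/v1/collections/sentinel-2-l2a",
    spatial_extent={"west": 8.89, "south": 40.12, "east": 9.01, "north": 40.21},
    temporal_extent=["2022-05-01", "2022-05-30"],
    bands=["red", "green", "blue"],  # band names are catalog-specific
)
cube.min_time().download("composite.nc")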

For STAC catalogs that do require authentication, it gets more difficult, as there are various authentication mechanisms out there. We also don't have active development there ourselves, as our current deployments usually have direct access without authentication.

So you could give it a try, but you will probably run into authentication issues whose solution may require further development.

soxofaan commented 4 months ago

This is a duplicate of https://discuss.eodc.eu/t/question-about-using-local-backend-with-multiple-stac-collections/681; not sure where to continue this discussion.

automataIA commented 4 months ago

> This is a duplicate of https://discuss.eodc.eu/t/question-about-using-local-backend-with-multiple-stac-collections/681; not sure where to continue this discussion.

Hi, let's keep this one on GitHub and remove the other one instead.


ERROR 1

> Hi, the collection you are using seems to be badly configured, and it's certainly not a STAC collection. We have an example of a correct configuration here: https://github.com/Open-EO/openeo-geotrellis-kubernetes/blob/master/docker/creo_layercatalog.json

Hi. I replaced my layercatalog.json with the creo_layercatalog.json file you pointed to (renaming it accordingly), but when I start the backend (python openeogeotrellis/deploy/local.py) it gives this error:

....
{"message": "_get_layer_catalog: catalog_files=['layercatalog.json']", "levelname": "INFO", "name": "openeogeotrellis.layercatalog", "created": 1707381501.9347553, "filename": "layercatalog.py", "lineno": 791, "process": 8498, "req_id": "no-request", "user_id": null}
{"message": "_get_layer_catalog: reading layercatalog.json", "levelname": "INFO", "name": "openeogeotrellis.layercatalog", "created": 1707381501.9347937, "filename": "layercatalog.py", "lineno": 793, "process": 8498, "req_id": "no-request", "user_id": null}
{"message": "Unhandled TypeError exception: TypeError('string indices must be integers')", "levelname": "ERROR", "name": "openeo_driver.util.logging", "created": 1707381501.9381285, "filename": "logging.py", "lineno": 231, "process": 8498, "exc_info": "Traceback (most recent call last):\n  File \"openeogeotrellis/deploy/local.py\", line 92, in <module>\n    backend_implementation = GeoPySparkBackendImplementation(\n  File \"/home/dio/openeo-geopyspark-driver/openeogeotrellis/backend.py\", line 328, in __init__\n    catalog = get_layer_catalog(vault)\n  File \"/home/dio/openeo-geopyspark-driver/openeogeotrellis/layercatalog.py\", line 884, in get_layer_catalog\n    metadata = _get_layer_catalog(opensearch_enrich=opensearch_enrich)\n  File \"/home/dio/openeo-geopyspark-driver/openeogeotrellis/layercatalog.py\", line 794, in _get_layer_catalog\n    metadata = dict_merge_recursive(metadata, read_catalog_file(path), overwrite=True)\n  File \"/home/dio/openeo-geopyspark-driver/openeogeotrellis/layercatalog.py\", line 789, in read_catalog_file\n    return {coll[\"id\"]: coll for coll in read_json(catalog_file)}\n  File \"/home/dio/openeo-geopyspark-driver/openeogeotrellis/layercatalog.py\", line 789, in <dictcomp>\n    return {coll[\"id\"]: coll for coll in read_json(catalog_file)}\nTypeError: string indices must be integers", "req_id": "no-request", "user_id": null}

The error occurs in read_catalog_file (called from _get_layer_catalog) in layercatalog.py, line 789:

TypeError: string indices must be integers
File "/home/dio/openeo-geopyspark-driver/openeogeotrellis/layercatalog.py", line 789, in <dictcomp>
return {coll["id"]: coll for coll in read_json(catalog_file)}

The dict comprehension iterates over the parsed JSON and indexes each element with coll["id"]. Iterating a JSON object (a dict) yields its string keys, so each coll is a string and coll["id"] raises the TypeError; apparently the top level of the catalog file has to be a JSON array of collection objects, not an object.
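
For illustration, a minimal reproduction of that failure mode, assuming read_json simply parses the JSON file:

import json

# A top-level JSON array of collection objects works: each coll is a dict.
good = json.loads('[{"id": "SENTINEL2_L2A"}, {"id": "SENTINEL1_GRD"}]')
catalog = {coll["id"]: coll for coll in good}  # OK

# A top-level JSON object does not: iterating a dict yields its string keys,
# so coll is a str and coll["id"] raises
# "TypeError: string indices must be integers".
bad = json.loads('{"SENTINEL2_L2A": {"id": "SENTINEL2_L2A"}}')
catalog = {coll["id"]: coll for coll in bad}  # TypeError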

ERROR 2

Leaving the layercatalog.json file unchanged so that the backend starts, connecting to the backend to use the processes (connection = openeo.connect("http://localhost:8080/openeo/1.2/")) and loading the STAC collection separately:

url = "https://earth-search.aws.element84.com/v1/collections/sentinel-2-l2a"
s2_cube = connection.load_stac(
    url=url, 
.....

It gives me this Python error:

OpenEoApiError: [500] Internal: Server error: Exception during Spark execution: java.lang.ClassNotFoundException: geopyspark.geotools.kryo.ExpandedKryoRegistrator (ref: r-240208b2f1cf43aba717820f806590af)

and these errors in the terminal (backend):

....
{"message": "Using process 'load_stac' from namespace 'backend'.", "levelname": "INFO", "name": "openeo_driver.ProcessGraphDeserializer", "created": 1707385426.816809, "filename": "ProcessGraphDeserializer.py", "lineno": 1585, "process": 14766, "req_id": "r-24020870aa914599af5ce7c5ac6befb9", "user_id": "22615bb01984f64a614f19cdfe73de14250148c287c72deed89ed1ec73040149@egi.eu"}
{"message": "load_stac from url 'https://earth-search.aws.element84.com/v1/collections/sentinel-2-l2a' with load params {'temporal_extent': ('2022-05-01', '2022-05-30'), 'spatial_extent': {'west': 8.891876254269395, 'south': 40.12025131034667, 'east': 9.009559726641157, 'north': 40.210341400436754, 'crs': 'EPSG:4326'}, 'global_extent': {'west': 8.891876254269395, 'south': 40.12025131034667, 'east': 9.009559726641157, 'north': 40.210341400436754, 'crs': 'EPSG:4326'}, 'bands': ['red', 'green', 'blue', 'nir'], 'properties': {}, 'aggregate_spatial_geometries': None, 'sar_backscatter': None, 'process_types': {<ProcessType.FOCAL_SPACE: 6>, <ProcessType.GLOBAL_TIME: 4>}, 'custom_mask': {}, 'data_mask': None, 'target_crs': None, 'target_resolution': None, 'resample_method': 'near', 'pixel_buffer': None}", "levelname": "INFO", "name": "openeogeotrellis.backend", "created": 1707385426.816958, "filename": "backend.py", "lineno": 765, "process": 14766, "req_id": "r-24020870aa914599af5ce7c5ac6befb9", "user_id": "22615bb01984f64a614f19cdfe73de14250148c287c72deed89ed1ec73040149@egi.eu"}
{"message": "STAC API request: GET https://earth-search.aws.element84.com/v1/search?limit=20&bbox=8.891876254269395%2C40.12025131034667%2C9.009559726641157%2C40.210341400436754&datetime=2022-05-01T00%3A00%3A00%2B00%3A00%2F2022-05-29T23%3A59%3A59.999000%2B00%3A00&collections=sentinel-2-l2a", "levelname": "INFO", "name": "openeogeotrellis.backend", "created": 1707385428.090134, "filename": "backend.py", "lineno": 916, "process": 14766, "req_id": "r-24020870aa914599af5ce7c5ac6befb9", "user_id": "22615bb01984f64a614f19cdfe73de14250148c287c72deed89ed1ec73040149@egi.eu"}
{"message": "exception chain classes: org.apache.spark.SparkException caused by java.lang.ClassNotFoundException", "levelname": "DEBUG", "name": "openeogeotrellis.backend", "created": 1707385452.3093026, "filename": "backend.py", "lineno": 1326, "process": 14766, "req_id": "r-24020870aa914599af5ce7c5ac6befb9", "user_id": "22615bb01984f64a614f19cdfe73de14250148c287c72deed89ed1ec73040149@egi.eu"}
{"message": "Py4JJavaError('An error occurred while calling o2343.datacube_seq.\\n', JavaObject id=o2350)", "levelname": "ERROR", "name": "openeo_driver.views.error", "created": 1707385452.3095894, "filename": "views.py", "lineno": 278, "process": 14766, "exc_info": "Traceback (most recent call last):\n  File \"/home/dio/openeo-geopyspark-driver/venv/lib/python3.8/site-packages/flask/app.py\", line 1516, in full_dispatch_request\n    rv = self.dispatch_request()\n  File \"/home/dio/openeo-geopyspark-driver/venv/lib/python3.8/site-packages/flask/app.py\", line 1502, in dispatch_request\n    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)\n  File \"/home/dio/openeo-python-driver/openeo_driver/users/auth.py\", line 88, in decorated\n    return f(*args, **kwargs)\n  File \"/home/dio/openeo-python-driver/openeo_driver/views.py\", line 655, in result\n    result = backend_implementation.processing.evaluate(process_graph=process_graph, env=env)\n  File \"/home/dio/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py\", line 301, in evaluate\n    return evaluate(process_graph=process_graph, env=env)\n  File \"/home/dio/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py\", line 373, in evaluate\n    result = convert_node(result_node, env=env)\n  File \"/home/dio/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py\", line 398, in convert_node\n    process_result = apply_process(process_id=process_id, args=processGraph.get('arguments', {}),\n  File \"/home/dio/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py\", line 1558, in apply_process\n    args = {name: convert_node(expr, env=env) for (name, expr) in sorted(args.items())}\n  File \"/home/dio/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py\", line 1558, in <dictcomp>\n    args = {name: convert_node(expr, env=env) for (name, expr) in sorted(args.items())}\n  File \"/home/dio/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py\", line 412, in convert_node\n    return convert_node(processGraph['node'], env=env)\n  File \"/home/dio/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py\", line 398, in convert_node\n    process_result = apply_process(process_id=process_id, args=processGraph.get('arguments', {}),\n  File \"/home/dio/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py\", line 1558, in apply_process\n    args = {name: convert_node(expr, env=env) for (name, expr) in sorted(args.items())}\n  File \"/home/dio/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py\", line 1558, in <dictcomp>\n    args = {name: convert_node(expr, env=env) for (name, expr) in sorted(args.items())}\n  File \"/home/dio/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py\", line 412, in convert_node\n    return convert_node(processGraph['node'], env=env)\n  File \"/home/dio/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py\", line 398, in convert_node\n    process_result = apply_process(process_id=process_id, args=processGraph.get('arguments', {}),\n  File \"/home/dio/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py\", line 1539, in apply_process\n    the_mask = convert_node(mask_node, env=env)\n  File \"/home/dio/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py\", line 412, in convert_node\n    return convert_node(processGraph['node'], env=env)\n  File \"/home/dio/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py\", line 398, in convert_node\n    process_result = 
apply_process(process_id=process_id, args=processGraph.get('arguments', {}),\n  File \"/home/dio/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py\", line 1558, in apply_process\n    args = {name: convert_node(expr, env=env) for (name, expr) in sorted(args.items())}\n  File \"/home/dio/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py\", line 1558, in <dictcomp>\n    args = {name: convert_node(expr, env=env) for (name, expr) in sorted(args.items())}\n  File \"/home/dio/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py\", line 412, in convert_node\n    return convert_node(processGraph['node'], env=env)\n  File \"/home/dio/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py\", line 398, in convert_node\n    process_result = apply_process(process_id=process_id, args=processGraph.get('arguments', {}),\n  File \"/home/dio/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py\", line 1558, in apply_process\n    args = {name: convert_node(expr, env=env) for (name, expr) in sorted(args.items())}\n  File \"/home/dio/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py\", line 1558, in <dictcomp>\n    args = {name: convert_node(expr, env=env) for (name, expr) in sorted(args.items())}\n  File \"/home/dio/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py\", line 412, in convert_node\n    return convert_node(processGraph['node'], env=env)\n  File \"/home/dio/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py\", line 398, in convert_node\n    process_result = apply_process(process_id=process_id, args=processGraph.get('arguments', {}),\n  File \"/home/dio/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py\", line 1590, in apply_process\n    return process_function(args=ProcessArgs(args, process_id=process_id), env=env)\n  File \"/home/dio/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py\", line 2199, in load_stac\n    return env.backend_implementation.load_stac(url=url, load_params=load_params, env=env)\n  File \"/home/dio/openeo-geopyspark-driver/openeogeotrellis/backend.py\", line 1090, in load_stac\n    pyramid = pyramid_factory.datacube_seq(projected_polygons, from_date.isoformat(), to_date.isoformat(),\n  File \"/home/dio/openeo-geopyspark-driver/venv/lib/python3.8/site-packages/py4j/java_gateway.py\", line 1321, in __call__\n    return_value = get_return_value(\n  File \"/home/dio/openeo-geopyspark-driver/venv/lib/python3.8/site-packages/py4j/protocol.py\", line 326, in get_return_value\n    raise Py4JJavaError(\npy4j.protocol.Py4JJavaError: An error occurred while calling o2343.datacube_seq.\n: org.apache.spark.SparkException: Failed to register classes with Kryo\n\tat org.apache.spark.serializer.KryoSerializer.$anonfun$newKryo$5(KryoSerializer.scala:183)\n\tat scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)\n\tat org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:233)\n\tat org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:171)\n\tat org.apache.spark.serializer.KryoSerializer$$anon$1.create(KryoSerializer.scala:102)\n\tat com.esotericsoftware.kryo.pool.KryoPoolQueueImpl.borrow(KryoPoolQueueImpl.java:48)\n\tat org.apache.spark.serializer.KryoSerializer$PoolWrapper.borrow(KryoSerializer.scala:109)\n\tat org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:346)\n\tat org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:385)\n\tat 
org.apache.spark.util.Utils$.clone(Utils.scala:1783)\n\tat org.apache.spark.rdd.RDD.$anonfun$aggregate$1(RDD.scala:1195)\n\tat org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)\n\tat org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)\n\tat org.apache.spark.rdd.RDD.withScope(RDD.scala:406)\n\tat org.apache.spark.rdd.RDD.aggregate(RDD.scala:1193)\n\tat org.apache.spark.rdd.RDD.$anonfun$countApproxDistinct$1(RDD.scala:1369)\n\tat scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)\n\tat org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)\n\tat org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)\n\tat org.apache.spark.rdd.RDD.withScope(RDD.scala:406)\n\tat org.apache.spark.rdd.RDD.countApproxDistinct(RDD.scala:1364)\n\tat org.apache.spark.rdd.RDD.$anonfun$countApproxDistinct$7(RDD.scala:1393)\n\tat scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)\n\tat org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)\n\tat org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)\n\tat org.apache.spark.rdd.RDD.withScope(RDD.scala:406)\n\tat org.apache.spark.rdd.RDD.countApproxDistinct(RDD.scala:1390)\n\tat org.openeo.geotrelliscommon.DatacubeSupport$.createPartitioner(DatacubeSupport.scala:154)\n\tat org.openeo.geotrellis.layers.FileLayerProvider.readMultibandTileLayer(FileLayerProvider.scala:971)\n\tat org.openeo.geotrellis.file.PyramidFactory.datacube(PyramidFactory.scala:128)\n\tat org.openeo.geotrellis.file.PyramidFactory.datacube_seq(PyramidFactory.scala:91)\n\tat java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.base/java.lang.reflect.Method.invoke(Method.java:566)\n\tat py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)\n\tat py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)\n\tat py4j.Gateway.invoke(Gateway.java:282)\n\tat py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)\n\tat py4j.commands.CallCommand.execute(CallCommand.java:79)\n\tat py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)\n\tat py4j.ClientServerConnection.run(ClientServerConnection.java:106)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\nCaused by: java.lang.ClassNotFoundException: geopyspark.geotools.kryo.ExpandedKryoRegistrator\n\tat java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)\n\tat java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:594)\n\tat java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:527)\n\tat java.base/java.lang.Class.forName0(Native Method)\n\tat java.base/java.lang.Class.forName(Class.java:398)\n\tat org.apache.spark.util.Utils$.classForName(Utils.scala:220)\n\tat org.apache.spark.serializer.KryoSerializer.$anonfun$newKryo$7(KryoSerializer.scala:178)\n\tat scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)\n\tat scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)\n\tat scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)\n\tat scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)\n\tat scala.collection.TraversableLike.map(TraversableLike.scala:286)\n\tat 
scala.collection.TraversableLike.map$(TraversableLike.scala:279)\n\tat scala.collection.AbstractTraversable.map(Traversable.scala:108)\n\tat org.apache.spark.serializer.KryoSerializer.$anonfun$newKryo$5(KryoSerializer.scala:178)\n\t... 42 more\n", "req_id": "r-24020870aa914599af5ce7c5ac6befb9", "user_id": "22615bb01984f64a614f19cdfe73de14250148c287c72deed89ed1ec73040149@egi.eu"}
{"message": "127.0.0.1 - - [08/Feb/2024:10:44:12 +0100] \"POST /openeo/1.2/result HTTP/1.1\" 500 220 \"-\" \"openeo-python-client/0.27.0 cpython/3.10.13 linux\"", "levelname": "INFO", "name": "gunicorn.access", "created": 1707385452.315534, "filename": "glogging.py", "lineno": 363, "process": 14766, "req_id": "no-request", "user_id": null}

Questions

  1. Did I do something wrong? Is there an error in the file?
soxofaan commented 4 months ago

Hi,

before diving into the details, can you maybe give a bit more context about your use case, especially why you want to run an openeo-geopyspark-driver based openEO backend locally and also query it locally? This is quite a convoluted setup that normal openEO users should not be confronted with. There might be easier ways to achieve what you want, e.g. the Python client also has an experimental "local processing" feature.
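
For reference, a rough sketch of that client-side local processing (based on the local processing cookbook page; it needs the optional dependencies, e.g. pip install openeo[localprocessing], and the feature is experimental, so details may change):

from openeo.local import LocalConnection

local_conn = LocalConnection("./")  # base path for local collections (unused here)

cube = local_conn.load_stac(
    url="https://earth-search.aws.element84.com/v1/collections/sentinel-2-l2a",
    spatial_extent={"west": 8.89, "south": 40.12, "east": 9.01, "north": 40.21},
    temporal_extent=["2022-05-01", "2022-05-30"],
    bands=["red", "green", "blue"],
)
result = cube.execute()  # runs locally, returns an xarray object
result.to_netcdf("composite-local.nc")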

automataIA commented 4 months ago

Hi,

> before diving into the details, can you maybe give a bit more context about your use case, especially why you want to run an openeo-geopyspark-driver based openEO backend locally and also query it locally? This is quite a convoluted setup that normal openEO users should not be confronted with. There might be easier ways to achieve what you want, e.g. the Python client also has an experimental "local processing" feature.

OK, I'll try to explain myself better. My goal is to:

  1. Have a backend deployed on a server (right now it's local because I'm in the development phase)
  2. Use STAC collections/catalogues as the data source
  3. Have the backend on the server be responsible for all the data processing
  4. And then give the result back to the user
  5. In this sense the openEO backend must interface with the openEO client, and therefore carry out all operations such as processes etc.

I hope I have explained myself well. If anything is unclear, please ask.

clausmichele commented 4 months ago

@automataIA have a look at the local processing documentation and let us know if it helps: https://open-eo.github.io/openeo-python-client/cookbook/localprocessing.html

automataIA commented 4 months ago

> @automataIA have a look at the local processing documentation and let us know if it helps: https://open-eo.github.io/openeo-python-client/cookbook/localprocessing.html

Thanks, but, as I wrote, I need to:

  1. run processes in my backend (server)
  2. not client-side (which is definitely useful for a normal user, though)
  3. use a STAC collection as the data source
clausmichele commented 4 months ago

At Eurac Research we use those two components:

Java openEO driver for the REST API: https://github.com/Open-EO/openeo-spring-driver

Python computing engine (easily deployable with Docker, using the same implementation as the local processing): https://github.com/SARScripts/openeo_odc_driver/tree/dask_processes

The Java driver sends the process graph to the Python component, which processes it and returns the result. If you would like some help setting that up, please get in touch with us and let us know your plans and some information about yourself.
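
Schematically, the hand-off looks like this (the endpoint name and payload shape are hypothetical, not the actual openeo_odc_driver API; it just illustrates the flow):

import requests

# The REST driver forwards the openEO process graph as JSON to the Python
# engine, which executes it and streams back the result.
process_graph = {
    "load1": {
        "process_id": "load_stac",
        "arguments": {"url": "https://earth-search.aws.element84.com/v1/collections/sentinel-2-l2a"},
        "result": True,
    }
}
resp = requests.post("http://localhost:5000/graph", json={"process_graph": process_graph})  # hypothetical endpoint
resp.raise_for_status()
with open("result.nc", "wb") as f:
    f.write(resp.content)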

automataIA commented 4 months ago

> At Eurac Research we use those two components:
>
> Java openEO driver for the REST API: https://github.com/Open-EO/openeo-spring-driver
>
> Python computing engine (easily deployable with Docker, using the same implementation as the local processing): https://github.com/SARScripts/openeo_odc_driver/tree/dask_processes
>
> The Java driver sends the process graph to the Python component, which processes it and returns the result. If you would like some help setting that up, please get in touch with us and let us know your plans and some information about yourself.

Interesting. So, if I understand correctly, your backend is made up of these two components and that's it? (I hope so)

clausmichele commented 4 months ago

We also have Rasdaman and Open Datacube running for data management, but if you plan to use load_stac only with public catalogs they won't be necessary. Maybe open an issue here to continue the discussion: https://github.com/Open-EO/openeo-spring-driver/issues

automataIA commented 4 months ago

> We also have Rasdaman and Open Datacube running for data management, but if you plan to use load_stac only with public catalogs they won't be necessary. Maybe open an issue here to continue the discussion: https://github.com/Open-EO/openeo-spring-driver/issues

Do you know of a valid alternative to openeo-spring-driver, but written in Python?