Open-EO / openeo-geopyspark-driver

OpenEO driver for GeoPySpark (Geotrellis)
Apache License 2.0

Problem with ElasticJobRegistry api_url in developer mode, run locally. #675

Closed automataIA closed 5 months ago

automataIA commented 5 months ago

My goal is to run this project locally and then connect the openeo Python client (pip version) to this backend via its local URL, using a STAC collection as the data source (e.g. url = "https://earth-search.aws.element84.com/v1/collections/sentinel-2-l2a").
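For reference, the client-side usage I'm aiming for looks roughly like this. The backend URL (host, port, API path), the spatial/temporal extents, and the band names are assumptions for a default local deployment, not values from this repo:

```python
def build_cube(backend_url: str = "http://localhost:8080/openeo/1.1"):
    """Load a STAC collection through the locally running backend.

    backend_url is an assumption; use whatever URL local.py reports.
    """
    import openeo  # pip install openeo

    connection = openeo.connect(backend_url)
    # load_stac pulls data from an external STAC collection instead of a
    # backend-internal layer catalog entry.
    return connection.load_stac(
        "https://earth-search.aws.element84.com/v1/collections/sentinel-2-l2a",
        spatial_extent={"west": 11.0, "south": 46.0, "east": 11.1, "north": 46.1},
        temporal_extent=["2023-06-01", "2023-06-30"],
        bands=["red", "nir"],
    )
```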

I followed the development setup instructions:

git clone --recursive git@github.com:Open-EO/openeo-geopyspark-driver.git
git clone --recursive git@github.com:Open-EO/openeo-python-driver.git
git clone git@github.com:Open-EO/openeo-python-client.git
cd openeo-geopyspark-driver
python -m venv venv
source venv/bin/activate
cd ../openeo-python-client
pip install -e .
cd ../openeo-python-driver
pip install -e .[dev] --extra-index-url https://artifactory.vgt.vito.be/artifactory/api/pypi/python-openeo/simple
cd ../openeo-geopyspark-driver
pip install -e .[dev] --extra-index-url https://artifactory.vgt.vito.be/artifactory/api/pypi/python-openeo/simple

with these additions:

  1. When cloning the repositories, I replace git@github.com: with https://github.com/
  2. After cd openeo-geopyspark-driver, I run python scripts/get-jars.py
  3. I copy layercatalog.json from the tests directory to the root directory
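Step 3 can be scripted; a minimal sketch, assuming the file lives at tests/layercatalog.json in the checkout:

```python
import shutil
from pathlib import Path


def copy_layercatalog(repo_root: Path = Path(".")) -> Path:
    """Copy the test layer catalog to the repo root so local.py can find it."""
    src = repo_root / "tests" / "layercatalog.json"
    dst = repo_root / "layercatalog.json"
    shutil.copy(src, dst)
    return dst
```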

But after setting these environment variables:

export SPARK_HOME=$(find_spark_home.py)
export HADOOP_CONF_DIR=/etc/hadoop/conf
export FLASK_DEBUG=1

and running python openeogeotrellis/deploy/local.py, I get this error (in the last lines of the output):

{"message": "Overriding sys.excepthook with <function _sys_excepthook at 0x7fd0c1d3d3a0> (was <built-in function excepthook>)", "levelname": "DEBUG", "name": "openeo_driver.util.logging", "created": 1707208551.9927747, "filename": "logging.py", "lineno": 208, "process": 53221, "req_id": "no-request", "user_id": null}
{"message": "{'pid': 53221, 'interpreter': '/home/dio/openeo-geopyspark-driver/venv/bin/python', 'version': '3.8.18 (default, Feb  5 2024, 21:18:30) \\n[GCC 11.4.0]', 'argv': ['openeogeotrellis/deploy/local.py']}", "levelname": "INFO", "name": "__main__", "created": 1707208551.9928658, "filename": "local.py", "lineno": 78, "process": 53221, "req_id": "no-request", "user_id": null}
{"message": "Creating Spark context with config:", "levelname": "INFO", "name": "__main__", "created": 1707208552.058966, "filename": "local.py", "lineno": 45, "process": 53221, "req_id": "no-request", "user_id": null}
{"message": "Spark config: 'spark.app.name': 'openeo-geotrellis-local'", "levelname": "INFO", "name": "__main__", "created": 1707208552.0590599, "filename": "local.py", "lineno": 47, "process": 53221, "req_id": "no-request", "user_id": null}
{"message": "Spark config: 'spark.master': 'local[2]'", "levelname": "INFO", "name": "__main__", "created": 1707208552.0590909, "filename": "local.py", "lineno": 47, "process": 53221, "req_id": "no-request", "user_id": null}
{"message": "Spark config: 'spark.ui.enabled': 'True'", "levelname": "INFO", "name": "__main__", "created": 1707208552.0591152, "filename": "local.py", "lineno": 47, "process": 53221, "req_id": "no-request", "user_id": null}
{"message": "Spark config: 'spark.serializer': 'org.apache.spark.serializer.KryoSerializer'", "levelname": "INFO", "name": "__main__", "created": 1707208552.0591366, "filename": "local.py", "lineno": 47, "process": 53221, "req_id": "no-request", "user_id": null}
{"message": "Spark config: 'spark.kryo.registrator': 'geopyspark.geotools.kryo.ExpandedKryoRegistrator'", "levelname": "INFO", "name": "__main__", "created": 1707208552.0591567, "filename": "local.py", "lineno": 47, "process": 53221, "req_id": "no-request", "user_id": null}
{"message": "Spark config: 'spark.jars': '/home/dio/openeo-geopyspark-driver/jars/openeo-logging-2.4.0_2.12-SNAPSHOT.jar,/home/dio/openeo-geopyspark-driver/jars/geotrellis-extensions-2.4.0_2.12-SNAPSHOT.jar'", "levelname": "INFO", "name": "__main__", "created": 1707208552.0591764, "filename": "local.py", "lineno": 47, "process": 53221, "req_id": "no-request", "user_id": null}
{"message": "Spark config: 'spark.driver.memory': '2G'", "levelname": "INFO", "name": "__main__", "created": 1707208552.0591958, "filename": "local.py", "lineno": 47, "process": 53221, "req_id": "no-request", "user_id": null}
{"message": "Spark config: 'spark.executor.memory': '2G'", "levelname": "INFO", "name": "__main__", "created": 1707208552.0592144, "filename": "local.py", "lineno": 47, "process": 53221, "req_id": "no-request", "user_id": null}
{"message": "Spark config: 'spark.kryoserializer.buffer.max': '1G'", "levelname": "INFO", "name": "__main__", "created": 1707208552.0592325, "filename": "local.py", "lineno": 47, "process": 53221, "req_id": "no-request", "user_id": null}
{"message": "Spark config: 'spark.driver.extraJavaOptions': '-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5009'", "levelname": "INFO", "name": "__main__", "created": 1707208552.0592506, "filename": "local.py", "lineno": 47, "process": 53221, "req_id": "no-request", "user_id": null}
Listening for transport dt_socket at address: 5009
24/02/06 09:35:52 WARN Utils: Your hostname, PC-Fisso resolves to a loopback address: 127.0.1.1; using 172.30.18.91 instead (on interface eth0)
24/02/06 09:35:52 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
24/02/06 09:35:54 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
{"message": "Created Spark Context <SparkContext master=local[2] appName=openeo-geotrellis-local>", "levelname": "INFO", "name": "__main__", "created": 1707208554.904765, "filename": "local.py", "lineno": 49, "process": 53221, "req_id": "no-request", "user_id": null}
{"message": "Spark web UI: http://localhost:4040/", "levelname": "INFO", "name": "__main__", "created": 1707208554.9098828, "filename": "local.py", "lineno": 50, "process": 53221, "req_id": "no-request", "user_id": null}
{"message": "Registering fallback implementation of 'normalized_difference' by process graph (<openeo_driver.processes.ProcessRegistry object at 0x7fd0e5e0a0a0>)", "levelname": "INFO", "name": "openeo_driver.ProcessGraphDeserializer", "created": 1707208555.0091393, "filename": "ProcessGraphDeserializer.py", "lineno": 239, "process": 53221, "req_id": "no-request", "user_id": null}
{"message": "Registering fallback implementation of 'normalized_difference' by process graph (<openeo_driver.processes.ProcessRegistry object at 0x7fd0e5e0a100>)", "levelname": "INFO", "name": "openeo_driver.ProcessGraphDeserializer", "created": 1707208555.0129898, "filename": "ProcessGraphDeserializer.py", "lineno": 239, "process": 53221, "req_id": "no-request", "user_id": null}
{"message": "Creating new InMemoryServiceRegistry: <openeogeotrellis.service_registry.InMemoryServiceRegistry object at 0x7fd0bad7f280>", "levelname": "INFO", "name": "openeogeotrellis.service_registry", "created": 1707208555.2303329, "filename": "service_registry.py", "lineno": 76, "process": 53221, "req_id": "no-request", "user_id": null}
{"message": "Loading configuration from Python file PosixPath('openeogeotrellis/deploy/local_config.py') (variable 'config')", "levelname": "DEBUG", "name": "openeo_driver.config.load", "created": 1707208555.2307968, "filename": "load.py", "lineno": 33, "process": 53221, "req_id": "no-request", "user_id": null}
{"message": "Looking for geotrellis jars in search_locations=[PosixPath('jars'), PosixPath('/home/dio/openeo-geopyspark-driver/jars')]", "levelname": "DEBUG", "name": "openeogeotrellis.deploy", "created": 1707208555.2309744, "filename": "__init__.py", "lineno": 78, "process": 53221, "req_id": "no-request", "user_id": null}
{"message": "Found geotrellis jars: [PosixPath('jars/geotrellis-extensions-2.4.0_2.12-SNAPSHOT.jar')]", "levelname": "DEBUG", "name": "openeogeotrellis.deploy", "created": 1707208555.2311745, "filename": "__init__.py", "lineno": 85, "process": 53221, "req_id": "no-request", "user_id": null}
{"message": "Loaded config config_id='gps-local' from config_path='openeogeotrellis/deploy/local_config.py' (reason='lazy_load')", "levelname": "INFO", "name": "openeo_driver.config.load", "created": 1707208555.571127, "filename": "load.py", "lineno": 94, "process": 53221, "stack_info": "Stack (most recent call last):\n  File \"openeogeotrellis/deploy/local.py\", line 91, in <module>\n    app = build_app(backend_implementation=GeoPySparkBackendImplementation(use_zookeeper=False))\n  File \"/home/dio/openeo-geopyspark-driver/openeogeotrellis/backend.py\", line 328, in __init__\n    catalog = get_layer_catalog(vault)\n  File \"/home/dio/openeo-geopyspark-driver/openeogeotrellis/layercatalog.py\", line 884, in get_layer_catalog\n    metadata = _get_layer_catalog(opensearch_enrich=opensearch_enrich)\n  File \"/home/dio/openeo-geopyspark-driver/openeogeotrellis/layercatalog.py\", line 782, in _get_layer_catalog\n    opensearch_enrich = get_backend_config().opensearch_enrich\n  File \"/home/dio/openeo-geopyspark-driver/openeogeotrellis/config/load.py\", line 14, in __call__\n    return self.get(force_reload=force_reload, show_stack=show_stack)\n  File \"/home/dio/openeo-python-driver/openeo_driver/config/load.py\", line 75, in get\n    self._config = self._load(reason=\"lazy_load\", show_stack=show_stack)\n  File \"/home/dio/openeo-python-driver/openeo_driver/config/load.py\", line 94, in _load\n    _log.info(f\"Loaded config {config_id=} from {config_path=} ({reason=})\", stack_info=show_stack)", "req_id": "no-request", "user_id": null}
{"message": "_get_layer_catalog: catalog_files=['layercatalog.json']", "levelname": "INFO", "name": "openeogeotrellis.layercatalog", "created": 1707208555.5713131, "filename": "layercatalog.py", "lineno": 791, "process": 53221, "req_id": "no-request", "user_id": null}
{"message": "_get_layer_catalog: reading layercatalog.json", "levelname": "INFO", "name": "openeogeotrellis.layercatalog", "created": 1707208555.5713542, "filename": "layercatalog.py", "lineno": 793, "process": 53221, "req_id": "no-request", "user_id": null}
{"message": "_get_layer_catalog: collected 10 collections", "levelname": "INFO", "name": "openeogeotrellis.layercatalog", "created": 1707208555.5716228, "filename": "layercatalog.py", "lineno": 795, "process": 53221, "req_id": "no-request", "user_id": null}
{"message": "_get_layer_catalog: opensearch_enrich=False", "levelname": "INFO", "name": "openeogeotrellis.layercatalog", "created": 1707208555.5716622, "filename": "layercatalog.py", "lineno": 797, "process": 53221, "req_id": "no-request", "user_id": null}
{"message": "Creating merged collections for common names: {'SENTINEL2_L2A'}", "levelname": "INFO", "name": "openeogeotrellis.layercatalog", "created": 1707208555.5716922, "filename": "layercatalog.py", "lineno": 918, "process": 53221, "req_id": "no-request", "user_id": null}
{"message": "Merging SENTINEL2_L2A from ['TERRASCOPE_S2_TOC_V2', 'SENTINEL2_L2A_SENTINELHUB']", "levelname": "INFO", "name": "openeogeotrellis.layercatalog", "created": 1707208555.571724, "filename": "layercatalog.py", "lineno": 940, "process": 53221, "req_id": "no-request", "user_id": null}
{"message": "No elastic_job_registry given to GeoPySparkBackendImplementation, creating one", "levelname": "WARNING", "name": "openeogeotrellis.backend", "created": 1707208555.578443, "filename": "backend.py", "lineno": 338, "process": 53221, "req_id": "no-request", "user_id": null}
{"message": "Unhandled ValueError exception: ValueError(None)", "levelname": "ERROR", "name": "openeo_driver.util.logging", "created": 1707208555.5785315, "filename": "logging.py", "lineno": 231, "process": 53221, "exc_info": "Traceback (most recent call last):\n  File \"openeogeotrellis/deploy/local.py\", line 91, in <module>\n    app = build_app(backend_implementation=GeoPySparkBackendImplementation(use_zookeeper=False))\n  File \"/home/dio/openeo-geopyspark-driver/openeogeotrellis/backend.py\", line 339, in __init__\n    elastic_job_registry = get_elastic_job_registry(requests_session)\n  File \"/home/dio/openeo-geopyspark-driver/openeogeotrellis/backend.py\", line 1628, in get_elastic_job_registry\n    job_registry = ElasticJobRegistry(\n  File \"/home/dio/openeo-python-driver/openeo_driver/jobregistry.py\", line 256, in __init__\n    raise ValueError(api_url)\nValueError: None", "req_id": "no-request", "user_id": null}

How can I solve this problem when working locally? Environment: Windows 11, VS Code, WSL2, Ubuntu, Python 3.8.18.
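Looking at the traceback, the failure comes down to a guard in the ElasticJobRegistry constructor (openeo_driver/jobregistry.py line 256: raise ValueError(api_url)). A simplified sketch of just that check, not the full class: with no EJR configured for a local run, api_url is None, which produces the "ValueError: None" above.

```python
class ElasticJobRegistry:
    """Simplified sketch of the constructor guard seen in the traceback."""

    def __init__(self, api_url: str):
        if not api_url:
            # In a local deployment there is no ElasticJobRegistry
            # configuration, so api_url arrives as None and the
            # constructor raises immediately: "ValueError: None".
            raise ValueError(api_url)
        self.api_url = api_url
```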

soxofaan commented 5 months ago

I pushed a quickfix for local.py, can you try if you can get further now?

automataIA commented 5 months ago

> I pushed a quickfix for local.py, can you try if you can get further now?

Solved. Ty.