c-scale-community / use-case-aquamonitor

Apache License 2.0

upgrade openEO on INCD #26

Closed jdries closed 1 year ago

jdries commented 2 years ago

Upgrade openEO to the latest version. @zbenta We updated our documentation to reflect the latest upgrades to the software stack: https://github.com/Open-EO/openeo-geotrellis-kubernetes/commit/a8147962d49555d366b00f64f7b27ff62c5f712e

Can you try again based on those instructions, and report issues here? @tcassaert is available to follow up more quickly (than I can)!

jdries commented 1 year ago

Hi @cesarpferreira, great: that means it now only fails to download the result. You could check the openEO bucket in your object storage to confirm that something ended up in there. For more info on the TypeError: you can probably find the stack trace in the main driver logs; I can tell you more once I see it.

cesarpferreira commented 1 year ago

We checked our object storage, and indeed the bucket OpenEO-data contains some .tiff files inside a folder with the same name as the job. Are we doing something wrong when downloading those files?

The main driver logs contain this line with the error:

"TypeError('can only concatenate str (not \"NoneType\") to str')", "levelname": "ERROR", "name": "openeo_driver.views.error", "created": 1674650996.183127, "filename": "views.py", "lineno": 270, "process": 122, "exc_info": "Traceback (most recent call last):\n  File \"/opt/openeo/lib/python3.8/site-packages/flask/app.py\", line 1516, in full_dispatch_request\n    rv = self.dispatch_request()\n  File \"/opt/openeo/lib/python3.8/site-packages/flask/app.py\", line 1502, in dispatch_request\n    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)\n  File \"/opt/openeo/lib/python3.8/site-packages/openeo_driver/users/auth.py\", line 88, in decorated\n    return f(*args, **kwargs)\n  File \"/opt/openeo/lib/python3.8/site-packages/openeo_driver/views.py\", line 869, in list_job_results\n    return _list_job_results(job_id, user.user_id)\n  File \"/opt/openeo/lib/python3.8/site-packages/openeo_driver/views.py\", line 914, in _list_job_results\n    \"href\": job_results_canonical_url(),\n  File \"/opt/openeo/lib/python3.8/site-packages/openeo_driver/views.py\", line 895, in job_results_canonical_url\n    secure_key = signer.sign_job_results(\n  File \"/opt/openeo/lib/python3.8/site-packages/openeo_driver/urlsigning.py\", line 47, in sign_job_results\n    return self._sign_with_secret(job_id, user_id, str(expires))\n  File \"/opt/openeo/lib/python3.8/site-packages/openeo_driver/urlsigning.py\", line 73, in _sign_with_secret\n    token_key = reduce(operator.add, list(token_key_parts) + [self._secret], \"\")\nTypeError: can only concatenate str (not \"NoneType\") to str", "req_id": "r-9e245fbf18f44addaedca9bce39547e5", "user_id": "cd699ad346138df0ae05cd580df3c01c2c744b5714d71f6cac72a94b6a55399f@egi.eu"}
jdries commented 1 year ago

Can you add an environment variable 'SIGNED_URL_SECRET' to the driver? The value can be a self-chosen secret string; it is used to sign URLs for download. I have the impression that it is missing.
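For illustration, here is a minimal sketch (not the actual `openeo_driver` code) of the signing step visible in the traceback above: `urlsigning.py` reduces the token parts plus the secret into one string, so an unset `SIGNED_URL_SECRET` (a `None` secret) produces exactly the reported `TypeError`.

```python
import operator
from functools import reduce

def sign(token_key_parts, secret):
    # Mirrors the reduce in the traceback: when SIGNED_URL_SECRET is unset,
    # `secret` is None and concatenating str + None raises:
    # TypeError: can only concatenate str (not "NoneType") to str
    return reduce(operator.add, list(token_key_parts) + [secret], "")

token_key = sign(("job-id", "user-id", "expires"), "OUR_SECRET")
```

Setting the environment variable makes `secret` a string and the concatenation succeeds.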

cesarpferreira commented 1 year ago

Hi @jdries , we added the env var to the driver like this: SIGNED_URL_SECRET: 'OUR_SECRET'. Now the job state is stuck in the Jupyter notebook like this, even if the job finishes successfully:

[screenshot]

[screenshot]

This only happens with the env var SIGNED_URL_SECRET defined.

Also, we noticed that in job_metadata.json the links to the images are malformed: s3://OpenEO-datas3:/OpenEO-data/batch_jobs/j-e0969d62ea124821832aef2db881c34f/openEO_2020-12-12Z.tif
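As a hypothetical illustration (the real cause is in the backend code), one plausible way such a doubled prefix arises is joining a bucket name with an asset path that already carries an `s3://` scheme, along with a defensive fix that strips the scheme before joining:

```python
from urllib.parse import urlparse

bucket = "OpenEO-data"
asset_path = "s3://OpenEO-data/batch_jobs/j-e0969d62ea124821832aef2db881c34f/openEO_2020-12-12Z.tif"

# Naive join: the scheme ends up in the URI twice.
broken = "s3://" + bucket + asset_path

# Defensive fix: drop any existing scheme/bucket from the path first.
parsed = urlparse(asset_path)
key = parsed.path.lstrip("/") if parsed.scheme else asset_path
fixed = f"s3://{bucket}/{key}"
```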

jdries commented 1 year ago

I fail to see the link between that env var and the job tracker. Do the job tracker logs provide any hint?

I do recognize that issue with bad S3 locations, but that should be fixed. Could you try running with netCDF as the output format?
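The output format lives in the `save_result` node of the openEO process graph, so switching from GTiff to netCDF is a one-argument change. A small sketch (node and collection names here are assumed, not taken from the notebook):

```python
def save_result_node(data_node_id: str, fmt: str) -> dict:
    """Build a save_result process-graph node with the given output format."""
    return {
        "process_id": "save_result",
        "arguments": {
            "data": {"from_node": data_node_id},
            "format": fmt,
        },
        "result": True,
    }

# With the Python client this is typically done via
# cube.save_result(format="netCDF") or cube.create_job(out_format="netCDF").
node = save_result_node("loadcollection1", "netCDF")
```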

cesarpferreira commented 1 year ago

The job tracker logs are always the same; we never found any job-specific entries in this pod's logs:

INFO:__main__:ConfigParams(): {'async_task_handler_environment': None,
 'batch_job_output_root': PosixPath('/batch_jobs'),
 'batch_jobs_zookeeper_root_path': '/openeo/jobs',
 'cache_shub_batch_results': False,
 'default_opensearch_endpoint': 'https://services.terrascope.be/catalogue',
 'is_ci_context': False,
 'is_kube_deploy': 'true',
 'layer_catalog_metadata_files': ['layercatalog.json'],
 's1backscatter_elev_geoid': None,
 's3_bucket_name': 'OpenEO-data',
 'zookeepernodes': ['zookeeper-cscale.zookeeper.svc.cluster.local:2181']}
{"message": "ConfigParams(): {'async_task_handler_environment': None,\n 'batch_job_output_root': PosixPath('/batch_jobs'),\n 'batch_jobs_zookeeper_root_path': '/openeo/jobs',\n 'cache_shub_batch_results': False,\n 'default_opensearch_endpoint': 'https://services.terrascope.be/catalogue',\n 'is_ci_context': False,\n 'is_kube_deploy': 'true',\n 'layer_catalog_metadata_files': ['layercatalog.json'],\n 's1backscatter_elev_geoid': None,\n 's3_bucket_name': 'OpenEO-data',\n 'zookeepernodes': ['zookeeper-cscale.zookeeper.svc.cluster.local:2181']}", "levelname": "INFO", "name": "__main__", "created": 1674746118.4327784, "filename": "job_tracker.py", "lineno": 339, "process": 1}
INFO:openeo_driver.jobregistry.elastic:Creating ElasticJobRegistry with backend_id='undefined' and api_url='https://jobregistry.openeo.vito.be'
{"message": "Creating ElasticJobRegistry with backend_id='undefined' and api_url='https://jobregistry.openeo.vito.be'", "levelname": "INFO", "name": "openeo_driver.jobregistry.elastic", "created": 1674746118.4341402, "filename": "jobregistry.py", "lineno": 58, "process": 1}
WARNING:openeo_driver.jobregistry.elastic:In context 'init GpsBatchJobs': caught EjrError("Env var 'OPENEO_EJR_OIDC_CLIENT_SECRET' must be set")
{"message": "In context 'init GpsBatchJobs': caught EjrError(\"Env var 'OPENEO_EJR_OIDC_CLIENT_SECRET' must be set\")", "levelname": "WARNING", "name": "openeo_driver.jobregistry.elastic", "created": 1674746118.4343078, "filename": "logging.py", "lineno": 319, "process": 1, "exc_info": "Traceback (most recent call last):\n  File \"/opt/openeo/lib/python3.8/site-packages/openeo_driver/util/logging.py\", line 317, in just_log_exceptions\n    yield\n  File \"/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/backend.py\", line 938, in get_or_build_elastic_job_registry\n    job_registry = ElasticJobRegistry.from_environ()\n  File \"/opt/openeo/lib/python3.8/site-packages/openeo_driver/jobregistry.py\", line 107, in from_environ\n    raise EjrError(\"Env var 'OPENEO_EJR_OIDC_CLIENT_SECRET' must be set\")\nopeneo_driver.jobregistry.EjrError: Env var 'OPENEO_EJR_OIDC_CLIENT_SECRET' must be set"}
Traceback (most recent call last):
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/util/logging.py", line 317, in just_log_exceptions
    yield
  File "/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/backend.py", line 938, in get_or_build_elastic_job_registry
    job_registry = ElasticJobRegistry.from_environ()
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/jobregistry.py", line 107, in from_environ
    raise EjrError("Env var 'OPENEO_EJR_OIDC_CLIENT_SECRET' must be set")
openeo_driver.jobregistry.EjrError: Env var 'OPENEO_EJR_OIDC_CLIENT_SECRET' must be set
INFO:openeo_driver.jobregistry.elastic:Creating ElasticJobRegistry with backend_id='undefined' and api_url='https://jobregistry.openeo.vito.be'
WARNING:openeo_driver.jobregistry.elastic:In context 'init JobTracker': caught EjrError("Env var 'OPENEO_EJR_OIDC_CLIENT_SECRET' must be set")
Traceback (most recent call last):
{"message": "Creating ElasticJobRegistry with backend_id='undefined' and api_url='https://jobregistry.openeo.vito.be'", "levelname": "INFO", "name": "openeo_driver.jobregistry.elastic", "created": 1674746118.4355388, "filename": "jobregistry.py", "lineno": 58, "process": 1}
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/util/logging.py", line 317, in just_log_exceptions
    yield
  File "/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/backend.py", line 938, in get_or_build_elastic_job_registry
    job_registry = ElasticJobRegistry.from_environ()
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/jobregistry.py", line 107, in from_environ
    raise EjrError("Env var 'OPENEO_EJR_OIDC_CLIENT_SECRET' must be set")
openeo_driver.jobregistry.EjrError: Env var 'OPENEO_EJR_OIDC_CLIENT_SECRET' must be set
{"message": "In context 'init JobTracker': caught EjrError(\"Env var 'OPENEO_EJR_OIDC_CLIENT_SECRET' must be set\")", "levelname": "WARNING", "name": "openeo_driver.jobregistry.elastic", "created": 1674746118.4356606, "filename": "logging.py", "lineno": 319, "process": 1, "exc_info": "Traceback (most recent call last):\n  File \"/opt/openeo/lib/python3.8/site-packages/openeo_driver/util/logging.py\", line 317, in just_log_exceptions\n    yield\n  File \"/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/backend.py\", line 938, in get_or_build_elastic_job_registry\n    job_registry = ElasticJobRegistry.from_environ()\n  File \"/opt/openeo/lib/python3.8/site-packages/openeo_driver/jobregistry.py\", line 107, in from_environ\n    raise EjrError(\"Env var 'OPENEO_EJR_OIDC_CLIENT_SECRET' must be set\")\nopeneo_driver.jobregistry.EjrError: Env var 'OPENEO_EJR_OIDC_CLIENT_SECRET' must be set"}
cesarpferreira commented 1 year ago

Hi everyone, good news: all of a sudden our test started working for the netCDF format and now it downloads the results. For the GTiff format, the step results.download_files(outdir) gives the following error. It looks like it happens because of the badly formatted storage URL referenced in this comment: https://github.com/c-scale-community/use-case-aquamonitor/issues/26#issuecomment-1404796562

---------------------------------------------------------------------------
OpenEoApiError                            Traceback (most recent call last)
/tmp/ipykernel_135/1991732849.py in <module>
----> 1 results.download_files(Path("./output") / "incd_Result")

/opt/conda/lib/python3.9/site-packages/openeo/rest/job.py in download_files(self, target, include_stac_metadata)
    412         ensure_dir(target)
    413 
--> 414         downloaded = [a.download(target) for a in self.get_assets()]
    415 
    416         if include_stac_metadata:

/opt/conda/lib/python3.9/site-packages/openeo/rest/job.py in <listcomp>(.0)
    412         ensure_dir(target)
    413 
--> 414         downloaded = [a.download(target) for a in self.get_assets()]
    415 
    416         if include_stac_metadata:

/opt/conda/lib/python3.9/site-packages/openeo/rest/job.py in download(self, target, chunk_size)
    286         logger.info("Downloading Job result asset {n!r} from {h!s} to {t!s}".format(n=self.name, h=self.href, t=target))
    287         with target.open("wb") as f:
--> 288             response = self._get_response(stream=True)
    289             for block in response.iter_content(chunk_size=chunk_size):
    290                 f.write(block)

/opt/conda/lib/python3.9/site-packages/openeo/rest/job.py in _get_response(self, stream)
    292 
    293     def _get_response(self, stream=True) -> requests.Response:
--> 294         return self.job.connection.get(self.href, stream=stream)
    295 
    296     def load_json(self) -> dict:

/opt/conda/lib/python3.9/site-packages/openeo/rest/connection.py in get(self, path, stream, auth, **kwargs)
    161         :return: response: Response
    162         """
--> 163         return self.request("get", path=path, stream=stream, auth=auth, **kwargs)
    164 
    165     def post(self, path, json: dict = None, **kwargs) -> Response:

/opt/conda/lib/python3.9/site-packages/openeo/rest/connection.py in request(self, method, path, headers, auth, check_error, expected_status, **kwargs)
    596         try:
    597             # Initial request attempt
--> 598             return _request()
    599         except OpenEoApiError as api_exc:
    600             if api_exc.http_status_code == 403 and api_exc.code == "TokenInvalid":

/opt/conda/lib/python3.9/site-packages/openeo/rest/connection.py in _request()
    589         # Do request, but with retry when access token has expired and refresh token is available.
    590         def _request():
--> 591             return super(Connection, self).request(
    592                 method=method, path=path, headers=headers, auth=auth,
    593                 check_error=check_error, expected_status=expected_status, **kwargs,

/opt/conda/lib/python3.9/site-packages/openeo/rest/connection.py in request(self, method, path, headers, auth, check_error, expected_status, **kwargs)
    119         expected_status = ensure_list(expected_status) if expected_status else []
    120         if check_error and status >= 400 and status not in expected_status:
--> 121             self._raise_api_error(resp)
    122         if expected_status and status not in expected_status:
    123             raise OpenEoRestError("Got status code {s!r} for `{m} {p}` (expected {e!r})".format(

/opt/conda/lib/python3.9/site-packages/openeo/rest/connection.py in _raise_api_error(self, response)
    150             else:
    151                 exception = OpenEoApiError(http_status_code=status_code, message=text)
--> 152         raise exception
    153 
    154     def get(self, path, stream=False, auth: AuthBase = None, **kwargs) -> Response:

OpenEoApiError: [500] Internal: Server error: ParamValidationError('Parameter validation failed:\nInvalid bucket name "OpenEO-datas3:": Bucket name must match the regex "^[a-zA-Z0-9.\\-_]{1,255}$" or be an ARN matching the regex "^arn:(aws).*:s3:[a-z\\-0-9]+:[0-9]{12}:accesspoint[/:][a-zA-Z0-9\\-]{1,63}$|^arn:(aws).*:s3-outposts:[a-z\\-0-9]+:[0-9]{12}:outpost[/:][a-zA-Z0-9\\-]{1,63}[/:]accesspoint[/:][a-zA-Z0-9\\-]{1,63}$"') (ref: r-04ebb9ae102849c181b41bdf82b4ce73)
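The `ParamValidationError` above comes from S3 bucket-name validation: the non-ARN branch of the quoted regex rejects the colon left behind by the doubled `s3:` prefix. This can be checked directly:

```python
import re

# The non-ARN bucket-name rule quoted in the error message above.
BUCKET_RE = re.compile(r"^[a-zA-Z0-9.\-_]{1,255}$")

def valid_bucket(name: str) -> bool:
    return BUCKET_RE.fullmatch(name) is not None

# The real bucket name passes; the mangled one fails on the ':' character.
assert valid_bucket("OpenEO-data")
assert not valid_bucket("OpenEO-datas3:")
```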
jdries commented 1 year ago

Great, looks like we're almost there! I spotted where the issue with the wrong S3 URI happens, and may already have a fix committed.

Jaapel commented 1 year ago

Is that change live?

I am just checking out the notebook with the new version and get the same exception:

Server error: ParamValidationError('Parameter validation failed:\\nInvalid bucket name "OpenEO-datas3:": Bucket name must match the regex "^[a-zA-Z0-9.\\\\-_]{1,255}$" or be an ARN matching the regex "^arn:(aws).*:s3:[a-z\\\\-0-9]+:[0-9]{12}:accesspoint[/:][a-zA-Z0-9\\\\-]{1,63}$|^arn:(aws).*:s3-outposts:[a-z\\\\-0-9]+:[0-9]{12}:outpost[/:][a-zA-Z0-9\\\\-]{1,63}[/:]accesspoint[/:][a-zA-Z0-9\\\\-]{1,63}$"')
cesarpferreira commented 1 year ago

Hi @Jaapel

From our side, we have now updated the Docker image to the latest version (20230228-364); please try again and check whether that error persists.

Jaapel commented 1 year ago

It still seems to persist. Is there an action I should take to make sure I am using the latest image? I am using this URL for the backend connection: https://openeo.a.incd.pt/openeo/1.1.0 This exception only happens when I try to load data from a URL (e.g. xr.open_dataset if you are familiar with xarray; it uses rasterio as backend). When using the openEO client to load data, this problem does not occur. Any ideas why this is the case?

jdries commented 1 year ago

@JeroenVerstraelen can you help out with this one? The last relevant commit was: https://github.com/Open-EO/openeo-geotrellis-extensions/commit/81aea41c6b2fe5d896332a0af26279193681ab7f

jdries commented 1 year ago

Update on this last issue: I may have found a potential cause, but it needs rollout and testing. I assume you're not blocked, as the other format options work fine.

Jaapel commented 1 year ago

The rest of the notebook seemed to work fine, just this part, so for testing purposes we are fine! For testing performance, is it clear how we can measure CPU/memory usage for a single workflow?

Jaapel commented 1 year ago

@jdries @zbenta any updates on the above? I'd like to start reporting some metrics on performance

zbenta commented 1 year ago

We have no idea how to go about it; we tried using kubecost but it didn't work. Maybe we can try Grafana and Prometheus.

sebastian-luna-valero commented 1 year ago

See also: https://github.com/c-scale-community/use-case-aquamonitor/issues/30

tcassaert commented 1 year ago

@zbenta what did not work with kubecost? You didn't get it to run, or you didn't get the correct info out of it?

We're currently running kubecost ourselves, but a pretty old version. It has been rebranded as opencost, but I still have to update it on our deployment.

zbenta commented 1 year ago

> @zbenta what did not work with kubecost? You didn't get it to run, or you didn't get the correct info out of it?
>
> We're currently running kubecost ourselves, but a pretty old version. It has been rebranded as opencost, but I still have to update it on our deployment.

The results it gave were misleading: we were earning money instead of spending it.

zbenta commented 1 year ago

We tried accessing our endpoint after having an issue with the image we were using (it suddenly disappeared from the catalog). We updated the deployment with this image: https://vito-docker.artifactory.vgt.vito.be/webapp/#/packages/docker/openeo-geotrellis-kube/20230314-377/?state=eyJyZXBvIjoidml0by1kb2NrZXIiLCJxdWVyeSI6eyJwa2ciOiJvcGVuZW8ifX0%3D

And now are having this issue:

[screenshot]

soxofaan commented 1 year ago

in the authenticate_oidc call, can you try the following?

con.authenticate_oidc(provider_id="egi", client_id="vito-default-client")

that's the current workaround for a known issue (https://github.com/Open-EO/openeo-geopyspark-driver/issues/344)

zbenta commented 1 year ago

> in the authenticate_oidc call, can you try the following?
>
> con.authenticate_oidc(provider_id="egi", client_id="vito-default-client")
>
> that's the current workaround for a known issue (Open-EO/openeo-geopyspark-driver#344)

Thanks @soxofaan it worked like a charm.

sebastian-luna-valero commented 1 year ago

Hi,

We would like to redeploy openEO at CESNET for WaterWatch using the TOSCA template via PaaS Orchestrator.

@zbenta could we consider your deployment recipe ready for openEO backend at INCD?

zbenta commented 1 year ago

> Hi,
>
> We would like to redeploy openEO at CESNET for WaterWatch using the TOSCA template via PaaS Orchestrator.
>
> @zbenta could we consider your deployment recipe ready for openEO backend at INCD?

The guys at INFN are using TOSCA templates; we are using Kubernetes at INCD.

sebastian-luna-valero commented 1 year ago

Correct.

Sorry, a bit more context. I think INFN has based the TOSCA template on your deployment recipe. The TOSCA template also deploys openEO on top of a virtual k8s.

I think we are at a point where your recipe is working, so I wanted to ask @maricaantonacci to update the TOSCA template.

The existing TOSCA template wasn't working for me (https://ggus.eu/?mode=ticket_info&ticket_id=159319) and I would like to get this working to deploy openEO at CESNET for WaterWatch.

zbenta commented 1 year ago

> Correct.
>
> Sorry, a bit more context. I think INFN has based the TOSCA template on your deployment recipe. The TOSCA template also deploys openEO on top of a virtual k8s.
>
> I think we are at a point where your recipe is working, so I wanted to ask @maricaantonacci to update the TOSCA template.
>
> The existing TOSCA template wasn't working for me (https://ggus.eu/?mode=ticket_info&ticket_id=159319) and I would like to get this working to deploy openEO at CESNET for WaterWatch.

We have updated our repository and added the zookeeper installation part; you can take a look at our public repo for the recipe.

sebastian-luna-valero commented 1 year ago

Thanks, @zbenta

I have now updated the C-SCALE wiki: https://wiki.c-scale.eu/C-SCALE/c-scale-providers/getting-started#how-to-deploy-openeo-platform-back-end

and I will report back via https://ggus.eu/?mode=ticket_info&ticket_id=159319

zbenta commented 1 year ago

> Thanks, @zbenta
>
> I have now updated the C-SCALE wiki: https://wiki.c-scale.eu/C-SCALE/c-scale-providers/getting-started#how-to-deploy-openeo-platform-back-end
>
> and I will report back via https://ggus.eu/?mode=ticket_info&ticket_id=159319

@sebastian-luna-valero, just reference this repo on the wiki. It is the most up-to-date one and has all the components needed to set up the openEO service; the other one was very old and we ended up removing it.

zbenta commented 1 year ago

Could anyone explain how we can get the latest image from VITO? When we use the latest tag, Kubernetes doesn't pick up the latest image, since it believes every node already has it. By the way, is there any policy at VITO stating that you can only keep the last 10 images available? Any thoughts on this subject?

tcassaert commented 1 year ago

You can define imagePullPolicy: Always.
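For reference, a minimal sketch of where that setting lives in a container spec (container name and image tag here are placeholders; only `imagePullPolicy` is the point):

```yaml
spec:
  containers:
    - name: openeo-driver   # placeholder name
      image: vito-docker.artifactory.vgt.vito.be/openeo-geotrellis-kube:latest
      imagePullPolicy: Always   # re-pull the "latest" tag on every pod start
```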

@jdries I think we can keep more images if necessary?

zbenta commented 1 year ago

@tcassaert thanks for the tip

cesarpferreira commented 1 year ago

Hi everyone,

We deployed a resto endpoint in our infrastructure and updated our configs to use that endpoint. We tried to run a job and got this error:

OpenEO batch job failed: Exception while evaluating catalog request https://openeo.a.incd.pt/resto/products?collection=S2&bbox=-6.0504822%2C37.3588379%2C-5.9236904%2C37.4240919&sortKeys=title&startIndex=1&accessedFrom=MEP&clientId=c-6cdf505d3dea4366bff3d2eacd3cd264_0&start=2020-07-01T00%3A00%3A00Z&end=2020-12-31T23%3A59%3A59.999999999Z:  
Traceback (most recent call last):
  File "/opt/openeo/lib64/python3.8/site-packages/openeogeotrellis/deploy/batch_job.py", line 354, in main
    run_driver()
  File "/opt/openeo/lib64/python3.8/site-packages/openeogeotrellis/deploy/batch_job.py", line 325, in run_driver
    run_job(
  File "/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/utils.py", line 53, in memory_logging_wrapper
    return function(*args, **kwargs)
  File "/opt/openeo/lib64/python3.8/site-packages/openeogeotrellis/deploy/batch_job.py", line 401, in run_job
    result = ProcessGraphDeserializer.evaluate(process_graph, env=env, do_dry_run=tracer)
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 339, in evaluate
    result = convert_node(result_node, env=env)
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 359, in convert_node
    process_result = apply_process(process_id=process_id, args=processGraph.get('arguments', {}),
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 1567, in apply_process
    args = {name: convert_node(expr, env=env) for (name, expr) in sorted(args.items())}
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 1567, in <dictcomp>
    args = {name: convert_node(expr, env=env) for (name, expr) in sorted(args.items())}
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 371, in convert_node
    return convert_node(processGraph['node'], env=env)
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 359, in convert_node
    process_result = apply_process(process_id=process_id, args=processGraph.get('arguments', {}),
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 1662, in apply_process
    return process_function(args=args, env=env)
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 595, in load_collection
    return env.backend_implementation.catalog.load_collection(collection_id, load_params=load_params, env=env)
  File "/opt/openeo/lib/python3.8/site-packages/openeo/util.py", line 382, in wrapper
    return f(*args, **kwargs)
  File "/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/layercatalog.py", line 115, in load_collection
    return self._load_collection_cached(collection_id, load_params, WhiteListEvalEnv(env,WHITELIST))
  File "/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/layercatalog.py", line 594, in _load_collection_cached
    pyramid = file_s2_pyramid()
  File "/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/layercatalog.py", line 303, in file_s2_pyramid
    return file_pyramid(pyramid_factory)
  File "/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/layercatalog.py", line 369, in file_pyramid
    return create_pyramid(factory)
  File "/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/layercatalog.py", line 329, in create_pyramid
    return factory.datacube_seq(
  File "/usr/local/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
    return_value = get_return_value(
  File "/usr/local/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o1231.datacube_seq.
: java.io.IOException: Exception while evaluating catalog request https://openeo.a.incd.pt/resto/products?collection=S2&bbox=-6.0504822%2C37.3588379%2C-5.9236904%2C37.4240919&sortKeys=title&startIndex=1&accessedFrom=MEP&clientId=c-6cdf505d3dea4366bff3d2eacd3cd264_0&start=2020-07-01T00%3A00%3A00Z&end=2020-12-31T23%3A59%3A59.999999999Z:  
    at org.openeo.opensearch.OpenSearchClient.execute(OpenSearchClient.scala:125)
    at org.openeo.opensearch.backends.OscarsClient.$anonfun$getProductsFromPage$5(OscarsClient.scala:90)
    at org.openeo.opensearch.OpenSearchClient.attempt$1(OpenSearchClient.scala:162)
    at org.openeo.opensearch.OpenSearchClient.withRetries(OpenSearchClient.scala:171)
    at org.openeo.opensearch.backends.OscarsClient.getProductsFromPage(OscarsClient.scala:90)
    at org.openeo.opensearch.backends.OscarsClient.from$1(OscarsClient.scala:47)
    at org.openeo.opensearch.backends.OscarsClient.getProducts(OscarsClient.scala:51)
    at org.openeo.opensearch.OpenSearchClient.getProducts(OpenSearchClient.scala:78)
    at org.openeo.geotrellis.layers.FileLayerProvider.loadRasterSourceRDD(FileLayerProvider.scala:961)
    at org.openeo.geotrellis.layers.FileLayerProvider.readKeysToRasterSources(FileLayerProvider.scala:609)
    at org.openeo.geotrellis.layers.FileLayerProvider.readMultibandTileLayer(FileLayerProvider.scala:788)
    at org.openeo.geotrellis.file.PyramidFactory.datacube(PyramidFactory.scala:111)
    at org.openeo.geotrellis.file.PyramidFactory.datacube_seq(PyramidFactory.scala:84)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
    at java.base/java.lang.Thread.run(Thread.java:829)

{"message": "OpenEO batch job failed: Exception while evaluating catalog request https://openeo.a.incd.pt/resto/products?collection=S2&bbox=-6.0504822%2C37.3588379%2C-5.9236904%2C37.4240919&sortKeys=title&startIndex=1&accessedFrom=MEP&clientId=c-6cdf505d3dea4366bff3d2eacd3cd264_0&start=2020-07-01T00%3A00%3A00Z&end=2020-12-31T23%3A59%3A59.999999999Z:  ", "levelname": "ERROR", "name": "openeo-user-log", "created": 1679590335.0969262, "filename": "batch_job.py", "lineno": 358, "process": 74, "exc_info": "Traceback (most recent call last):\n  File \"/opt/openeo/lib64/python3.8/site-packages/openeogeotrellis/deploy/batch_job.py\", line 354, in main\n    run_driver()\n  File \"/opt/openeo/lib64/python3.8/site-packages/openeogeotrellis/deploy/batch_job.py\", line 325, in run_driver\n    run_job(\n  File \"/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/utils.py\", line 53, in memory_logging_wrapper\n    return function(*args, **kwargs)\n  File \"/opt/openeo/lib64/python3.8/site-packages/openeogeotrellis/deploy/batch_job.py\", line 401, in run_job\n    result = ProcessGraphDeserializer.evaluate(process_graph, env=env, do_dry_run=tracer)\n  File \"/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py\", line 339, in evaluate\n    result = convert_node(result_node, env=env)\n  File \"/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py\", line 359, in convert_node\n    process_result = apply_process(process_id=process_id, args=processGraph.get('arguments', {}),\n  File \"/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py\", line 1567, in apply_process\n    args = {name: convert_node(expr, env=env) for (name, expr) in sorted(args.items())}\n  File \"/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py\", line 1567, in <dictcomp>\n    args = {name: convert_node(expr, env=env) for (name, expr) in sorted(args.items())}\n  File 
\"/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py\", line 371, in convert_node\n    return convert_node(processGraph['node'], env=env)\n  File \"/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py\", line 359, in convert_node\n    process_result = apply_process(process_id=process_id, args=processGraph.get('arguments', {}),\n  File \"/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py\", line 1662, in apply_process\n    return process_function(args=args, env=env)\n  File \"/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py\", line 595, in load_collection\n    return env.backend_implementation.catalog.load_collection(collection_id, load_params=load_params, env=env)\n  File \"/opt/openeo/lib/python3.8/site-packages/openeo/util.py\", line 382, in wrapper\n    return f(*args, **kwargs)\n  File \"/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/layercatalog.py\", line 115, in load_collection\n    return self._load_collection_cached(collection_id, load_params, WhiteListEvalEnv(env,WHITELIST))\n  File \"/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/layercatalog.py\", line 594, in _load_collection_cached\n    pyramid = file_s2_pyramid()\n  File \"/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/layercatalog.py\", line 303, in file_s2_pyramid\n    return file_pyramid(pyramid_factory)\n  File \"/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/layercatalog.py\", line 369, in file_pyramid\n    return create_pyramid(factory)\n  File \"/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/layercatalog.py\", line 329, in create_pyramid\n    return factory.datacube_seq(\n  File \"/usr/local/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py\", line 1321, in __call__\n    return_value = get_return_value(\n  File \"/usr/local/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py\", line 326, in get_return_value\n 
   raise Py4JJavaError(\npy4j.protocol.Py4JJavaError: An error occurred while calling o1231.datacube_seq.\n: java.io.IOException: Exception while evaluating catalog request https://openeo.a.incd.pt/resto/products?collection=S2&bbox=-6.0504822%2C37.3588379%2C-5.9236904%2C37.4240919&sortKeys=title&startIndex=1&accessedFrom=MEP&clientId=c-6cdf505d3dea4366bff3d2eacd3cd264_0&start=2020-07-01T00%3A00%3A00Z&end=2020-12-31T23%3A59%3A59.999999999Z:  \n\tat org.openeo.opensearch.OpenSearchClient.execute(OpenSearchClient.scala:125)\n\tat org.openeo.opensearch.backends.OscarsClient.$anonfun$getProductsFromPage$5(OscarsClient.scala:90)\n\tat org.openeo.opensearch.OpenSearchClient.attempt$1(OpenSearchClient.scala:162)\n\tat org.openeo.opensearch.OpenSearchClient.withRetries(OpenSearchClient.scala:171)\n\tat org.openeo.opensearch.backends.OscarsClient.getProductsFromPage(OscarsClient.scala:90)\n\tat org.openeo.opensearch.backends.OscarsClient.from$1(OscarsClient.scala:47)\n\tat org.openeo.opensearch.backends.OscarsClient.getProducts(OscarsClient.scala:51)\n\tat org.openeo.opensearch.OpenSearchClient.getProducts(OpenSearchClient.scala:78)\n\tat org.openeo.geotrellis.layers.FileLayerProvider.loadRasterSourceRDD(FileLayerProvider.scala:961)\n\tat org.openeo.geotrellis.layers.FileLayerProvider.readKeysToRasterSources(FileLayerProvider.scala:609)\n\tat org.openeo.geotrellis.layers.FileLayerProvider.readMultibandTileLayer(FileLayerProvider.scala:788)\n\tat org.openeo.geotrellis.file.PyramidFactory.datacube(PyramidFactory.scala:111)\n\tat org.openeo.geotrellis.file.PyramidFactory.datacube_seq(PyramidFactory.scala:84)\n\tat java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.base/java.lang.reflect.Method.invoke(Method.java:566)\n\tat 
py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)\n\tat py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)\n\tat py4j.Gateway.invoke(Gateway.java:282)\n\tat py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)\n\tat py4j.commands.CallCommand.execute(CallCommand.java:79)\n\tat py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)\n\tat py4j.ClientServerConnection.run(ClientServerConnection.java:106)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\n", "job_id": "j-5cc8ba69b85e4622b5f36995f0071de7", "user_id": "cd699ad346138df0ae05cd580df3c01c2c744b5714d71f6cac72a94b6a55399f@egi.eu"}

Our resto endpoint is https://openeo.a.incd.pt/resto. Any idea what it could be? At the end of the file layercatalog.json, there is the following section related to OpenSearch:

...
"_vito": {
          "data_source": {
            "type": "file-s2",
            "opensearch_collection_id": "S2",
            "opensearch_endpoint": "https://openeo.a.incd.pt/resto",
            "provider:backend": "incd"
          }
        }
      }
    ]

Do we need to deploy OpenSearch in our cluster?

jdries commented 1 year ago

I believe the problem is indeed our catalog client failing to figure out whether the endpoint is OpenSearch or STAC, and defaulting to OpenSearch. We'll want to log a bug for that and fix it. It shouldn't be too hard, because we already did some work towards making the catalog client more configurable.

zbenta commented 1 year ago

Any news on the bug fix @jdries?

jdries commented 1 year ago

Bug got logged and scheduled for immediate followup by @JeroenVerstraelen

jdries commented 1 year ago

@jaapel @zbenta the fix for the issue with storing GeoTIFF on S3 should be in the latest image. (The feature request above is still WIP.)

jdries commented 1 year ago

@zbenta the latest release allows you to specify the catalog type, this should help with configuring your own catalog: https://github.com/Open-EO/openeo-geopyspark-driver/issues/383

cesarpferreira commented 1 year ago

Hi @jdries, we tried to specify the catalog type like this:

"_vito": {
          "data_source": {
            "type": "file-s2",
            "opensearch_collection_id": "S2",
            "opensearch_endpoint": "https://openeo.a.incd.pt/resto",
            "provider:backend": "incd",
            "catalog_type": "stac"
          }
        }

We tried with catalog types stac and stacs3, and got the following error in both cases:

{"message": "batch_job.py main: fail 2023-04-12 09:01:23.796804, elapsed 0:00:19.756659", "levelname": "INFO", "name": "openeogeotrellis.deploy.batch_job", "created": 1681290083.7968633, "filename": "util.py", "lineno": 368, "process": 75, "job_id": "j-09f9c0a78e364ebf9013894ba1cb7c32", "user_id": "cd699ad346138df0ae05cd580df3c01c2c744b5714d71f6cac72a94b6a55399f@egi.eu"}
OpenEO batch job failed: java.io.IOException: Exception while evaluating catalog request https://openeo.a.incd.pt/resto/products?collection=S2&bbox=-6.0504822%2C37.3588379%2C-5.9236904%2C37.4240919&sortKeys=title&startIndex=1&accessedFrom=MEP&clientId=c-408c72af08294124ac1ef63016bd2e4a_0&start=2020-07-01T00%3A00%3A00Z&end=2020-12-31T23%3A59%3A59.999999999Z:  
Traceback (most recent call last):
  File "/opt/openeo/lib64/python3.8/site-packages/openeogeotrellis/deploy/batch_job.py", line 877, in <module>
    main(sys.argv)
  File "/opt/openeo/lib64/python3.8/site-packages/openeogeotrellis/deploy/batch_job.py", line 669, in main
    run_driver()
  File "/opt/openeo/lib64/python3.8/site-packages/openeogeotrellis/deploy/batch_job.py", line 640, in run_driver
    run_job(
  File "/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/utils.py", line 53, in memory_logging_wrapper
    return function(*args, **kwargs)
  File "/opt/openeo/lib64/python3.8/site-packages/openeogeotrellis/deploy/batch_job.py", line 711, in run_job
    result = ProcessGraphDeserializer.evaluate(process_graph, env=env, do_dry_run=tracer)
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 339, in evaluate
    result = convert_node(result_node, env=env)
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 359, in convert_node
    process_result = apply_process(process_id=process_id, args=processGraph.get('arguments', {}),
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 1568, in apply_process
    args = {name: convert_node(expr, env=env) for (name, expr) in sorted(args.items())}
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 1568, in <dictcomp>
    args = {name: convert_node(expr, env=env) for (name, expr) in sorted(args.items())}
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 371, in convert_node
    return convert_node(processGraph['node'], env=env)
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 359, in convert_node
    process_result = apply_process(process_id=process_id, args=processGraph.get('arguments', {}),
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 1663, in apply_process
    return process_function(args=args, env=env)
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 595, in load_collection
    return env.backend_implementation.catalog.load_collection(collection_id, load_params=load_params, env=env)
  File "/opt/openeo/lib/python3.8/site-packages/openeo/util.py", line 382, in wrapper
    return f(*args, **kwargs)
  File "/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/layercatalog.py", line 115, in load_collection
    return self._load_collection_cached(collection_id, load_params, WhiteListEvalEnv(env,WHITELIST))
  File "/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/layercatalog.py", line 585, in _load_collection_cached
    pyramid = file_s2_pyramid()
  File "/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/layercatalog.py", line 304, in file_s2_pyramid
    return file_pyramid(pyramid_factory)
  File "/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/layercatalog.py", line 360, in file_pyramid
    return create_pyramid(factory)
  File "/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/layercatalog.py", line 324, in create_pyramid
    return factory.datacube_seq(
  File "/usr/local/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
    return_value = get_return_value(
  File "/usr/local/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o1231.datacube_seq.
: java.io.IOException: Exception while evaluating catalog request https://openeo.a.incd.pt/resto/products?collection=S2&bbox=-6.0504822%2C37.3588379%2C-5.9236904%2C37.4240919&sortKeys=title&startIndex=1&accessedFrom=MEP&clientId=c-408c72af08294124ac1ef63016bd2e4a_0&start=2020-07-01T00%3A00%3A00Z&end=2020-12-31T23%3A59%3A59.999999999Z:  
    at org.openeo.opensearch.OpenSearchClient.execute(OpenSearchClient.scala:125)
    at org.openeo.opensearch.backends.OscarsClient.$anonfun$getProductsFromPage$5(OscarsClient.scala:90)
    at org.openeo.opensearch.OpenSearchClient.attempt$1(OpenSearchClient.scala:162)
    at org.openeo.opensearch.OpenSearchClient.withRetries(OpenSearchClient.scala:171)
    at org.openeo.opensearch.backends.OscarsClient.getProductsFromPage(OscarsClient.scala:90)
    at org.openeo.opensearch.backends.OscarsClient.from$1(OscarsClient.scala:47)
    at org.openeo.opensearch.backends.OscarsClient.getProducts(OscarsClient.scala:51)
    at org.openeo.opensearch.OpenSearchClient.getProducts(OpenSearchClient.scala:78)
    at org.openeo.geotrellis.layers.FileLayerProvider.loadRasterSourceRDD(FileLayerProvider.scala:961)
    at org.openeo.geotrellis.layers.FileLayerProvider.readKeysToRasterSources(FileLayerProvider.scala:609)
    at org.openeo.geotrellis.layers.FileLayerProvider.readMultibandTileLayer(FileLayerProvider.scala:788)
    at org.openeo.geotrellis.file.PyramidFactory.datacube(PyramidFactory.scala:111)
    at org.openeo.geotrellis.file.PyramidFactory.datacube_seq(PyramidFactory.scala:84)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
    at java.base/java.lang.Thread.run(Thread.java:829)

Any idea what could be causing this error? We noticed that the URL contains /products, but our endpoint does not expose that path.

JeroenVerstraelen commented 1 year ago

Hi @cesarpferreira, judging from the stack traces, you are currently using an older version of org.openeo.opensearch.OpenSearchClient. It looks like the fix was not yet present in the last image; that has now been updated. Could you try downloading the latest image again? (It should currently be version openeo-geotrellis-kube:20230412-405.)

zbenta commented 1 year ago

@JeroenVerstraelen We tried using the latest image but the problem persists.

We believe the URL for the collection query is being built incorrectly. The image generates a URL like this:

https://openeo.a.incd.pt/resto/search?collections=%5B%22S2%22%5D&limit=100&bbox=%5B-8.077795307878784%2C35.91205672653183%2C-3.964267609321807%2C38.94264631694279%5D&page=1&datetime=2020-12-18T00%3A00%3A00Z%2F2020-12-19T23%3A59%3A59.999999999Z

But our endpoint only understands a URL like this:

https://openeo.a.incd.pt/resto/search?collections=S2&limit=100&bbox=-8.077795307878784%2C35.91205672653183%2C-3.964267609321807%2C38.94264631694279&page=1&datetime=2020-12-18T00%3A00%3A00Z%2F2020-12-19T23%3A59%3A59.999999999Z

The collection name and bounding-box arguments must not be wrapped in [].
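For reference, the accepted form can be reproduced with a short sketch that only builds the query string; the base URL and parameter names are taken from the working URL above, and the helper itself is illustrative, not code from the backend:

```python
from urllib.parse import urlencode

def stac_search_url(base, collection, bbox, datetime_range, limit=100, page=1):
    """Build a STAC item-search URL with plain comma-separated values,
    the form the resto endpoint accepts (no JSON-style [] wrapping)."""
    params = {
        "collections": collection,               # "S2", not '["S2"]'
        "limit": limit,
        "bbox": ",".join(str(c) for c in bbox),  # plain commas, no brackets
        "page": page,
        "datetime": datetime_range,
    }
    return f"{base}/search?{urlencode(params)}"

url = stac_search_url(
    "https://openeo.a.incd.pt/resto",
    "S2",
    (-8.077795307878784, 35.91205672653183, -3.964267609321807, 38.94264631694279),
    "2020-12-18T00:00:00Z/2020-12-19T23:59:59.999999999Z",
)
print(url)
```

The commas inside bbox still get percent-encoded to %2C by urlencode, which matches the working URL above; the broken URL differs only in the %5B/%5D (i.e. [ and ]) wrapping.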

JeroenVerstraelen commented 1 year ago

@zbenta I am testing to see if I am able to reproduce the issue locally, could I also get the endpoint on which you deployed the image?

zbenta commented 1 year ago

@zbenta I am testing to see if I am able to reproduce the issue locally, could I also get the endpoint on which you deployed the image?

Sure thing: https://openeo.a.incd.pt/openeo/1.1.0/

JeroenVerstraelen commented 1 year ago

The issue should be fixed in the latest image (version 20230418-1335). Let me know if there are still some issues.

cesarpferreira commented 1 year ago

Hi @JeroenVerstraelen, we updated to the latest image and tried to run a job. The job pods finish successfully and we find no issues in the logs, but the job in the Jupyter notebook is stuck reporting that it was created.

job.start_and_wait()

0:00:00 Job 'j-2b3fca6c8ead40f4a0e14a22ede5dadc': send 'start'
0:00:12 Job 'j-2b3fca6c8ead40f4a0e14a22ede5dadc': created (progress N/A)
0:00:17 Job 'j-2b3fca6c8ead40f4a0e14a22ede5dadc': created (progress N/A)
0:00:23 Job 'j-2b3fca6c8ead40f4a0e14a22ede5dadc': created (progress N/A)
0:00:31 Job 'j-2b3fca6c8ead40f4a0e14a22ede5dadc': created (progress N/A)
0:00:41 Job 'j-2b3fca6c8ead40f4a0e14a22ede5dadc': created (progress N/A)
0:00:54 Job 'j-2b3fca6c8ead40f4a0e14a22ede5dadc': created (progress N/A)
0:01:09 Job 'j-2b3fca6c8ead40f4a0e14a22ede5dadc': created (progress N/A)
0:01:28 Job 'j-2b3fca6c8ead40f4a0e14a22ede5dadc': created (progress N/A)
0:01:52 Job 'j-2b3fca6c8ead40f4a0e14a22ede5dadc': created (progress N/A)
0:02:23 Job 'j-2b3fca6c8ead40f4a0e14a22ede5dadc': created (progress N/A)
0:03:00 Job 'j-2b3fca6c8ead40f4a0e14a22ede5dadc': created (progress N/A)
0:03:47 Job 'j-2b3fca6c8ead40f4a0e14a22ede5dadc': created (progress N/A)
0:04:45 Job 'j-2b3fca6c8ead40f4a0e14a22ede5dadc': created (progress N/A)
0:05:46 Job 'j-2b3fca6c8ead40f4a0e14a22ede5dadc': created (progress N/A)
0:06:46 Job 'j-2b3fca6c8ead40f4a0e14a22ede5dadc': created (progress N/A)
0:07:46 Job 'j-2b3fca6c8ead40f4a0e14a22ede5dadc': created (progress N/A)
0:08:46 Job 'j-2b3fca6c8ead40f4a0e14a22ede5dadc': created (progress N/A)
0:09:47 Job 'j-2b3fca6c8ead40f4a0e14a22ede5dadc': created (progress N/A)
0:10:47 Job 'j-2b3fca6c8ead40f4a0e14a22ede5dadc': created (progress N/A)
0:11:47 Job 'j-2b3fca6c8ead40f4a0e14a22ede5dadc': created (progress N/A)
0:12:47 Job 'j-2b3fca6c8ead40f4a0e14a22ede5dadc': created (progress N/A)
0:13:47 Job 'j-2b3fca6c8ead40f4a0e14a22ede5dadc': created (progress N/A)
0:14:48 Job 'j-2b3fca6c8ead40f4a0e14a22ede5dadc': created (progress N/A)
0:15:49 Job 'j-2b3fca6c8ead40f4a0e14a22ede5dadc': created (progress N/A)
0:16:49 Job 'j-2b3fca6c8ead40f4a0e14a22ede5dadc': created (progress N/A)
0:17:49 Job 'j-2b3fca6c8ead40f4a0e14a22ede5dadc': created (progress N/A)


soxofaan commented 1 year ago

@cesarpferreira FYI, this behavior looks like there is no job tracker running (e.g. as a background cron job).

cesarpferreira commented 1 year ago

@soxofaan thanks for the heads-up. We removed it previously, when the image was not working with our RESTO catalog, because the job tracker was failing.

We tried to run a job again; both the job pods and the job tracker succeed, but the job status only updates once it is finished.

0:00:00 Job 'j-a27d3c0acd7b4e108966239c7ec795ea': send 'start'
0:00:11 Job 'j-a27d3c0acd7b4e108966239c7ec795ea': created (progress N/A)
0:00:16 Job 'j-a27d3c0acd7b4e108966239c7ec795ea': created (progress N/A)
0:00:23 Job 'j-a27d3c0acd7b4e108966239c7ec795ea': created (progress N/A)
0:00:31 Job 'j-a27d3c0acd7b4e108966239c7ec795ea': created (progress N/A)
0:00:41 Job 'j-a27d3c0acd7b4e108966239c7ec795ea': created (progress N/A)
0:00:53 Job 'j-a27d3c0acd7b4e108966239c7ec795ea': created (progress N/A)
0:01:08 Job 'j-a27d3c0acd7b4e108966239c7ec795ea': created (progress N/A)
0:01:28 Job 'j-a27d3c0acd7b4e108966239c7ec795ea': created (progress N/A)
0:01:52 Job 'j-a27d3c0acd7b4e108966239c7ec795ea': created (progress N/A)
0:02:22 Job 'j-a27d3c0acd7b4e108966239c7ec795ea': created (progress N/A)
0:02:59 Job 'j-a27d3c0acd7b4e108966239c7ec795ea': created (progress N/A)
0:03:46 Job 'j-a27d3c0acd7b4e108966239c7ec795ea': finished (progress N/A)

When we tried to download the job results this error with the S3 URL appeared again:

OpenEoApiError: [500] Internal: Server error: ParamValidationError('Parameter validation failed:\nInvalid bucket name "OpenEO-datas3:": Bucket name must match the regex "^[a-zA-Z0-9.\\-_]{1,255}$" or be an ARN matching the regex "^arn:(aws).*:s3:[a-z\\-0-9]+:[0-9]{12}:accesspoint[/:][a-zA-Z0-9\\-]{1,63}$|^arn:(aws).*:s3-outposts:[a-z\\-0-9]+:[0-9]{12}:outpost[/:][a-zA-Z0-9\\-]{1,63}[/:]accesspoint[/:][a-zA-Z0-9\\-]{1,63}$"') (ref: r-77d5e5eaf40e47b58c449c5405366ea5)
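The rejected name OpenEO-datas3: looks like the bucket name and an s3: scheme fragment concatenated without a separator. A minimal sketch of why boto3 rejects it; the regex is copied from the error message, and the splitting helper is hypothetical, not the backend's actual code:

```python
import re

# Bucket-name rule quoted in the ParamValidationError above
BUCKET_RE = re.compile(r"^[a-zA-Z0-9.\-_]{1,255}$")

def split_s3_url(url: str):
    """Hypothetical helper: split 's3://bucket/key' into (bucket, key),
    the parsing the backend presumably needs to do before calling boto3."""
    m = re.match(r"^s3://([^/]+)/(.*)$", url)
    if not m:
        raise ValueError(f"not an s3 URL: {url!r}")
    return m.group(1), m.group(2)

bucket, key = split_s3_url("s3://OpenEO-data/j-a27d3c0acd7b4e108966239c7ec795ea/out.tiff")
print(bucket)                             # "OpenEO-data" passes the regex
print(BUCKET_RE.match("OpenEO-datas3:"))  # None: ':' is not an allowed character
```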

The job tracker pod has the following messages in its logs:

{"message": "Creating ElasticJobRegistry with backend_id='creodias-unknown' and api_url='https://jobregistry.openeo.vito.be'", "levelname": "INFO", "name": "openeo_driver.jobregistry.elastic", "created": 1681904440.560211, "filename": "jobregistry.py", "lineno": 226, "process": 1}
WARNING:openeo_driver.jobregistry.elastic:In context 'get_elastic_job_registry': caught FileNotFoundError(2, 'No such file or directory')
Traceback (most recent call last):
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/util/logging.py", line 327, in just_log_exceptions
    yield
  File "/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/backend.py", line 1077, in get_elastic_job_registry
    ejr_creds = vault.get_elastic_job_registry_credentials()
  File "/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/vault.py", line 97, in get_elastic_job_registry_credentials
    client = self._client(token=vault_token or self.login_kerberos())
  File "/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/vault.py", line 81, in login_kerberos
    vault_token = subprocess.check_output(cmd, text=True, stderr=PIPE)
  File "/usr/lib64/python3.8/subprocess.py", line 415, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib64/python3.8/subprocess.py", line 493, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/lib64/python3.8/subprocess.py", line 858, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib64/python3.8/subprocess.py", line 1706, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'vault'
INFO:openeogeotrellis.job_registry:get_running_jobs: start 2023-04-19 11:40:40.677069
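The FileNotFoundError in that traceback is the generic symptom of invoking an external CLI (here the vault binary) that is not installed in the container image. A small illustrative guard, not the backend's actual code:

```python
import shutil
import subprocess

def run_cli(cmd):
    """Run an external command, returning None instead of raising
    FileNotFoundError when the binary is not on PATH."""
    if shutil.which(cmd[0]) is None:
        return None
    return subprocess.check_output(cmd, text=True)

# Unguarded, subprocess.check_output raises FileNotFoundError for a
# missing binary, which is what the job tracker logs show for `vault`.
print(run_cli(["vault", "--version"]))
```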

It looks like there are some hard-coded VITO-specific values in the settings.

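Many of these defaults (Kafka bootstrap servers, Vault address, ETL endpoints) point at VITO infrastructure. A minimal sketch of one way to make such settings site-configurable, using hypothetical environment variable names rather than the actual openeo-geotrellis configuration mechanism:

```python
import os
from dataclasses import dataclass, field


def _env(var: str, default: str):
    # Resolve a setting from the environment at instantiation time,
    # keeping the baked-in value only as a fallback.
    return field(default_factory=lambda: os.environ.get(var, default))


@dataclass
class SiteConfig:
    # Variable names here are illustrative, not the real ones.
    s3_bucket_name: str = _env("OPENEO_S3_BUCKET", "OpenEO-data")
    vault_addr: str = _env("OPENEO_VAULT_ADDR", "https://vault.vgt.vito.be")


os.environ["OPENEO_S3_BUCKET"] = "incd-openeo-data"
print(SiteConfig().s3_bucket_name)  # → incd-openeo-data
```

With this pattern a deployment like INCD's only needs to set environment variables in its Helm values instead of patching the image.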
soxofaan commented 1 year ago

Invalid bucket name "OpenEO-datas3:

This sounds like something we've seen before. I'm checking with my colleagues.
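For reference, a quick check against the AWS flavour of the bucket-naming rules (other S3-compatible stores can be more permissive, so treat this as a sketch): "OpenEO-data" already breaks the lowercase rule, and the "OpenEO-datas3:" in the error suggests the bucket name was concatenated with an "s3://" URL prefix somewhere.

```python
import re

# AWS S3 bucket-name rules (simplified): 3-63 characters, lowercase
# letters, digits, dots and hyphens, starting and ending alphanumeric.
BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")

def is_valid_bucket_name(name: str) -> bool:
    return bool(BUCKET_RE.match(name))

print(is_valid_bucket_name("openeo-data"))     # → True
print(is_valid_bucket_name("OpenEO-datas3:"))  # → False (uppercase, ':')
```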

WARNING:openeo_driver.jobregistry.elastic:In context 'get_elastic_job_registry': caught FileNotFoundError

That is just a warning you can ignore for now; it should not block the normal flow.
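For context, the root cause visible in that traceback is simply that the `vault` CLI binary is not present in the container image; `subprocess` raises `FileNotFoundError` before any command runs. A minimal reproduction (the binary name below is a deliberate stand-in):

```python
import subprocess

def call_missing_binary():
    # Popen fails with errno 2 when the executable is not on PATH,
    # which is exactly the FileNotFoundError in the driver log.
    return subprocess.check_output(["definitely-not-installed-vault"], text=True)

try:
    call_missing_binary()
except FileNotFoundError as exc:
    print(f"errno {exc.errno}: {exc.strerror}")  # → errno 2: No such file or directory
```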

It looks like there are some hard-coded values for VITO in the settings.

That is indeed a known issue that we are eliminating step by step.

We tried to run a job again; both the job pods and the job tracker succeed, but the job status only updates when it is finished. 0:02:59 Job 'j-a27d3c0acd7b4e108966239c7ec795ea': created (progress N/A) 0:03:46 Job 'j-a27d3c0acd7b4e108966239c7ec795ea': finished (progress N/A)

There are almost 2 minutes between these two status polls, so technically it's possible that the job reached the statuses "queued" and "running" in between. Or do you suspect that these statuses are never observed by the client-side polling loop?
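To illustrate the polling point: with a poll interval on the order of a minute, a job that moves through "queued" and "running" quickly can easily be observed only as created/finished. A small sketch (the status callable below is a simulated backend, not the openEO client API):

```python
import time

def poll_until_done(get_status, interval=5.0, timeout=3600):
    # Poll a zero-arg status callable, recording each distinct status
    # observed, until a terminal state is reached.
    seen = []
    deadline = time.time() + timeout
    while time.time() < deadline:
        status = get_status()
        if not seen or status != seen[-1]:
            seen.append(status)
        if status in ("finished", "error", "canceled"):
            return seen
        time.sleep(interval)
    raise TimeoutError(f"statuses seen so far: {seen}")

# Simulated backend: polling fast enough catches every transition.
states = iter(["created", "queued", "running", "running", "finished"])
print(poll_until_done(lambda: next(states), interval=0.01))
# → ['created', 'queued', 'running', 'finished']
```

With a 60-second interval the same job could plausibly show only `['created', 'finished']`, which matches the output pasted above.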

jdries commented 1 year ago

@cesarpferreira I made another fix for the issue of downloading a multitemporal cube that is saved to GeoTIFF. Can you redeploy and try again? I also understood there was something else you were waiting for before work at INFN could continue; can you recap what that was?

cesarpferreira commented 1 year ago

Hi @jdries , we redeployed. TL;DR: with the Jupyter notebook the job state is not updating correctly, the job ends up failing, and we are unable to retrieve logs for it. In editor.openeo.org everything seems to work fine.

We tried again to run a job with the jupyter notebook:

(screenshots of the notebook cells and job run omitted)

results.download_files(Path("./incd_Result6"))

---------------------------------------------------------------------------
OpenEoApiError                            Traceback (most recent call last)
/tmp/ipykernel_78/3408507428.py in <module>
----> 1 results.download_files(Path("./incd_Result6"))

/opt/conda/lib/python3.9/site-packages/openeo/rest/job.py in download_files(self, target, include_stac_metadata)
    412         ensure_dir(target)
    413 
--> 414         downloaded = [a.download(target) for a in self.get_assets()]
    415 
    416         if include_stac_metadata:

/opt/conda/lib/python3.9/site-packages/openeo/rest/job.py in get_assets(self)
    347         """
    348         # TODO: add arguments to filter on metadata, e.g. to only get assets of type "image/tiff"
--> 349         metadata = self.get_metadata()
    350         if "assets" in metadata:
    351             # API 1.0 style: dictionary mapping filenames to metadata dict (with at least a "href" field)

/opt/conda/lib/python3.9/site-packages/openeo/rest/job.py in get_metadata(self, force)
    337         """Get batch job results metadata (parsed JSON)"""
    338         if self._results is None or force:
--> 339             self._results = self._job.connection.get(self._results_url, expected_status=200).json()
    340         return self._results
    341 

/opt/conda/lib/python3.9/site-packages/openeo/rest/connection.py in get(self, path, stream, auth, **kwargs)
    161         :return: response: Response
    162         """
--> 163         return self.request("get", path=path, stream=stream, auth=auth, **kwargs)
    164 
    165     def post(self, path, json: dict = None, **kwargs) -> Response:

/opt/conda/lib/python3.9/site-packages/openeo/rest/connection.py in request(self, method, path, headers, auth, check_error, expected_status, **kwargs)
    596         try:
    597             # Initial request attempt
--> 598             return _request()
    599         except OpenEoApiError as api_exc:
    600             if api_exc.http_status_code == 403 and api_exc.code == "TokenInvalid":

/opt/conda/lib/python3.9/site-packages/openeo/rest/connection.py in _request()
    589         # Do request, but with retry when access token has expired and refresh token is available.
    590         def _request():
--> 591             return super(Connection, self).request(
    592                 method=method, path=path, headers=headers, auth=auth,
    593                 check_error=check_error, expected_status=expected_status, **kwargs,

/opt/conda/lib/python3.9/site-packages/openeo/rest/connection.py in request(self, method, path, headers, auth, check_error, expected_status, **kwargs)
    119         expected_status = ensure_list(expected_status) if expected_status else []
    120         if check_error and status >= 400 and status not in expected_status:
--> 121             self._raise_api_error(resp)
    122         if expected_status and status not in expected_status:
    123             raise OpenEoRestError("Got status code {s!r} for `{m} {p}` (expected {e!r})".format(

/opt/conda/lib/python3.9/site-packages/openeo/rest/connection.py in _raise_api_error(self, response)
    150             else:
    151                 exception = OpenEoApiError(http_status_code=status_code, message=text)
--> 152         raise exception
    153 
    154     def get(self, path, stream=False, auth: AuthBase = None, **kwargs) -> Response:

OpenEoApiError: [400] JobNotFinished: Batch job has not finished computing the results yet. Please try again later or contact our support. (ref: r-4159ada686a74be9b5760c5278ea2c22)
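One client-side workaround while the status reporting is being sorted out is to retry the results fetch as long as the backend keeps answering JobNotFinished. A sketch with a simulated download callable (the exception class here is a local stand-in for the API error, not the openEO client's exception hierarchy):

```python
import time

class JobNotFinished(Exception):
    # Local stand-in for the backend's "[400] JobNotFinished" API error.
    pass

def download_with_retry(download, attempts=5, delay=1.0):
    # Retry a zero-arg download callable while the backend still
    # reports that the job has not finished computing its results.
    for _ in range(attempts):
        try:
            return download()
        except JobNotFinished:
            time.sleep(delay)
    raise TimeoutError(f"still not finished after {attempts} attempts")

# Simulated download that succeeds on the third call.
calls = {"n": 0}
def fake_download():
    calls["n"] += 1
    if calls["n"] < 3:
        raise JobNotFinished()
    return "assets downloaded"

print(download_with_retry(fake_download, delay=0.01))  # → assets downloaded
```

This only papers over the symptom, of course; the underlying status-update problem on the backend still needs fixing.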

We also tried editor.openeo.org:

(screenshots of the editor.openeo.org job run omitted)

Any suggestions?

jdries commented 1 year ago

Hi @cesarpferreira ,

  1. So the download issue is gone.
  2. Logs retrieval: it is trying to contact VITO's Elasticsearch cluster, which for sure won't work. Can you create a separate issue to get logs working?
  3. Launching a job via the editor or the notebook does exactly the same thing. Can you maybe retry the notebook? Or do you set custom options in the notebook?