c-scale-community / use-case-aquamonitor

upgrade openEO on INCD #26

Closed: jdries closed this issue 1 year ago

jdries commented 2 years ago

Upgrade openEO to the latest version. @zbenta, we updated our documentation to reflect the latest upgrades to the software stack: https://github.com/Open-EO/openeo-geotrellis-kubernetes/commit/a8147962d49555d366b00f64f7b27ff62c5f712e

Can you try again based on those instructions, and report issues here? @tcassaert is available to follow up more quickly (than I can)!

zbenta commented 2 years ago

Hi @jdries, I just tried to follow your tutorial and found that I'm unable to set up the spark-operator on the Kubernetes version (v1.16.15) we are currently using in the cluster. I tried deploying it on another cluster with a newer version (v1.21.6) and it worked; now we are having issues with the spark-jobs. When we try to deploy it as follows:

[root@k8s-test-cluster-k8s-master-nf-1 ~]# helm install myspark vito/sparkapplication --version 0.5.0 --create-namespace --namespace spark-jobs -f values.yaml
W0607 11:00:37.367991   11917 warnings.go:70] networking.k8s.io/v1beta1 Ingress is deprecated in v1.19+, unavailable in v1.22+; use networking.k8s.io/v1 Ingress
W0607 11:00:37.369756   11917 warnings.go:70] networking.k8s.io/v1beta1 Ingress is deprecated in v1.19+, unavailable in v1.22+; use networking.k8s.io/v1 Ingress
W0607 11:00:38.484370   11917 warnings.go:70] networking.k8s.io/v1beta1 Ingress is deprecated in v1.19+, unavailable in v1.22+; use networking.k8s.io/v1 Ingress
W0607 11:00:38.484556   11917 warnings.go:70] networking.k8s.io/v1beta1 Ingress is deprecated in v1.19+, unavailable in v1.22+; use networking.k8s.io/v1 Ingress
NAME: myspark
LAST DEPLOYED: Tue Jun  7 11:00:36 2022
NAMESPACE: spark-jobs
STATUS: deployed
REVISION: 1
TEST SUITE: None

It states that it is OK, but if we consult the logs from the pod that is started, we see the following:

++ id -u
+ myuid=18585
++ id -g
+ mygid=18585
+ set +e
++ getent passwd 18585
+ uidentry=spark:x:18585:18585::/opt/spark/work-dir:/bin/bash
+ set -e
+ '[' -z spark:x:18585:18585::/opt/spark/work-dir:/bin/bash ']'
+ SPARK_CLASSPATH=':/usr/local/spark/jars/*'
+ env
+ grep SPARK_JAVA_OPT_
+ sort -t_ -k4 -n
+ sed 's/[^=]*=\(.*\)/\1/g'
+ readarray -t SPARK_EXECUTOR_JAVA_OPTS
+ '[' -n '' ']'
+ '[' '' == 3 ']'
+ '[' -n /usr/hdp/current/hadoop-client ']'
+ '[' -z '' ']'
++ /usr/hdp/current/hadoop-client/bin/hadoop classpath
/opt/entrypoint.sh: line 57: /usr/hdp/current/hadoop-client/bin/hadoop: No such file or directory
+ export SPARK_DIST_CLASSPATH=
+ SPARK_DIST_CLASSPATH=
+ '[' -z x ']'
+ SPARK_CLASSPATH='/etc/hadoop/conf::/usr/local/spark/jars/*'
+ echo 'My start command is driver'
My start command is driver
+ case "$1" in
+ shift 1
+ CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
+ exec /usr/bin/tini -s -- /usr/local/spark/bin/spark-submit --conf spark.driver.bindAddress=10.233.87.198 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner local:///usr/local/lib/python3.7/dist-packages/openeogeotrellis/deploy/kube.py
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/usr/local/spark-3.2.0/jars/spark-unsafe_2.12-3.2.0.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
22/06/07 11:33:20 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
22/06/07 11:33:21 WARN DependencyUtils: Local jar /opt/geotrellis-extensions-2.2.0-SNAPSHOT.jar does not exist, skipping.
22/06/07 11:33:21 WARN DependencyUtils: Local jar /opt/geotrellis-backend-assembly-0.4.6-openeo.jar does not exist, skipping.
python3: can't open file '/usr/local/lib/python3.7/dist-packages/openeogeotrellis/deploy/kube.py': [Errno 2] No such file or directory
22/06/07 11:33:21 INFO ShutdownHookManager: Shutdown hook called
22/06/07 11:33:21 INFO ShutdownHookManager: Deleting directory /tmp/spark-9ebcd8be-dc28-4cbd-9c56-e68a90bd6050

It looks like the image is missing a few dependencies.

In the meantime, we will try to upgrade our Kubernetes cluster and redeploy the service.

tcassaert commented 2 years ago

@zbenta, the example values.yaml doesn't necessarily show the current versions of the jar dependencies in the latest image. You'd better run the image outside of the cluster and check which versions are in /opt.

Right now, it should be:

jarDependencies:
  - 'local:///opt/geotrellis-extensions-2.3.0_2.12-SNAPSHOT.jar'
  - 'local:///opt/geotrellis-backend-assembly-0.4.6-openeo_2.12.jar'
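
If you want to double-check yourself, a quick way is to list /opt in the image without deploying anything. This is a sketch, not from the docs: the image name is taken from the values.yaml below, and the latest tag is an assumption:

# list the jars shipped in /opt of the image (tag "latest" is an assumption)
docker run --rm --entrypoint ls vito-docker.artifactory.vgt.vito.be/openeo-geotrellis:latest -l /opt
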
zbenta commented 2 years ago

Taking a look at the image on my local machine, I can see the following jar files:

geotrellis-backend-assembly-0.4.6-openeo_2.12.jar
geotrellis-extensions-2.3.0_2.12-SNAPSHOT.jar

I've altered the values.yaml to reflect the versions mentioned above. Now, taking a look at the logs from the Kubernetes pod, I can see:

++ id -u
+ myuid=18585
++ id -g
+ mygid=18585
+ set +e
++ getent passwd 18585
+ uidentry=spark:x:18585:18585::/opt/spark/work-dir:/bin/bash
+ set -e
+ '[' -z spark:x:18585:18585::/opt/spark/work-dir:/bin/bash ']'
+ SPARK_CLASSPATH=':/usr/local/spark/jars/*'
+ env
+ grep SPARK_JAVA_OPT_
+ sort -t_ -k4 -n
+ sed 's/[^=]*=\(.*\)/\1/g'
+ readarray -t SPARK_EXECUTOR_JAVA_OPTS
+ '[' -n '' ']'
+ '[' '' == 3 ']'
+ '[' -n /usr/hdp/current/hadoop-client ']'
+ '[' -z '' ']'
++ /usr/hdp/current/hadoop-client/bin/hadoop classpath
/opt/entrypoint.sh: line 57: /usr/hdp/current/hadoop-client/bin/hadoop: No such file or directory
+ export SPARK_DIST_CLASSPATH=
+ SPARK_DIST_CLASSPATH=
My start command is driver
+ '[' -z x ']'
+ SPARK_CLASSPATH='/etc/hadoop/conf::/usr/local/spark/jars/*'
+ echo 'My start command is driver'
+ case "$1" in
+ shift 1
+ CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
+ exec /usr/bin/tini -s -- /usr/local/spark/bin/spark-submit --conf spark.driver.bindAddress=10.233.87.205 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner local:///usr/local/lib/python3.7/dist-packages/openeogeotrellis/deploy/kube.py
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/usr/local/spark-3.2.0/jars/spark-unsafe_2.12-3.2.0.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
22/06/07 13:10:50 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
python3: can't open file '/usr/local/lib/python3.7/dist-packages/openeogeotrellis/deploy/kube.py': [Errno 2] No such file or directory
log4j:WARN No appenders could be found for logger (org.apache.spark.util.ShutdownHookManager).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

Taking a look at the image on my local machine, I can see /usr/local/lib/python3.8/site-packages (the Python version is 3.8 and not 3.7 as per the example values.yaml file), but no dist-packages folder within that path.

My values.yaml file is as follows, am I missing anything?

---
image: "vito-docker.artifactory.vgt.vito.be/openeo-geotrellis"
imageVersion: "latest"
jmxExporterJar: "/opt/jmx_prometheus_javaagent-0.13.0.jar"
mainApplicationFile: "local:///usr/local/lib/python3.7/dist-packages/openeogeotrellis/deploy/kube.py"
serviceAccount: "openeo"
volumes:
  - name: "eodata"
    hostPath:
      path: "/eodata"
      type: "DirectoryOrCreate"
volumeMounts:
  - name: "eodata"
    mountPath: "/eodata"
executor:
  memory: "512m"
  cpu: 1
  envVars:
    OPENEO_CATALOG_FILES: "/opt/layercatalog.json"
    OPENEO_S1BACKSCATTER_ELEV_GEOID: "/opt/openeo-vito-aux-data/egm96.grd"
    OTB_HOME: "/opt/orfeo-toolbox"
    OTB_APPLICATION_PATH: "/opt/orfeo-toolbox/lib/otb/applications"
    KUBE: "true"
    GDAL_NUM_THREADS: "2"
  javaOptions: "-Dlog4j.configuration=/opt/log4j.properties -Dscala.concurrent.context.numThreads=4 -Dscala.concurrent.context.maxThreads=4"
driver:
  memory: "512m"
  cpu: 1
  envVars:
    KUBE: "true"
    KUBE_OPENEO_API_PORT: "50001"
    DRIVER_IMPLEMENTATION_PACKAGE: "openeogeotrellis"
    OPENEO_CATALOG_FILES: "/opt/layercatalog.json"
    OPENEO_S1BACKSCATTER_ELEV_GEOID: "/opt/openeo-vito-aux-data/egm96.grd"
    OTB_HOME: "/opt/orfeo-toolbox"
    OTB_APPLICATION_PATH: "/opt/orfeo-toolbox/lib/otb/applications"
  javaOptions: "-Dlog4j.configuration=/opt/log4j.properties -Dscala.concurrent.context.numThreads=6 -Dpixels.treshold=1000000"
sparkConf:
  "spark.executorEnv.DRIVER_IMPLEMENTATION_PACKAGE": "openeogeotrellis"
  "spark.extraListeners": "org.openeo.sparklisteners.CancelRunawayJobListener"
  "spark.appMasterEnv.DRIVER_IMPLEMENTATION_PACKAGE": "openeogeotrellis"
  "spark.executorEnv.GDAL_NUM_THREADS": "2"
  "spark.executorEnv.GDAL_DISABLE_READDIR_ON_OPEN": "EMPTY_DIR"
jarDependencies:
  - 'local:///opt/geotrellis-extensions-2.3.0_2.12-SNAPSHOT.jar'
  - 'local:///opt/geotrellis-backend-assembly-0.4.6-openeo_2.12.jar'
fileDependencies:
  - 'local:///opt/layercatalog.json'
service:
  enabled: true
  port: 50001
ingress:
  annotations:
    kubernetes.io/ingress.class: traefik
  enabled: true
  hosts:
  - host: openeo.example.com
    paths:
      - '/'
rbac:
  create: true
  serviceAccountName: openeo
spark_ui:
  port: 4040
  ingress:
    enabled: true
    annotations:
      kubernetes.io/ingress.class: traefik
    hosts:
      - host: spark-ui.openeo.example.com
        paths:
          - '/'
tcassaert commented 2 years ago

I've updated the docs with the correct image name, mainApplicationFile and jars. (https://github.com/Open-EO/openeo-geotrellis-kubernetes/commit/1d97327701cac1e5dfbc3368b24ff09ad406fe10)

There's currently no automatic deployment of a latest tag, but I've tagged our most recent version as latest and pushed it to our Artifactory. I've added the automatic tagging to my to-do list, but for now you should be able to test with the image that's available.
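
For reference, after that commit the relevant values should look roughly like this (a sketch reconstructed from this thread rather than copied from the docs; the kube.py path is the one that appears in the traceback in the next comment):

image: "vito-docker.artifactory.vgt.vito.be/openeo-geotrellis"
imageVersion: "latest"
mainApplicationFile: "local:///opt/openeo/lib64/python3.8/site-packages/openeogeotrellis/deploy/kube.py"
jarDependencies:
  - 'local:///opt/geotrellis-extensions-2.3.0_2.12-SNAPSHOT.jar'
  - 'local:///opt/geotrellis-backend-assembly-0.4.6-openeo_2.12.jar'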

zbenta commented 2 years ago

Thanks for the update to the values.yaml file. Now I see the following error in the myspark-driver pod logs:

...
  File "/opt/openeo/lib64/python3.8/site-packages/openeogeotrellis/deploy/kube.py", line 10, in <module>
    from openeo_driver.server import run_gunicorn, build_backend_deploy_metadata
ModuleNotFoundError: No module named 'openeo_driver'
...
tcassaert commented 2 years ago

The driver and executor envVars are missing PYTHONPATH: "$PYTHONPATH/opt/openeo/lib/python3.8/site-packages/". I'm adding it to the example values.yaml.

zbenta commented 2 years ago

Hi @tcassaert, the PYTHONPATH env variable should be PYTHONPATH: "/opt/openeo/lib/python3.8/site-packages/" and not PYTHONPATH: "$PYTHONPATH/opt/openeo/lib/python3.8/site-packages/".

We altered it; the pod started but crashed with the following message:

22/06/08 09:36:20 INFO CancelRunawayJobListener: initialized with timeout PT15M
22/06/08 09:36:20 INFO SparkContext: Registered listener org.openeo.sparklisteners.CancelRunawayJobListener
22/06/08 09:36:50 INFO KubernetesClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000000000(ns)
{"message": "Connection dropped: socket connection error: None", "levelname": "WARNING", "name": "kazoo.client", "created": 1654681020.399042, "filename": "connection.py", "lineno": 611, "process": 48, "req_id": "no-request", "user_id": null}
{"message": "Connection dropped: socket connection error: None", "levelname": "WARNING", "name": "kazoo.client", "created": 1654681030.4076974, "filename": "connection.py", "lineno": 611, "process": 48, "req_id": "no-request", "user_id": null}
{"message": "Unhandled KazooTimeoutError exception: KazooTimeoutError('Connection time-out')", "levelname": "ERROR", "name": "openeo_driver.util.logging", "created": 1654681030.4085562, "filename": "logging.py", "lineno": 152, "process": 48, "exc_info": "Traceback (most recent call last):\n  File \"/opt/openeo/lib64/python3.8/site-packages/openeogeotrellis/deploy/kube.py\", line 95, in <module>\n    main()\n  File \"/opt/openeo/lib64/python3.8/site-packages/openeogeotrellis/deploy/kube.py\", line 53, in main\n    app = build_app(backend_implementation=GeoPySparkBackendImplementation())\n  File \"/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/backend.py\", line 265, in __init__\n    else ZooKeeperServiceRegistry()\n  File \"/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/service_registry.py\", line 121, in __init__\n    with self._zk_client() as zk:\n  File \"/usr/lib64/python3.8/contextlib.py\", line 113, in __enter__\n    return next(self.gen)\n  File \"/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/service_registry.py\", line 201, in _zk_client\n    zk.start()\n  File \"/opt/openeo/lib/python3.8/site-packages/kazoo/client.py\", line 635, in start\n    raise self.handler.timeout_exception(\"Connection time-out\")\nkazoo.handlers.threading.KazooTimeoutError: Connection time-out", "req_id": "no-request", "user_id": null}
22/06/08 09:37:10 INFO SparkUI: Stopped Spark web UI at http://myspark-435d4b8142ac7a56-driver-svc.new-spark-jobs.svc:4040
22/06/08 09:37:10 INFO KubernetesClusterSchedulerBackend: Shutting down all executors
tcassaert commented 2 years ago

There was a : missing. It should be PYTHONPATH: "$PYTHONPATH:/opt/openeo/lib/python3.8/site-packages/". That way you don't override your PYTHONPATH, you just append to it.
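
In values.yaml terms, the corrected entries would look something like this (a sketch with just the relevant keys):

driver:
  envVars:
    PYTHONPATH: "$PYTHONPATH:/opt/openeo/lib/python3.8/site-packages/"
executor:
  envVars:
    PYTHONPATH: "$PYTHONPATH:/opt/openeo/lib/python3.8/site-packages/"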

Do you have a Zookeeper running? I suppose not?

zbenta commented 2 years ago

We have no Zookeeper. Do we need one? Our "animals" are well behaved, there's no need to pull out the whip ;-)

tcassaert commented 2 years ago

You'd only need one if you want to keep track of batch jobs. But it looks like it requires one anyhow, even if you don't set the ZOOKEEPERNODES environment variable. I'll check what's going on and get back to you ASAP.

tcassaert commented 2 years ago

@zbenta, if you add TRAVIS: 1 to your driver envVars, it should see it as a CI environment and skip the Zookeeper part.
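
In the values.yaml that would be something like (a sketch, other envVars omitted):

driver:
  envVars:
    # skip the Zookeeper-backed parts, as in a CI environment
    TRAVIS: 1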

zbenta commented 1 year ago

The latest version of openEO is now deployed in our cluster.

https://openeo.a.incd.pt/openeo/1.1.0/

Can anyone test it out? Do we need to set up any config file so that when @Jaapel tests it out he will get data from the collection (https://resto.c-scale.zcu.cz/collections/S2/) at the CESNET catalog that we have already registered?

Jaapel commented 1 year ago

@zbenta, I recently gave a course on the aquamonitor notebook, so the notebook is quite streamlined now. If you want to try it out (you can even try openeo platform's notebook server), then you can change the backend url and run the notebook!

zbenta commented 1 year ago

@zbenta, I recently gave a course on the aquamonitor notebook, so the notebook is quite streamlined now. If you want to try it out (you can even try openeo platform's notebook server), then you can change the backend url and run the notebook!

Sweet, where can I find the notebook?

Jaapel commented 1 year ago

@zbenta, it's in this repository: notebooks/aquamonitor.ipynb

zbenta commented 1 year ago

The notebook is fighting back :-) I've installed the utils package, but it seems that it has no implementation of the get_files_from_dc method.

[screenshot]

Any thoughts?

Jaapel commented 1 year ago

Utils is not expected to be installed in a "package-like" manner. How are you running the notebook, and where is utils.py located relative to the notebook? If you stick to the structure of the repository, without pip installing the utils package, you should be fine.

zbenta commented 1 year ago

Thanks for the tips @Jaapel. I figured out that I had to download the utils.py and cached_job.py files. We tried the Jupyter notebook and are now facing this issue: [screenshot]

Jaapel commented 1 year ago

I remember this issue, I had to set some notebook settings! https://github.com/c-scale-community/use-case-aquamonitor/blob/main/Dockerfile#L13

zbenta commented 1 year ago

I remember this issue, I had to set some notebook settings! https://github.com/c-scale-community/use-case-aquamonitor/blob/main/Dockerfile#L13

Hi @Jaapel, thanks for the tip, but the problem persists: [screenshot]

The endpoint is deployed, but with all these errors we have no idea if it is working properly; we believe that running the notebook would confirm that all is as it should be. Is there any other way to confirm that everything is set up correctly on our endpoint?

Jaapel commented 1 year ago

I was apparently only looking at the size limit that you got. I am not familiar with the KazooTimeoutError. @zbenta, do you know what this is used for? Maybe it's a familiar error for you, @jdries?

jdries commented 1 year ago

Yes, this happens when it cannot connect to Zookeeper, but I guess we made this setup without any persistence in Zookeeper, right? It would be good if we could get the full stack trace, which is normally printed in the logs of the main driver pod. (You might find it if you search for this r-1a99... id.)

zbenta commented 1 year ago

After another try, we have some debug data, as per @jdries' suggestion: [screenshot]

Here are some logs:

{"message": "10.233.102.0 - - [18/Oct/2022:12:08:20 +0000] \"GET /openeo/1.1.0/file_formats HTTP/1.1\" 200 2963 \"-\" \"openeo-python-client/0.11.0 cpython/3.9.7 linux\"", "levelname": "INFO", "name": "gunicorn.access", "created": 1666094900.038365, "filename": "glogging.py", "lineno": 349, "process": 106, "req_id": "no-request", "user_id": null}
{"message": "Handling POST https://openeo.a.incd.pt/openeo/1.1.0/jobs with data b'{\"title\": \"get_collection\", \"process\": {\"process_graph\": {\"loadcollection1\": {\"process_id\": \"load_collection\", \"arguments\": {\"bands\": [\"B02\", \"B03\", \"B04\"], \"id\": \"SENTINEL2_L1C_INCD\", \"spatial_extent\": {\"west\": -7.682155041704681, \"east\": -5.083888440142181, \"south\": 36.18203953636458, \"north\": 38.620982842287496, \"crs\": \"EPSG:4326\"}, \"temporal_extent\": [\"2018-01-01\", \"2021-01-01\"]}}, \"adddimension1\": {\"process_id\": \"add_dimension\", \"arguments\": {\"data\": {\"from_node\": \"loadcollection1\"}, \"label\": \"SENTINEL2_L1C_INCD\", \"name\": \"source_name\", \"type\": \"other\"}}, \"renamelabels1\": {\"process_id\": \"rename_labels\", \"arguments\": {\"data\": {\"from_node\": \"adddimension1\"}, \"dimension\": \"bands\", \"source\": [\"B02\", \"B03\", \"B04\"], \"target\": [\"swir\", \"nir\", \"green\"]}}, \"saveresult1\": {\"process_id\": \"save_result\", \"arguments\": {\"data\": {\"from_node\": \"renamelabels1\"}, \"format\": \"NetCDF\", \"options\": {}}, \"result\": true}}}}'", "levelname": "INFO", "name": "openeo_driver.views", "created": 1666094900.096858, "filename": "views.py", "lineno": 150, "process": 106, "req_id": "r-4bdaaecdc01045edac18af953835db59", "user_id": null}
{"message": "<class 'openeo_driver.util.logging.FlaskUserIdLogging'> storing user id '98dfa532579388c49bea11b50d390929551d2990ac826d5ef49a602ee7c60d97@egi.eu' on <flask.g of 'openeo_driver.views'>", "levelname": "DEBUG", "name": "openeo_driver.util.logging", "created": 1666094900.7306511, "filename": "logging.py", "lineno": 246, "process": 106, "req_id": "r-4bdaaecdc01045edac18af953835db59", "user_id": null}
{"message": "Connection dropped: socket connection error: None", "levelname": "WARNING", "name": "kazoo.client", "created": 1666094910.7566252, "filename": "connection.py", "lineno": 611, "process": 106, "req_id": "no-request", "user_id": null}
{"message": "Connection dropped: socket connection error: None", "levelname": "WARNING", "name": "kazoo.client", "created": 1666094920.7674475, "filename": "connection.py", "lineno": 611, "process": 106, "req_id": "no-request", "user_id": null}
{"message": "KazooTimeoutError('Connection time-out')", "levelname": "ERROR", "name": "openeo_driver.views.error", "created": 1666094920.7687132, "filename": "views.py", "lineno": 258, "process": 106, "exc_info": "Traceback (most recent call last):\n File \"/opt/openeo/lib/python3.8/site-packages/flask/app.py\", line 1516, in full_dispatch_request\n rv = self.dispatch_request()\n File \"/opt/openeo/lib/python3.8/site-packages/flask/app.py\", line 1502, in dispatch_request\n return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)\n File \"/opt/openeo/lib/python3.8/site-packages/openeo_driver/users/auth.py\", line 88, in decorated\n return f(*args, **kwargs)\n File \"/opt/openeo/lib/python3.8/site-packages/openeo_driver/views.py\", line 736, in create_job\n job_info = backend_implementation.batch_jobs.create_job(\n File \"/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/backend.py\", line 890, in create_job\n with JobRegistry() as registry:\n File \"/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/job_registry.py\", line 219, in __enter__\n self._zk.start()\n File \"/opt/openeo/lib/python3.8/site-packages/kazoo/client.py\", line 635, in start\n raise self.handler.timeout_exception(\"Connection time-out\")\nkazoo.handlers.threading.KazooTimeoutError: Connection time-out", "req_id": "r-4bdaaecdc01045edac18af953835db59", "user_id": "98dfa532579388c49bea11b50d390929551d2990ac826d5ef49a602ee7c60d97@egi.eu"}
{"message": "10.233.102.0 - - [18/Oct/2022:12:08:40 +0000] \"POST /openeo/1.1.0/jobs HTTP/1.1\" 500 129 \"-\" \"openeo-python-client/0.11.0 cpython/3.9.7 linux\"", "levelname": "INFO", "name": "gunicorn.access", "created": 1666094920.7707026, "filename": "glogging.py", "lineno": 349, "process": 106, "req_id": "no-request", "user_id": null}

jdries commented 1 year ago

Ok, the current setup only works with synchronous calls, because there is no zookeeper. We'll probably want to fix that; it should not be too hard, as there's a helm chart for that.

Our config for that is very simple, it just needs to be deployed in the same k8s cluster:

resource "helm_release" "zookeeper" {
  chart            = "zookeeper"
  create_namespace = true
  name             = "zookeeper"
  namespace        = "zookeeper"
  repository       = "https://artifactory.vgt.vito.be/helm-charts"
  version          = "5.22.2"

  values = [
    file("./zk/values.yaml")
  ]
}

and values.yaml:

---
global:
  storageClass: "csi-cinder-sc-delete"
replicaCount: 3
autopurge:
  purgeInterval: 1
persistence:
  storageClass: "csi-cinder-sc-delete"
  size: "16Gi"
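
For anyone not using Terraform, the plain Helm equivalent of the resource above would be something like (a sketch):

helm repo add vito https://artifactory.vgt.vito.be/helm-charts
helm install zookeeper vito/zookeeper --version 5.22.2 --create-namespace --namespace zookeeper -f values.yaml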

zbenta commented 1 year ago

Ok, the current setup only works with synchronous calls, because there is no zookeeper. [...]

@jdries, any tips on the values.yaml for the deployment of zookeeper? We have no experience with zookeeper at all.

@jdries, any tips on the values.yaml for the deployment of zookeeper? We have no experience with zookeeper at all.

jdries commented 1 year ago

It's right there in my previous comment? Zookeeper normally needs very little configuration; after setup, we just need the host names and we should be good to go. @tcassaert, did I miss anything?

tcassaert commented 1 year ago

It's a matter of configuring the correct storage class and then, indeed, adding the ZOOKEEPERNODES environment variable to the driver and executor of the openeo deployment.

In our case, it's ZOOKEEPERNODES: "zookeeper.zookeeper.svc.cluster.local:2181"
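
So in the openeo values.yaml, something like this (a sketch; use the host name of your own ZK service):

driver:
  envVars:
    ZOOKEEPERNODES: "zookeeper.zookeeper.svc.cluster.local:2181"
executor:
  envVars:
    ZOOKEEPERNODES: "zookeeper.zookeeper.svc.cluster.local:2181"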

zbenta commented 1 year ago

@tcassaert, I remember that we added the TRAVIS: 1 env variable because we had no zookeeper; do we need to remove it now?

zbenta commented 1 year ago

Even after having deployed zookeeper without any PVC (because we were having issues with the storage class):

[screenshot]

We get the following error:

{"message": "Handling POST https://openeo.a.incd.pt/openeo/1.1.0/jobs with data b'{\"title\": \"get_collection\", \"process\": {\"process_graph\": {\"loadcollection1\": {\"process_id\": \"load_collection\", \"arguments\": {\"bands\": [\"B02\", \"B03\", \"B04\"], \"id\": \"SENTINEL2_L1C_INCD\", \"spatial_extent\": {\"west\": -7.682155041704681, \"east\": -5.083888440142181, \"south\": 36.18203953636458, \"north\": 38.620982842287496, \"crs\": \"EPSG:4326\"}, \"temporal_extent\": [\"2018-01-01\", \"2021-01-01\"]}}, \"adddimension1\": {\"process_id\": \"add_dimension\", \"arguments\": {\"data\": {\"from_node\": \"loadcollection1\"}, \"label\": \"SENTINEL2_L1C_INCD\", \"name\": \"source_name\", \"type\": \"other\"}}, \"renamelabels1\": {\"process_id\": \"rename_labels\", \"arguments\": {\"data\": {\"from_node\": \"adddimension1\"}, \"dimension\": \"bands\", \"source\": [\"B02\", \"B03\", \"B04\"], \"target\": [\"swir\", \"nir\", \"green\"]}}, \"saveresult1\": {\"process_id\": \"save_result\", \"arguments\": {\"data\": {\"from_node\": \"renamelabels1\"}, \"format\": \"NetCDF\", \"options\": {}}, \"result\": true}}}}'", "levelname": "INFO", "name": "openeo_driver.views", "created": 1666174592.236154, "filename": "views.py", "lineno": 150, "process": 104, "req_id": "r-227c4d850290472e99d4bb3f94593908", "user_id": null}
{"message": "<class 'openeo_driver.util.logging.FlaskUserIdLogging'> storing user id '98dfa532579388c49bea11b50d390929551d2990ac826d5ef49a602ee7c60d97@egi.eu' on <flask.g of 'openeo_driver.views'>", "levelname": "DEBUG", "name": "openeo_driver.util.logging", "created": 1666174592.8900156, "filename": "logging.py", "lineno": 246, "process": 104, "req_id": "r-227c4d850290472e99d4bb3f94593908", "user_id": null}
{"message": "Connection dropped: socket connection error: Connection refused", "levelname": "WARNING", "name": "kazoo.client", "created": 1666174592.8965578, "filename": "connection.py", "lineno": 611, "process": 104, "req_id": "no-request", "user_id": null}
{"message": "Connection dropped: socket connection error: Connection refused", "levelname": "WARNING", "name": "kazoo.client", "created": 1666174592.97724, "filename": "connection.py", "lineno": 611, "process": 104, "req_id": "no-request", "user_id": null}
{"message": "Connection dropped: socket connection error: Connection refused", "levelname": "WARNING", "name": "kazoo.client", "created": 1666174593.1666248, "filename": "connection.py", "lineno": 611, "process": 104, "req_id": "no-request", "user_id": null}
{"message": "Connection dropped: socket connection error: Connection refused", "levelname": "WARNING", "name": "kazoo.client", "created": 1666174593.4045208, "filename": "connection.py", "lineno": 611, "process": 104, "req_id": "no-request", "user_id": null}
{"message": "Connection dropped: socket connection error: Connection refused", "levelname": "WARNING", "name": "kazoo.client", "created": 1666174593.9148726, "filename": "connection.py", "lineno": 611, "process": 104, "req_id": "no-request", "user_id": null}
{"message": "Connection dropped: socket connection error: Connection refused", "levelname": "WARNING", "name": "kazoo.client", "created": 1666174595.3103108, "filename": "connection.py", "lineno": 611, "process": 104, "req_id": "no-request", "user_id": null}
{"message": "Connection dropped: socket connection error: Connection refused", "levelname": "WARNING", "name": "kazoo.client", "created": 1666174598.8789494, "filename": "connection.py", "lineno": 611, "process": 104, "req_id": "no-request", "user_id": null}
{"message": "Connection dropped: socket connection error: Connection refused", "levelname": "WARNING", "name": "kazoo.client", "created": 1666174604.6097133, "filename": "connection.py", "lineno": 611, "process": 104, "req_id": "no-request", "user_id": null}
{"message": "Failed connecting to Zookeeper within the connection retry policy.", "levelname": "WARNING", "name": "kazoo.client", "created": 1666174607.9158096, "filename": "connection.py", "lineno": 515, "process": 104, "req_id": "no-request", "user_id": null}
{"message": "KazooTimeoutError('Connection time-out')", "levelname": "ERROR", "name": "openeo_driver.views.error", "created": 1666174607.9167767, "filename": "views.py", "lineno": 258, "process": 104, "exc_info": "Traceback (most recent call last):\n File \"/opt/openeo/lib/python3.8/site-packages/flask/app.py\", line 1516, in full_dispatch_request\n rv = self.dispatch_request()\n File \"/opt/openeo/lib/python3.8/site-packages/flask/app.py\", line 1502, in dispatch_request\n return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)\n File \"/opt/openeo/lib/python3.8/site-packages/openeo_driver/users/auth.py\", line 88, in decorated\n return f(*args, **kwargs)\n File \"/opt/openeo/lib/python3.8/site-packages/openeo_driver/views.py\", line 736, in create_job\n job_info = backend_implementation.batch_jobs.create_job(\n File \"/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/backend.py\", line 890, in create_job\n with JobRegistry() as registry:\n File \"/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/job_registry.py\", line 219, in __enter__\n self._zk.start()\n File \"/opt/openeo/lib/python3.8/site-packages/kazoo/client.py\", line 635, in start\n raise self.handler.timeout_exception(\"Connection time-out\")\nkazoo.handlers.threading.KazooTimeoutError: Connection time-out", "req_id": "r-227c4d850290472e99d4bb3f94593908", "user_id": "98dfa532579388c49bea11b50d390929551d2990ac826d5ef49a602ee7c60d97@egi.eu"}
{"message": "10.233.102.0 - - [19/Oct/2022:10:16:47 +0000] \"POST /openeo/1.1.0/jobs HTTP/1.1\" 500 129 \"-\" \"openeo-python-client/0.11.0 cpython/3.9.7 linux\"", "levelname": "INFO", "name": "gunicorn.access", "created": 1666174607.920152, "filename": "glogging.py", "lineno": 349, "process": 104, "req_id": "no-request", "user_id": null}

tcassaert commented 1 year ago

Can you access your ZK instance just with curl on port 2181?
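
For example with ZooKeeper's ruok four-letter-word command (a sketch; assumes the 4lw commands are whitelisted on the server and that nc is available):

echo ruok | nc <zookeeper-host> 2181
# a healthy server answers "imok"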

zbenta commented 1 year ago

We are unable to access the ZK instance. We tried the host zookeeper-cscale.default.svc.cluster.local, as per the result of the helm chart installation:

[screenshot]

We even tried accessing it using the cluster IP and got no result:

[screenshot]

nmap states that only ports 22 and 111 are open if you do a regular scan:

[centos@k8s-cscale-k8s-master-nf-1 zookeeper]$ nmap 10.233.53.79

Starting Nmap 6.40 ( http://nmap.org ) at 2022-10-19 11:48 UTC
Nmap scan report for 10.233.53.79
Host is up (0.00092s latency).
Not shown: 998 closed ports
PORT    STATE SERVICE
22/tcp  open  ssh
111/tcp open  rpcbind

Nmap done: 1 IP address (1 host up) scanned in 0.08 seconds

If you scan for port 2181, it states that it is open:

[centos@k8s-cscale-k8s-master-nf-1 zookeeper]$ nmap 10.233.53.79 -p 2181

Starting Nmap 6.40 ( http://nmap.org ) at 2022-10-19 11:52 UTC
Nmap scan report for 10.233.53.79
Host is up (0.00015s latency).
PORT     STATE SERVICE
2181/tcp open  unknown

Nmap done: 1 IP address (1 host up) scanned in 0.03 seconds

But if we do a curl to that port, we get an empty reply from the server:

[centos@k8s-cscale-k8s-master-nf-1 zookeeper]$ curl 10.233.53.79:2181
curl: (52) Empty reply from server

But if we use kubectl exec, we can access one of the pods:

[centos@k8s-cscale-k8s-master-nf-1 zookeeper]$ kubectl exec -it zookeeper-cscale-0 -n zookeeper -- zkCli.sh
/opt/bitnami/java/bin/java
Connecting to localhost:2181
Welcome to ZooKeeper!
JLine support is enabled

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
[zk: localhost:2181(CONNECTED) 0]

tcassaert commented 1 year ago

The empty reply is not unusual if you just curl the endpoint.

I assume you have the openeo pod running. If you open a shell in it, can it access ZK? Do a curl -v telnet://zookeeper-cscale.default.svc.cluster.local:2181. If that returns something with Connected, then it should be ok.

zbenta commented 1 year ago

Yes, we are able to connect to the ZK inside the pods.

zbenta commented 1 year ago

After running this part of the code: [screenshot]

We get this output from the myspark-driver pod:

{"message": "Handling GET https://openeo.a.incd.pt/openeo/1.1.0/collections/SENTINEL2_L1C_INCD with data b''", "levelname": "INFO", "name": "openeo_driver.views", "created": 1666187963.47882, "filename": "views.py", "lineno": 150, "process": 104, "req_id": "r-f80cf6b13b31407785cb8ba4d590d15c", "user_id": null}
{"message": "10.233.102.0 - - [19/Oct/2022:13:59:23 +0000] \"GET /openeo/1.1.0/collections/SENTINEL2_L1C_INCD HTTP/1.1\" 200 5703 \"-\" \"openeo-python-client/0.11.0 cpython/3.9.7 linux\"", "levelname": "INFO", "name": "gunicorn.access", "created": 1666187963.480629, "filename": "glogging.py", "lineno": 349, "process": 104, "req_id": "no-request", "user_id": null}

After trying to download data: [screenshot]

We get the usual output from the myspark-driver pod:

{"message": "Handling GET https://openeo.a.incd.pt/openeo/1.1.0/file_formats with data b''", "levelname": "INFO", "name": "openeo_driver.views", "created": 1666187994.6127324, "filename": "views.py", "lineno": 150, "process": 104, "req_id": "r-723a029b19654561b067a14774b1643d", "user_id": null}
{"message": "10.233.102.0 - - [19/Oct/2022:13:59:54 +0000] \"GET /openeo/1.1.0/file_formats HTTP/1.1\" 200 2963 \"-\" \"openeo-python-client/0.11.0 cpython/3.9.7 linux\"", "levelname": "INFO", "name": "gunicorn.access", "created": 1666187994.6136456, "filename": "glogging.py", "lineno": 349, "process": 104, "req_id": "no-request", "user_id": null}
{"message": "Handling POST https://openeo.a.incd.pt/openeo/1.1.0/jobs with data b'{\"title\": \"get_collection\", \"process\": {\"process_graph\": {\"loadcollection1\": {\"process_id\": \"load_collection\", \"arguments\": {\"bands\": [\"B02\", \"B03\", \"B04\"], \"id\": \"SENTINEL2_L1C_INCD\", \"spatial_extent\": {\"west\": -7.682155041704681, \"east\": -5.083888440142181, \"south\": 36.18203953636458, \"north\": 38.620982842287496, \"crs\": \"EPSG:4326\"}, \"temporal_extent\": [\"2018-01-01\", \"2021-01-01\"]}}, \"adddimension1\": {\"process_id\": \"add_dimension\", \"arguments\": {\"data\": {\"from_node\": \"loadcollection1\"}, \"label\": \"SENTINEL2_L1C_INCD\", \"name\": \"source_name\", \"type\": \"other\"}}, \"renamelabels1\": {\"process_id\": \"rename_labels\", \"arguments\": {\"data\": {\"from_node\": \"adddimension1\"}, \"dimension\": \"bands\", \"source\": [\"B02\", \"B03\", \"B04\"], \"target\": [\"swir\", \"nir\", \"green\"]}}, \"saveresult1\": {\"process_id\": \"save_result\", \"arguments\": {\"data\": {\"from_node\": \"renamelabels1\"}, \"format\": \"NetCDF\", \"options\": {}}, \"result\": true}}}}'", "levelname": "INFO", "name": "openeo_driver.views", "created": 1666187994.6457932, "filename": "views.py", "lineno": 150, "process": 104, "req_id": "r-3fb8c6b355894f20943c88bf0835700b", "user_id": null}
{"message": "<class 'openeo_driver.util.logging.FlaskUserIdLogging'> storing user id '98dfa532579388c49bea11b50d390929551d2990ac826d5ef49a602ee7c60d97@egi.eu' on <flask.g of 'openeo_driver.views'>", "levelname": "DEBUG", "name": "openeo_driver.util.logging", "created": 1666187995.3817112, "filename": "logging.py", "lineno": 246, "process": 104, "req_id": "r-3fb8c6b355894f20943c88bf0835700b", "user_id": null}
{"message": "Connection dropped: socket connection error: Connection refused", "levelname": "WARNING", "name": "kazoo.client", "created": 1666187995.3888297, "filename": "connection.py", "lineno": 611, "process": 104, "req_id": "no-request", "user_id": null}
{"message": "Connection dropped: socket connection error: Connection refused", "levelname": "WARNING", "name": "kazoo.client", "created": 1666187995.458142, "filename": "connection.py", "lineno": 611, "process": 104, "req_id": "no-request", "user_id": null}
{"message": "Connection dropped: socket connection error: Connection refused", "levelname": "WARNING", "name": "kazoo.client", "created": 1666187995.5683274, "filename": "connection.py", "lineno": 611, "process": 104, "req_id": "no-request", "user_id": null}
{"message": "Connection dropped: socket connection error: Connection refused", "levelname": "WARNING", "name": "kazoo.client", "created": 1666187995.7969728, "filename": "connection.py", "lineno": 611, "process": 104, "req_id": "no-request", "user_id": null}
{"message": "Connection dropped: socket connection error: Connection refused", "levelname": "WARNING", "name": "kazoo.client", "created": 1666187996.1534948, "filename": "connection.py", "lineno": 611, "process": 104, "req_id": "no-request", "user_id": null}
{"message": "Connection dropped: socket connection error: Connection refused", "levelname": "WARNING", "name": "kazoo.client", "created": 1666187996.7704003, "filename": "connection.py", "lineno": 611, "process": 104, "req_id": "no-request", "user_id": null}
{"message": "Connection dropped: socket connection error: Connection refused", "levelname": "WARNING", "name": "kazoo.client", "created": 1666187997.7963426, "filename": "connection.py", "lineno": 611, "process": 104, "req_id": "no-request", "user_id": null}
{"message": "Connection dropped: socket connection error: Connection refused", "levelname": "WARNING", "name": "kazoo.client", "created": 1666187999.8261368, "filename": "connection.py", "lineno": 611, "process": 104, "req_id": "no-request", "user_id": null}
{"message": "Connection dropped: socket connection error: Connection refused", "levelname": "WARNING", "name": "kazoo.client", "created": 1666188005.1368918, "filename": "connection.py", "lineno": 611, "process": 104, "req_id": "no-request", "user_id": null}
{"message": "Failed connecting to Zookeeper within the connection retry policy.", "levelname": "WARNING", "name": "kazoo.client", "created": 1666188010.449988, "filename": "connection.py", "lineno": 515, "process": 104, "req_id": "no-request", "user_id": null}
{"message": "KazooTimeoutError('Connection time-out')", "levelname": "ERROR", "name": "openeo_driver.views.error", "created": 1666188010.4511547, "filename": "views.py", "lineno": 258, "process": 104, "exc_info": "Traceback (most recent call last):\n File \"/opt/openeo/lib/python3.8/site-packages/flask/app.py\", line 1516, in full_dispatch_request\n rv = self.dispatch_request()\n File \"/opt/openeo/lib/python3.8/site-packages/flask/app.py\", line 1502, in dispatch_request\n return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)\n File \"/opt/openeo/lib/python3.8/site-packages/openeo_driver/users/auth.py\", line 88, in decorated\n return f(*args, **kwargs)\n File \"/opt/openeo/lib/python3.8/site-packages/openeo_driver/views.py\", line 736, in create_job\n job_info = backend_implementation.batch_jobs.create_job(\n File \"/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/backend.py\", line 890, in create_job\n with JobRegistry() as registry:\n File \"/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/job_registry.py\", line 219, in __enter__\n self._zk.start()\n File \"/opt/openeo/lib/python3.8/site-packages/kazoo/client.py\", line 635, in start\n raise self.handler.timeout_exception(\"Connection time-out\")\nkazoo.handlers.threading.KazooTimeoutError: Connection time-out", "req_id": "r-3fb8c6b355894f20943c88bf0835700b", "user_id": "98dfa532579388c49bea11b50d390929551d2990ac826d5ef49a602ee7c60d97@egi.eu"}
{"message": "10.233.102.0 - - [19/Oct/2022:14:00:10 +0000] \"POST /openeo/1.1.0/jobs HTTP/1.1\" 500 129 \"-\" \"openeo-python-client/0.11.0 cpython/3.9.7 linux\"", "levelname": "INFO", "name": "gunicorn.access", "created": 1666188010.4542654, "filename": "glogging.py", "lineno": 349, "process": 104, "req_id": "no-request", "user_id": null}

tcassaert commented 1 year ago

If you do an env | grep ZOOKEEPER in the openeo pod, does it show the correct value?

zbenta commented 1 year ago

If you do an env | grep ZOOKEEPER in the openeo pod, does it show the correct value?

Yes

[centos@k8s-cscale-k8s-master-nf-1 cscale]$ kubectl exec -it myspark-driver bash -n spark-jobs
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
bash-4.4$ env | grep ZOOKEEPER
ZOOKEEPERNODES=zookeeper-cscale.zookeeper.svc.cluster.local:2821
bash-4.4$

zbenta commented 1 year ago

This is weird. We get a connection refused:

{"message": "Connection dropped: socket connection error: Connection refused", "levelname": "WARNING", "name": "kazoo.client", "created": 1666194433.412344, "filename": "connection.py", "lineno": 611, "process": 104, "req_id": "no-request", "user_id": null}

Do we have to set up any user_id, since it is always null?

{"message": "Failed connecting to Zookeeper within the connection retry policy.", "levelname": "WARNING", "name": "kazoo.client", "created": 1666194434.3139386, "filename": "connection.py", "lineno": 515, "process": 104, "req_id": "no-request", "user_id": null}

tcassaert commented 1 year ago

The snippet where you grep for ZOOKEEPER shows port 2821, while the default ZK port is 2181. So it might be just a typo?

zbenta commented 1 year ago

The snippet where you grep for ZOOKEEPER shows port 2821, while the default ZK port is 2181. So it might be just a typo?

It is no typo; it actually is port 2181, as per our values.yaml file: [screenshot]

Thanks @tcassaert, we'll change it to the default port and redeploy zookeeper to see if that is the issue.

tcassaert commented 1 year ago

You're indeed running ZK at port 2181, but you're not defining the correct port in the values.yaml of the openeo deployment if you're seeing port 2821 in the pod.
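
A quick way to cross-check what the ZK service actually exposes (a generic check; adjust the namespace to your deployment):

kubectl get svc -A | grep zookeeper
# the PORT(S) column should list 2181/TCP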

enolfc commented 1 year ago

@zbenta did you manage to get the port issue fixed?

Also what about the PVC? Is there any error that you could share to help there?

zbenta commented 1 year ago

Yes the port issue has been solved, our issue is now with cinder, we are still working on it.

enolfc commented 1 year ago

Yes the port issue has been solved, our issue is now with cinder, we are still working on it.

Are you using the OpenStack Kubernetes provider? I had trouble with it in the past, but currently have it working smoothly at another provider.

zbenta commented 1 year ago

Hi @enolfc, yes, we are using Kubernetes on top of OpenStack.

tcassaert commented 1 year ago

If it would be of any help, this is our values.yaml for the openstack-cinder-csi helm chart:

---
secret:
  enabled: true
  create: true
  name: cloud-config
  data:
    cloud.conf: |-
      [Global]
      auth-url=https://keystone.cloudferro.com:5000/v3
      os-endpoint-type=public
      username=*****
      password=*****
      region=WAW3-1
      tenant-id=*****
      tenant-domain-id=*****
      user-domain-name=******
storageClass:
  enabled: true
  delete:
    isDefault: true
    allowVolumeExpansion: true
  retain:
    isDefault: false
    allowVolumeExpansion: true
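
The chart itself comes from the upstream cloud-provider-openstack project; installing it with those values would be something along these lines (a sketch; release name and namespace are assumptions):

helm repo add cpo https://kubernetes.github.io/cloud-provider-openstack
helm install cinder-csi cpo/openstack-cinder-csi --namespace kube-system -f values.yaml
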
zbenta commented 1 year ago

We have solved the cinder issues and are now facing another error.

[screenshot]

Here are the logs from the spark-driver:

{"message": "Handling POST https://openeo.a.incd.pt/openeo/1.1.0/jobs with data b'{\"title\": \"get_collection\", \"process\": {\"process_graph\": {\"loadcollection1\": {\"process_id\": \"load_collection\", \"arguments\": {\"bands\": [\"B02\", \"B03\", \"B04\"], \"id\": \"SENTINEL2_L1C_INCD\", \"spatial_extent\": {\"west\": -7.8897490816, \"south\": 36.0074603289, \"east\": -4.0014631782, \"north\": 38.8443648229, \"crs\": \"EPSG:4326\"}, \"temporal_extent\": [\"2019-11-01\", \"2019-11-03\"]}}, \"adddimension1\": {\"process_id\": \"add_dimension\", \"arguments\": {\"data\": {\"from_node\": \"loadcollection1\"}, \"label\": \"SENTINEL2_L1C_INCD\", \"name\": \"source_name\", \"type\": \"other\"}}, \"renamelabels1\": {\"process_id\": \"rename_labels\", \"arguments\": {\"data\": {\"from_node\": \"adddimension1\"}, \"dimension\": \"bands\", \"source\": [\"B02\", \"B03\", \"B04\"], \"target\": [\"blue\", \"green\", \"red\"]}}, \"saveresult1\": {\"process_id\": \"save_result\", \"arguments\": {\"data\": {\"from_node\": \"renamelabels1\"}, \"format\": \"NetCDF\", \"options\": {}}, \"result\": true}}}}'", "levelname": "INFO", "name": "openeo_driver.views", "created": 1667475597.8674283, "filename": "views.py", "lineno": 150, "process": 104, "req_id": "r-0767df49bb564027b4650b7f3814e901", "user_id": null}
{"message": "<class 'openeo_driver.util.logging.FlaskUserIdLogging'> storing user id '98dfa532579388c49bea11b50d390929551d2990ac826d5ef49a602ee7c60d97@egi.eu' on <flask.g of 'openeo_driver.views'>", "levelname": "DEBUG", "name": "openeo_driver.util.logging", "created": 1667475597.8676496, "filename": "logging.py", "lineno": 246, "process": 104, "req_id": "r-0767df49bb564027b4650b7f3814e901", "user_id": null}
{"message": "`POST /jobs` created batch job j-976c7549d32b48e9b3435147d16cd82f", "levelname": "INFO", "name": "openeo_driver.views", "created": 1667475597.9998925, "filename": "views.py", "lineno": 744, "process": 104, "req_id": "r-0767df49bb564027b4650b7f3814e901", "user_id": "98dfa532579388c49bea11b50d390929551d2990ac826d5ef49a602ee7c60d97@egi.eu"}
{"message": "10.233.102.0 - - [03/Nov/2022:11:39:57 +0000] \"POST /openeo/1.1.0/jobs HTTP/1.1\" 201 0 \"-\" \"openeo-python-client/0.11.0 cpython/3.9.7 linux\"", "levelname": "INFO", "name": "gunicorn.access", "created": 1667475598.0008118, "filename": "glogging.py", "lineno": 349, "process": 104, "req_id": "no-request", "user_id": null}
{"message": "Handling GET https://openeo.a.incd.pt/openeo/1.1.0/jobs/j-976c7549d32b48e9b3435147d16cd82f with data b''", "levelname": "INFO", "name": "openeo_driver.views", "created": 1667475598.0201814, "filename": "views.py", "lineno": 150, "process": 104, "req_id": "r-cc47b7849e0a485992ff4d9202ef6574", "user_id": null}
{"message": "<class 'openeo_driver.util.logging.FlaskUserIdLogging'> storing user id '98dfa532579388c49bea11b50d390929551d2990ac826d5ef49a602ee7c60d97@egi.eu' on <flask.g of 'openeo_driver.views'>", "levelname": "DEBUG", "name": "openeo_driver.util.logging", "created": 1667475598.0203927, "filename": "logging.py", "lineno": 246, "process": 104, "req_id": "r-cc47b7849e0a485992ff4d9202ef6574", "user_id": null}
{"message": "10.233.102.0 - - [03/Nov/2022:11:39:58 +0000] \"GET /openeo/1.1.0/jobs/j-976c7549d32b48e9b3435147d16cd82f HTTP/1.1\" 200 922 \"-\" \"openeo-python-client/0.11.0 cpython/3.9.7 linux\"", "levelname": "INFO", "name": "gunicorn.access", "created": 1667475598.0769813, "filename": "glogging.py", "lineno": 349, "process": 104, "req_id": "no-request", "user_id": null}
{"message": "Handling POST https://openeo.a.incd.pt/openeo/1.1.0/jobs/j-976c7549d32b48e9b3435147d16cd82f/results with data b''", "levelname": "INFO", "name": "openeo_driver.views", "created": 1667475598.0940666, "filename": "views.py", "lineno": 150, "process": 104, "req_id": "r-0dd3abc1a4c148ada9972d006db85944", "user_id": null}
{"message": "<class 'openeo_driver.util.logging.FlaskUserIdLogging'> storing user id '98dfa532579388c49bea11b50d390929551d2990ac826d5ef49a602ee7c60d97@egi.eu' on <flask.g of 'openeo_driver.views'>", "levelname": "DEBUG", "name": "openeo_driver.util.logging", "created": 1667475598.0942352, "filename": "logging.py", "lineno": 246, "process": 104, "req_id": "r-0dd3abc1a4c148ada9972d006db85944", "user_id": null}
{"message": "`POST /jobs/j-976c7549d32b48e9b3435147d16cd82f/results`: starting job (from status created", "levelname": "INFO", "name": "openeo_driver.views", "created": 1667475598.3328874, "filename": "views.py", "lineno": 808, "process": 104, "req_id": "r-0dd3abc1a4c148ada9972d006db85944", "user_id": "98dfa532579388c49bea11b50d390929551d2990ac826d5ef49a602ee7c60d97@egi.eu"}
{"message": "Starting job 'j-976c7549d32b48e9b3435147d16cd82f' from user User('98dfa532579388c49bea11b50d390929551d2990ac826d5ef49a602ee7c60d97@egi.eu', {'oidc_userinfo': {'sub': '98dfa532579388c49bea11b50d390929551d2990ac826d5ef49a602ee7c60d97@egi.eu', 'voperson_verified_email': ['zacarias@lip.pt'], 'email_verified': True, 'eduperson_scoped_affiliation': ['faculty@lip.pt'], 'eduperson_assurance': ['https://refeds.org/assurance/IAP/low', 'https://aai.egi.eu/LoA#Substantial'], 'email': 'zacarias@lip.pt'}}) (proxy user None)", "levelname": "INFO", "name": "openeogeotrellis.backend", "created": 1667475598.333054, "filename": "backend.py", "lineno": 1084, "process": 104, "req_id": "r-0dd3abc1a4c148ada9972d006db85944", "user_id": "98dfa532579388c49bea11b50d390929551d2990ac826d5ef49a602ee7c60d97@egi.eu"}
{"message": "job_options: {}", "levelname": "DEBUG", "name": "openeogeotrellis.backend", "created": 1667475598.3989758, "filename": "backend.py", "lineno": 1115, "process": 104, "req_id": "r-0dd3abc1a4c148ada9972d006db85944", "user_id": "98dfa532579388c49bea11b50d390929551d2990ac826d5ef49a602ee7c60d97@egi.eu"}
{"message": "Creating new InMemoryServiceRegistry: <openeogeotrellis.service_registry.InMemoryServiceRegistry object at 0x7fde863dd220>", "levelname": "INFO", "name": "openeogeotrellis.service_registry", "created": 1667475598.4002671, "filename": "service_registry.py", "lineno": 76, "process": 104, "req_id": "r-0dd3abc1a4c148ada9972d006db85944", "user_id": "98dfa532579388c49bea11b50d390929551d2990ac826d5ef49a602ee7c60d97@egi.eu"}
{"message": "Reading layer catalog metadata from /opt/layercatalog.json", "levelname": "INFO", "name": "openeogeotrellis.layercatalog", "created": 1667475598.4005046, "filename": "layercatalog.py", "lineno": 683, "process": 104, "req_id": "r-0dd3abc1a4c148ada9972d006db85944", "user_id": "98dfa532579388c49bea11b50d390929551d2990ac826d5ef49a602ee7c60d97@egi.eu"}
{"message": "Updating SENTINEL2_L1C_INCD metadata from https://resto.c-scale.zcu.cz:S2", "levelname": "INFO", "name": "openeogeotrellis.layercatalog", "created": 1667475598.4010637, "filename": "layercatalog.py", "lineno": 713, "process": 104, "req_id": "r-0dd3abc1a4c148ada9972d006db85944", "user_id": "98dfa532579388c49bea11b50d390929551d2990ac826d5ef49a602ee7c60d97@egi.eu"}
{"message": "Traceback (most recent call last):\n  File \"/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/layercatalog.py\", line 715, in get_layer_catalog\n    opensearch_metadata[cid] = opensearch_instance(os_endpoint).get_metadata(collection_id=os_cid)\n  File \"/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/layercatalog.py\", line 703, in opensearch_instance\n    raise ValueError(endpoint)\nValueError: https://resto.c-scale.zcu.cz\n", "levelname": "WARNING", "name": "openeogeotrellis.layercatalog", "created": 1667475598.401245, "filename": "layercatalog.py", "lineno": 717, "process": 104, "req_id": "r-0dd3abc1a4c148ada9972d006db85944", "user_id": "98dfa532579388c49bea11b50d390929551d2990ac826d5ef49a602ee7c60d97@egi.eu"}
{"message": "Creating merged collections for common names: set()", "levelname": "DEBUG", "name": "openeogeotrellis.layercatalog", "created": 1667475598.4013343, "filename": "layercatalog.py", "lineno": 771, "process": 104, "req_id": "r-0dd3abc1a4c148ada9972d006db85944", "user_id": "98dfa532579388c49bea11b50d390929551d2990ac826d5ef49a602ee7c60d97@egi.eu"}
{"message": "apply_process save_result with {'data': {'from_node': 'renamelabels1', 'node': {'process_id': 'rename_labels', 'arguments': {'data': {'from_node': 'adddimension1', 'node': {'process_id': 'add_dimension', 'arguments': {'data': {'from_node': 'loadcollection1', 'node': {'process_id': 'load_collection', 'arguments': {'bands': ['B02', 'B03', 'B04'], 'id': 'SENTINEL2_L1C_INCD', 'spatial_extent': {'west': -7.8897490816, 'south': 36.0074603289, 'east': -4.0014631782, 'north': 38.8443648229, 'crs': 'EPSG:4326'}, 'temporal_extent': ['2019-11-01', '2019-11-03']}}}, 'label': 'SENTINEL2_L1C_INCD', 'name': 'source_name', 'type': 'other'}}}, 'dimension': 'bands', 'source': ['B02', 'B03', 'B04'], 'target': ['blue', 'green', 'red']}}}, 'format': 'NetCDF', 'options': {}}", "levelname": "DEBUG", "name": "openeo_driver.ProcessGraphDeserializer", "created": 1667475598.4463136, "filename": "ProcessGraphDeserializer.py", "lineno": 1416, "process": 104, "req_id": "r-0dd3abc1a4c148ada9972d006db85944", "user_id": "98dfa532579388c49bea11b50d390929551d2990ac826d5ef49a602ee7c60d97@egi.eu"}
{"message": "apply_process rename_labels with {'data': {'from_node': 'adddimension1', 'node': {'process_id': 'add_dimension', 'arguments': {'data': {'from_node': 'loadcollection1', 'node': {'process_id': 'load_collection', 'arguments': {'bands': ['B02', 'B03', 'B04'], 'id': 'SENTINEL2_L1C_INCD', 'spatial_extent': {'west': -7.8897490816, 'south': 36.0074603289, 'east': -4.0014631782, 'north': 38.8443648229, 'crs': 'EPSG:4326'}, 'temporal_extent': ['2019-11-01', '2019-11-03']}}}, 'label': 'SENTINEL2_L1C_INCD', 'name': 'source_name', 'type': 'other'}}}, 'dimension': 'bands', 'source': ['B02', 'B03', 'B04'], 'target': ['blue', 'green', 'red']}", "levelname": "DEBUG", "name": "openeo_driver.ProcessGraphDeserializer", "created": 1667475598.4465578, "filename": "ProcessGraphDeserializer.py", "lineno": 1416, "process": 104, "req_id": "r-0dd3abc1a4c148ada9972d006db85944", "user_id": "98dfa532579388c49bea11b50d390929551d2990ac826d5ef49a602ee7c60d97@egi.eu"}
{"message": "apply_process add_dimension with {'data': {'from_node': 'loadcollection1', 'node': {'process_id': 'load_collection', 'arguments': {'bands': ['B02', 'B03', 'B04'], 'id': 'SENTINEL2_L1C_INCD', 'spatial_extent': {'west': -7.8897490816, 'south': 36.0074603289, 'east': -4.0014631782, 'north': 38.8443648229, 'crs': 'EPSG:4326'}, 'temporal_extent': ['2019-11-01', '2019-11-03']}}}, 'label': 'SENTINEL2_L1C_INCD', 'name': 'source_name', 'type': 'other'}", "levelname": "DEBUG", "name": "openeo_driver.ProcessGraphDeserializer", "created": 1667475598.4466927, "filename": "ProcessGraphDeserializer.py", "lineno": 1416, "process": 104, "req_id": "r-0dd3abc1a4c148ada9972d006db85944", "user_id": "98dfa532579388c49bea11b50d390929551d2990ac826d5ef49a602ee7c60d97@egi.eu"}
{"message": "apply_process load_collection with {'bands': ['B02', 'B03', 'B04'], 'id': 'SENTINEL2_L1C_INCD', 'spatial_extent': {'west': -7.8897490816, 'south': 36.0074603289, 'east': -4.0014631782, 'north': 38.8443648229, 'crs': 'EPSG:4326'}, 'temporal_extent': ['2019-11-01', '2019-11-03']}", "levelname": "DEBUG", "name": "openeo_driver.ProcessGraphDeserializer", "created": 1667475598.4468012, "filename": "ProcessGraphDeserializer.py", "lineno": 1416, "process": 104, "req_id": "r-0dd3abc1a4c148ada9972d006db85944", "user_id": "98dfa532579388c49bea11b50d390929551d2990ac826d5ef49a602ee7c60d97@egi.eu"}
{"message": "Using process 'load_collection' from namespace 'backend'.", "levelname": "INFO", "name": "openeo_driver.ProcessGraphDeserializer", "created": 1667475598.4469187, "filename": "ProcessGraphDeserializer.py", "lineno": 1521, "process": 104, "req_id": "r-0dd3abc1a4c148ada9972d006db85944", "user_id": "98dfa532579388c49bea11b50d390929551d2990ac826d5ef49a602ee7c60d97@egi.eu"}
{"message": "Using process 'add_dimension' from namespace 'backend'.", "levelname": "INFO", "name": "openeo_driver.ProcessGraphDeserializer", "created": 1667475598.4473073, "filename": "ProcessGraphDeserializer.py", "lineno": 1521, "process": 104, "req_id": "r-0dd3abc1a4c148ada9972d006db85944", "user_id": "98dfa532579388c49bea11b50d390929551d2990ac826d5ef49a602ee7c60d97@egi.eu"}
{"message": "Using process 'rename_labels' from namespace 'backend'.", "levelname": "INFO", "name": "openeo_driver.ProcessGraphDeserializer", "created": 1667475598.4474895, "filename": "ProcessGraphDeserializer.py", "lineno": 1521, "process": 104, "req_id": "r-0dd3abc1a4c148ada9972d006db85944", "user_id": "98dfa532579388c49bea11b50d390929551d2990ac826d5ef49a602ee7c60d97@egi.eu"}
{"message": "Using process 'save_result' from namespace 'backend'.", "levelname": "INFO", "name": "openeo_driver.ProcessGraphDeserializer", "created": 1667475598.4476476, "filename": "ProcessGraphDeserializer.py", "lineno": 1521, "process": 104, "req_id": "r-0dd3abc1a4c148ada9972d006db85944", "user_id": "98dfa532579388c49bea11b50d390929551d2990ac826d5ef49a602ee7c60d97@egi.eu"}
{"message": "Dry run extracted these source constraints: [(('load_collection', ('SENTINEL2_L1C_INCD', ())), {'temporal_extent': ('2019-11-01', '2019-11-03'), 'spatial_extent': {'west': -7.8897490816, 'south': 36.0074603289, 'east': -4.0014631782, 'north': 38.8443648229, 'crs': 'EPSG:4326'}, 'bands': ['B02', 'B03', 'B04']})]", "levelname": "INFO", "name": "openeogeotrellis.backend", "created": 1667475598.4478507, "filename": "backend.py", "lineno": 1403, "process": 104, "job_id": "j-976c7549d32b48e9b3435147d16cd82f", "req_id": "r-0dd3abc1a4c148ada9972d006db85944", "user_id": "98dfa532579388c49bea11b50d390929551d2990ac826d5ef49a602ee7c60d97@egi.eu"}
{"message": "NoCredentialsError('Unable to locate credentials')", "levelname": "ERROR", "name": "openeo_driver.views.error", "created": 1667475599.3013573, "filename": "views.py", "lineno": 258, "process": 104, "exc_info": "Traceback (most recent call last):\n  File \"/opt/openeo/lib/python3.8/site-packages/flask/app.py\", line 1516, in full_dispatch_request\n    rv = self.dispatch_request()\n  File \"/opt/openeo/lib/python3.8/site-packages/flask/app.py\", line 1502, in dispatch_request\n    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)\n  File \"/opt/openeo/lib/python3.8/site-packages/openeo_driver/users/auth.py\", line 88, in decorated\n    return f(*args, **kwargs)\n  File \"/opt/openeo/lib/python3.8/site-packages/openeo_driver/views.py\", line 809, in queue_job\n    backend_implementation.batch_jobs.start_job(job_id=job_id, user=user)\n  File \"/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/backend.py\", line 1091, in start_job\n    self._start_job(job_id, user.user_id)\n  File \"/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/backend.py\", line 1201, in _start_job\n    s3_instance.create_bucket(Bucket=bucket)\n  File \"/opt/openeo/lib/python3.8/site-packages/botocore/client.py\", line 357, in _api_call\n    return self._make_api_call(operation_name, kwargs)\n  File \"/opt/openeo/lib/python3.8/site-packages/botocore/client.py\", line 662, in _make_api_call\n    http, parsed_response = self._make_request(\n  File \"/opt/openeo/lib/python3.8/site-packages/botocore/client.py\", line 682, in _make_request\n    return self._endpoint.make_request(operation_model, request_dict)\n  File \"/opt/openeo/lib/python3.8/site-packages/botocore/endpoint.py\", line 102, in make_request\n    return self._send_request(request_dict, operation_model)\n  File \"/opt/openeo/lib/python3.8/site-packages/botocore/endpoint.py\", line 132, in _send_request\n    request = self.create_request(request_dict, operation_model)\n  File \"/opt/openeo/lib/python3.8/site-packages/botocore/endpoint.py\", line 115, in create_request\n    self._event_emitter.emit(event_name, request=request,\n  File \"/opt/openeo/lib/python3.8/site-packages/botocore/hooks.py\", line 356, in emit\n    return self._emitter.emit(aliased_event_name, **kwargs)\n  File \"/opt/openeo/lib/python3.8/site-packages/botocore/hooks.py\", line 228, in emit\n    return self._emit(event_name, kwargs)\n  File \"/opt/openeo/lib/python3.8/site-packages/botocore/hooks.py\", line 211, in _emit\n    response = handler(**kwargs)\n  File \"/opt/openeo/lib/python3.8/site-packages/botocore/signers.py\", line 90, in handler\n    return self.sign(operation_name, request)\n  File \"/opt/openeo/lib/python3.8/site-packages/botocore/signers.py\", line 162, in sign\n    auth.add_auth(request)\n  File \"/opt/openeo/lib/python3.8/site-packages/botocore/auth.py\", line 357, in add_auth\n    raise NoCredentialsError\nbotocore.exceptions.NoCredentialsError: Unable to locate credentials", "req_id": "r-0dd3abc1a4c148ada9972d006db85944", "user_id": "98dfa532579388c49bea11b50d390929551d2990ac826d5ef49a602ee7c60d97@egi.eu"}
{"message": "10.233.102.0 - - [03/Nov/2022:11:39:59 +0000] \"POST /openeo/1.1.0/jobs/j-976c7549d32b48e9b3435147d16cd82f/results HTTP/1.1\" 500 139 \"-\" \"openeo-python-client/0.11.0 cpython/3.9.7 linux\"", "levelname": "INFO", "name": "gunicorn.access", "created": 1667475599.3029277, "filename": "glogging.py", "lineno": 349, "process": 104, "req_id": "no-request", "user_id": null}
tcassaert commented 1 year ago

@zbenta did you configure the S3 access via environment variables?

zbenta commented 1 year ago

Do we do that in the zookeeper values.yaml?

Our cinder-csi-plugin containers have the OpenStack credentials configured:

# cat cloud.conf
[Global]
auth-url="https://stratus.ncg.ingrid.pt:5000/v3/"
username="MYUSERNAME"
password="MYPWD"
region="RegionOne"
tenant-id="MYTENANT"
tenant-name="TENANTNAME"
domain-name="DOMAINNAME"

[BlockStorage]
bs-version=v3
ignore-volume-az=False
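
Note that cloud.conf holds OpenStack (Keystone) credentials for the cinder CSI plugin; botocore does not read it, so it cannot satisfy the S3 credential lookup. A quick way to see what the boto3 credential chain actually resolves inside the webapp/driver pod, as a sketch:

import boto3

# Prints None when no credentials are resolvable from the environment,
# config files or instance metadata -- the same situation that later
# surfaces as NoCredentialsError.
print(boto3.session.Session().get_credentials())
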

We also have the PVCs created:

[centos@k8s-cscale-k8s-master-nf-1 ~]$ kubectl get pvc -n zookeeper
NAME                      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-zookeeper-cscale-0   Bound    pvc-db950c55-81cb-460d-993f-3c11178208dc   16Gi       RWO            standard       130m
data-zookeeper-cscale-1   Bound    pvc-a80add81-bdfc-4785-8522-d93306614fc4   16Gi       RWO            standard       130m
data-zookeeper-cscale-2   Bound    pvc-a28242c6-ce12-4076-98d6-93753016514d   16Gi       RWO            standard       130m

And we can see them in the OpenStack interface:

[image: the volumes listed in the OpenStack dashboard]

tcassaert commented 1 year ago

No, it's part of the openEO deployment itself.

envVars:
    SWIFT_URL: "https://s3.waw3-1.cloudferro.com"
    AWS_ACCESS_KEY_ID: "${aws_access_key_id}"
    AWS_SECRET_ACCESS_KEY: "${aws_secret_access_key}"

for both the driver and executor.
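
For illustration, in SparkApplication terms (spark-operator v1beta2 CRD) that corresponds to something like the fragment below; the exact values.yaml keys of the vito/sparkapplication chart may differ, and the endpoint shown is the CloudFerro one from the snippet above, so substitute your own object-store URL:

# Illustrative SparkApplication fragment; envVars is a plain
# name -> value map on both the driver and executor pod specs.
spec:
  driver:
    envVars:
      SWIFT_URL: "https://s3.waw3-1.cloudferro.com"
      AWS_ACCESS_KEY_ID: "${aws_access_key_id}"
      AWS_SECRET_ACCESS_KEY: "${aws_secret_access_key}"
  executor:
    envVars:
      SWIFT_URL: "https://s3.waw3-1.cloudferro.com"
      AWS_ACCESS_KEY_ID: "${aws_access_key_id}"
      AWS_SECRET_ACCESS_KEY: "${aws_secret_access_key}"
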

zbenta commented 1 year ago

We'll set it up then, thanks for the input.
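
Once those variables are set for the driver and executor, a small smoke test from inside the driver pod can confirm that signed requests go through; a sketch, reusing the same environment variables:

import os

import boto3

# With AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY exported, boto3 picks the
# credentials up from the environment and the listing should succeed
# instead of raising NoCredentialsError.
s3 = boto3.client("s3", endpoint_url=os.environ["SWIFT_URL"])
print(s3.list_buckets()["Buckets"])
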