@backeb someone mentioned terraform templates, can you check and put them here in this issue?
The k8s setup in creodias has these docs: https://creodias.eu/faq-other/-/asset_publisher/SIs09LQL6Gct/content/how-to-configure-kubernetes using kubespray: https://github.com/kubernetes-sigs/kubespray
@jdries could you please provide links to the OpenEO / Dask terraform template for @mariojmdavid and his team?
Please add my colleagues Tiago (tiagofglip), Zacarias (zbenta) and Miguel (miguelviana95).
@mariojmdavid @tiagofglip @zbenta @miguelviana95 could you update us on your progress installing the openEO platform? If you have any questions please contact @jdries
cc @gena @avgils
Sorry for the late reply @backeb, we were all on vacation. @tiagofglip and I will take a look at openEO this week; as soon as we have more info we'll get back to you.
We are having some issues while deploying openEO on our Kubernetes cluster.
This repo doesn't exist: helm repo add incubator http://storage.googleapis.com/kubernetes-charts-incubator
We had to change it to: helm repo add incubator https://charts.helm.sh/incubator
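For reference, this is roughly how one can verify that the replacement repo actually serves the chart (a minimal sketch of what we ran):

# Add the maintained mirror of the old incubator repo and confirm
# the (deprecated) sparkoperator chart is served from it.
helm repo add incubator https://charts.helm.sh/incubator
helm repo update
helm search repo incubator/sparkoperator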
We are also having issues with the image in the https://github.com/Open-EO/openeo-geotrellis-kubernetes/blob/master/kubernetes/openeo.yaml file. The image vito-docker.artifactory.vgt.vito.be/openeo-geotrellis:0.1.8 is not available, so we had to change the version to latest.
The next issue, and the one we are still trying to overcome, is that whenever we try to deploy the openEO spark job, we get the following error in octant:
MountVolume.SetUp failed for volume "spark-conf-volume" : configmap "openeo-geotrellis-1627996966941-driver-conf-map" not found
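This is roughly how we looked at the failure (a sketch; the namespace is from our setup, and the driver pod name is our guess based on the configmap name):

# Show the driver pod's events (the MountVolume.SetUp failure shows up there)
# and check whether the driver conf configmap exists at all.
kubectl -n spark-jobs describe pod openeo-geotrellis-1627996966941-driver
kubectl -n spark-jobs get configmaps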
Any thoughts?
@jdries could you assist with the above?
I forwarded the problem to my devops colleague, who may know a bit better what this is about!
We've found this post with the same issue:
https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/issues/946
They are stating that the problem might be "I found that operator creates the driver pod prior to the relative CM."
We also removed the version of spark-operator we had installed previously and installed the one from Google: https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/tree/master/charts/spark-operator-chart
But the result is always the same.
First feedback: it vaguely looks familiar. Is it possible that this is just a warning that can actually be ignored, because the rest of the steps do seem to work? @backeb could you add GitHub user 'tcassaert' to this project, so my colleague can interact directly if needed?
Done ✅
Well, in the logs we can also see this error:
+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=10.233.123.51 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner local:///usr/local/lib/python3.7/dist-packages/openeogeotrellis/deploy/kubernetes.py
21/08/03 15:20:33 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
python: can't open file '/usr/local/lib/python3.7/dist-packages/openeogeotrellis/deploy/kubernetes.py': [Errno 2] No such file or directory
We don't know if it's because we're using a different image (vito-docker.artifactory.vgt.vito.be/openeo-geotrellis:latest) instead of the 0.1.8 version.
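One way to check what the image actually ships, without starting Spark (a sketch; the path is the one from the failing submit above):

# List the deploy scripts baked into the image to see whether
# kubernetes.py still exists or has been replaced by something else.
docker run --rm --entrypoint ls \
  vito-docker.artifactory.vgt.vito.be/openeo-geotrellis:latest \
  /usr/local/lib/python3.7/dist-packages/openeogeotrellis/deploy/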
We had a look; this is indeed the real issue. The image is correct and contains the latest software, but the deployment files do need an update, because we internally switched to a more automated deploy based on Helm charts and terraform. We'll look into the best option to get you going again!
They are stating that the problem might be "I found that operator creates the driver pod prior to the relative CM."
That could indeed be the problem. We see the same error if we look into the pod events, but the pod starts without a problem and I haven't seen anything missing or any problems regarding that configmap.
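If anyone wants to double-check this on their side, watching the configmaps while the driver comes up should show whether the configmap is simply created a moment after the pod (a sketch; the namespace is from our setup):

# If the operator creates the CM right after the driver pod, it should
# appear here within a few seconds of the pod starting.
kubectl -n spark-jobs get configmaps --watch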
We've looked into the best way to get you guys back on track to deploy everything.
The https://github.com/Open-EO/openeo-geotrellis-kubernetes/blob/master/kubernetes/openeo.yaml file is pretty old and we've switched to a Helm based deployment ourselves.
This Helm based deployment is using the sparkapplication Helm chart, located at https://github.com/Open-EO/openeo-geotrellis-kubernetes/tree/master/kubernetes/charts/sparkapplication.
The README.md contains a sample values.yaml file with the most important variables.
The Helm chart can be used with:
helm repo add helm-charts https://artifactory.vgt.vito.be/helm-charts
Version 0.3.6 is the best tested one. The latest version is using another Ingress type, but is not currently in use by us.
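So a minimal install along those lines would look like this (a sketch; the release name, namespace and values file are up to you):

# Add our chart repo and install the best-tested chart version, pinned
# explicitly so Helm doesn't pick the newer Ingress-based release.
helm repo add helm-charts https://artifactory.vgt.vito.be/helm-charts
helm repo update
helm install openeo helm-charts/sparkapplication \
  --version 0.3.6 \
  --namespace spark-jobs \
  -f values.yaml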
Thanks for all your support,
We have rebuilt the cluster and tried to deploy openEO as follows:
helm repo add spark-operator https://googlecloudplatform.github.io/spark-on-k8s-operator
kubectl create namespace spark-jobs
helm install spark-operator/spark-operator --generate-name --create-namespace --namespace spark-operator --set sparkJobNamespace=spark-jobs --set enableWebhook=true
helm list -n spark-operator
kubectl get pods -n spark-operator
kubectl get serviceaccounts -n spark-jobs
cd openeo/
cd openeo-geotrellis-kubernetes/
cd kubernetes/
cd charts/
cd sparkapplication/
vim values_2.yaml
helm repo add sparkapp https://artifactory.vgt.vito.be/helm-charts
helm install sparkapp/sparkapplication --generate-name --namespace spark-jobs -f values_2.yaml
Our values_2.yaml file, which was created/copied as per your sample file, is as follows:
---
image: "vito-docker.artifactory.vgt.vito.be/openeo-geotrellis"
imageVersion: "latest"
jmxExporterJar: "/opt/jmx_prometheus_javaagent-0.13.0.jar"
mainApplicationFile: "local:///usr/local/lib/python3.7/dist-packages/openeogeotrellis/deploy/kube.py"
serviceAccount: "openeo"
volumes:
While taking a look at the logs we can see the following:
SparkApplication sparkapplication-1628151789 failed: failed to run spark-submit for SparkApplication spark-jobs/sparkapplication-1628151789:
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark/jars/spark-unsafe_2.12-3.1.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
21/08/05 08:30:00 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/08/05 08:30:00 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
21/08/05 08:30:00 INFO KerberosConfDriverFeatureStep: You have not specified a krb5.conf file locally or via a ConfigMap. Make sure that you have the krb5.conf locally on the driver image.
21/08/05 08:30:00 WARN DriverCommandFeatureStep: spark.kubernetes.pyspark.pythonVersion was deprecated in Spark 3.1. Please set 'spark.pyspark.python' and 'spark.pyspark.driver.python' configurations or PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON environment variables instead.
Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://10.233.0.1/api/v1/namespaces/spark-jobs/pods. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. pods "sparkapplication-1628151789-driver" is forbidden: error looking up service account spark-jobs/openeo: serviceaccount "openeo" not found.
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:589)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:526)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:492)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:451)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:252)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:879)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:341)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:84)
at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:139)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$3(KubernetesClientApplication.scala:213)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$3$adapted(KubernetesClientApplication.scala:207)
at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2611)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:207)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:179)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:951)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$anon$2.doSubmit(SparkSubmit.scala:1030)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
21/08/05 08:30:01 INFO ShutdownHookManager: Shutdown hook called
21/08/05 08:30:01 INFO ShutdownHookManager: Deleting directory /tmp/spark-250423ae-61ec-40f6-9b1c-3d892a8b0af7
Any thoughts?
Just noticed that we hadn't changed the service account value to the one existing in our setup; trying to deploy the spark-application again. We'll get back to you soon with news.
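For anyone hitting the same error: the service account the operator release actually created can be listed and then plugged in, either in the values file or on the command line (a sketch; the release and account names are from our install):

# Find the service account created by the spark-operator release, then
# point the sparkapplication chart at it instead of the missing "openeo".
kubectl -n spark-jobs get serviceaccounts
helm upgrade sparkapplication-1628151789 sparkapp/sparkapplication \
  --namespace spark-jobs \
  -f values_2.yaml \
  --set serviceAccount=spark-operator-1628151073-spark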
We believe that our problem is related to the image that we are using in the container. We have tried several versions and we always get the same info in the logs:
+ CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
We are using images from this repo: https://vito-docker.artifactory.vgt.vito.be/webapp/#/packages/docker/openeo-geotrellis/?state=eyJxdWVyeSI6eyJwa2ciOiJvcGVuZW8ifX0%3D
The last log zbenta posted here happened when commenting out jmxExporterJar: "/opt/jmx_prometheus_javaagent-0.13.0.jar" in values.yaml.
If that line is not commented out, the error is the following:
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sun.instrument.InstrumentationImpl.loadClassAndStartAgent(InstrumentationImpl.java:386)
at sun.instrument.InstrumentationImpl.loadClassAndCallPremain(InstrumentationImpl.java:401)
Caused by: java.io.FileNotFoundException: /etc/metrics/conf/prometheus.yaml (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at java.io.FileReader.<init>(FileReader.java:72)
at io.prometheus.jmx.shaded.io.prometheus.jmx.JmxCollector.<init>(JmxCollector.java:75)
at io.prometheus.jmx.shaded.io.prometheus.jmx.JavaAgent.premain(JavaAgent.java:29)
... 6 more
FATAL ERROR in native method: processing of -javaagent failed
Commenting out the Prometheus exporter is a good idea; it's an optional part for metrics. So we can focus on the first error, which claims something is wrong with Python syntax. We deployed the latest openeo-geotrellis image today, and that worked fine. @tcassaert does this syntax error look familiar?
Thanks for the info @jdries, we've tested it with the latest image and the syntax error persists. It looks like the Python interpreter doesn't like the function definition. I even tried running the image on my local machine to see if there was any file missing, or if the paths in the yaml were wrong. While running the app in the local docker image I get the following output:
root@4c29634cd856:/usr/local/lib/python3.7/dist-packages/openeogeotrellis/deploy# python3.7 kube.py
Adding process 'pi' without implementation
Adding process 'e' without implementation
starting spark context
21/08/05 12:37:37 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Traceback (most recent call last):
File "kube.py", line 89, in
This second attempt doesn't seem to have the syntax issue, because it gets past line 50. It throws an error because of not finding zookeeper nodes. You can disable zookeeper usage by emulating a CI context. Can you try setting the environment variable 'TRAVIS' to 1? Zookeeper is also optional, so it should work like this.
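For the local docker test, that would be something along these lines (a sketch, untested here):

# Emulate a CI context so the zookeeper-backed parts should be skipped.
docker run --rm -e TRAVIS=1 \
  --entrypoint python3.7 \
  vito-docker.artifactory.vgt.vito.be/openeo-geotrellis:latest \
  /usr/local/lib/python3.7/dist-packages/openeogeotrellis/deploy/kube.py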
The log that we've shown before is from running the docker image on our local machines; it has nothing to do with the log that Kubernetes gives us. The Kubernetes log shows the following:
21/08/05 12:29:09 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
File "/usr/local/lib/python3.7/dist-packages/openeogeotrellis/deploy/kube.py", line 50
def setup_batch_jobs() -> None:
^
SyntaxError: invalid syntax
log4j:WARN No appenders could be found for logger (org.apache.spark.util.ShutdownHookManager).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Where should we define the TRAVIS env var, in the executor or in the driver?
Specifying TRAVIS in the driver should be sufficient. I indeed understand that the syntax error only occurs when you run on Kubernetes, and not when you run it locally or in our Kubernetes. The only thing that's perhaps special about that line is the return type '-> None', but if you are using Python 3.7 (which seems to be the case), that should work...
This is going to be a long one, sorry.
Just to make things clear we have the spark operator up and running on our k8s cluster.
[root@openeo-cluster-k8s-master-nf-1 sparkapplication]# kubectl -n spark-operator get pods
NAME READY STATUS RESTARTS AGE
spark-operator-1628151073-5f74549799-xn27k 1/1 Running 0 6h7m
We have deployed the sparkapplication chart using the following command:
helm install myspark sparkapp/sparkapplication --namespace spark-jobs -f values.yaml
The values.yaml is as follows; we've tried both with and without TRAVIS: "1" in the driver section:
image: "vito-docker.artifactory.vgt.vito.be/openeo-geotrellis"
imageVersion: "latest"
#jmxExporterJar: "/opt/jmx_prometheus_javaagent-0.13.0.jar"
mainApplicationFile: "local:///usr/local/lib/python3.7/dist-packages/openeogeotrellis/deploy/kube.py"
serviceAccount: "spark-operator-1628151073-spark"
volumes:
- name: "eodata"
hostPath:
path: "/eodata"
type: "DirectoryOrCreate"
volumeMounts:
- name: "eodata"
mountPath: "/eodata"
executor:
memory: "4096m"
cpu: 5
envVars:
OPENEO_CATALOG_FILES: "/opt/layercatalog.json"
OPENEO_S1BACKSCATTER_ELEV_GEOID: "/opt/openeo-vito-aux-data/egm96.grd"
OTB_HOME: "/opt/orfeo-toolbox"
OTB_APPLICATION_PATH: "/opt/orfeo-toolbox/lib/otb/applications"
KUBE: "true"
GDAL_NUM_THREADS: "2"
javaOptions: "-Dlog4j.configuration=log4j.properties -Dscala.concurrent.context.numThreads=4 -Dscala.concurrent.context.maxThreads=4"
driver:
memory: "4096m"
cpu: 5
envVars:
KUBE: "true"
KUBE_OPENEO_API_PORT: "50001"
DRIVER_IMPLEMENTATION_PACKAGE: "openeogeotrellis"
OPENEO_CATALOG_FILES: "/opt/layercatalog.json"
OPENEO_S1BACKSCATTER_ELEV_GEOID: "/opt/openeo-vito-aux-data/egm96.grd"
OTB_HOME: "/opt/orfeo-toolbox"
OTB_APPLICATION_PATH: "/opt/orfeo-toolbox/lib/otb/applications"
javaOptions: "-Dlog4j.configuration=log4j.properties -Dscala.concurrent.context.numThreads=6 -Dpixels.treshold=1000000"
sparkConf:
"spark.executorEnv.DRIVER_IMPLEMENTATION_PACKAGE": "openeogeotrellis"
"spark.extraListeners": "org.openeo.sparklisteners.CancelRunawayJobListener"
"spark.appMasterEnv.DRIVER_IMPLEMENTATION_PACKAGE": "openeogeotrellis"
"spark.executorEnv.GDAL_NUM_THREADS": "2"
"spark.executorEnv.GDAL_DISABLE_READDIR_ON_OPEN": "EMPTY_DIR"
jarDependencies:
- 'local:///opt/geotrellis-extensions-2.2.0-SNAPSHOT.jar'
- 'local:///opt/geotrellis-backend-assembly-0.4.6-openeo.jar'
fileDependencies:
- 'local:///opt/layercatalog.json'
service:
enabled: true
port: 50001
# ingress:
# annotations:
# kubernetes.io/ingress.class: traefik
# enabled: true
# hosts:
# - host: openeo.example.com
# paths:
# - '/'
rbac:
create: false
serviceAccountName: spark-operator-1628151073-spark
# spark_ui:
# port: 4040
# ingress:
# enabled: true
# annotations:
# kubernetes.io/ingress.class: traefik
# hosts:
# - host: spark-ui.openeo.example.com
# paths:
# - '/'
What we get when we consult the log after having deployed the chart is as follows:
[root@openeo-cluster-k8s-master-nf-1 sparkapplication]# kubectl -n spark-jobs logs myspark-driver
++ id -u
+ myuid=0
++ id -g
+ mygid=0
+ set +e
++ getent passwd 0
+ uidentry=root:x:0:0:root:/root:/bin/bash
+ set -e
+ '[' -z root:x:0:0:root:/root:/bin/bash ']'
+ SPARK_K8S_CMD=driver
+ case "$SPARK_K8S_CMD" in
+ shift 1
+ SPARK_CLASSPATH=':/opt/spark/jars/*'
+ env
+ grep SPARK_JAVA_OPT_
+ sort -t_ -k4 -n
+ sed 's/[^=]*=\(.*\)/\1/g'
+ readarray -t SPARK_EXECUTOR_JAVA_OPTS
+ '[' -n '' ']'
+ '[' -n '' ']'
+ PYSPARK_ARGS=
+ '[' -n '' ']'
+ R_ARGS=
+ '[' -n '' ']'
+ '[' '' == 2 ']'
+ '[' '' == 3 ']'
+ case "$SPARK_K8S_CMD" in
+ CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=10.233.120.37 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner local:///usr/local/lib/python3.7/dist-packages/openeogeotrellis/deploy/kube.py
21/08/05 14:02:07 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
File "/usr/local/lib/python3.7/dist-packages/openeogeotrellis/deploy/kube.py", line 50
def setup_batch_jobs() -> None:
^
SyntaxError: invalid syntax
log4j:WARN No appenders could be found for logger (org.apache.spark.util.ShutdownHookManager).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
The driver is the only pod we have, but it is in an error state:
[root@openeo-cluster-k8s-master-nf-1 sparkapplication]# kubectl -n spark-jobs get pods
NAME READY STATUS RESTARTS AGE
myspark-driver 0/1 Error 0 20m
This is not something we encountered when setting it up.
What version of the spark operator are you using?
We are using the following:
[root@openeo-cluster-k8s-master-nf-1 sparkapplication]# helm list -n spark-operator
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
spark-operator-1628151073 spark-operator 1 2021-08-05 08:11:22.672795838 +0000 UTC deployed spark-operator-1.1.6 v1beta2-1.2.3-3.1.1
From Google:
spark-operator https://googlecloudplatform.github.io/spark-on-k8s-operator
because the one that is in the documentation is deprecated and gave us errors while installing it.
We're currently on
spark-operator spark-operator 1 2021-07-05 12:31:50.28283282 +0000 UTC deployed sparkoperator-0.8.4 v1beta2-1.2.0-3.0.0
So maybe you could try this version?
Next to that, I also just made a commit to remove the '-> None'; in fact, specifying an empty return type like this is not really necessary nor helpful in Python. This may not solve the actual issue, but would hopefully get us a bit further and maybe reveal the underlying issue a bit better.
Since the chart repo that is shown in the documentation is not available, we searched for alternative ones and found the following:
[root@openeo-cluster-k8s-master-nf-1 sparkapplication]# helm repo list
NAME URL
incubator https://charts.helm.sh/incubator
spark-operator https://googlecloudplatform.github.io/spark-on-k8s-operator
The versions available in each one of them are:
[root@openeo-cluster-k8s-master-nf-1 sparkapplication]# helm search repo incubator/sparkoperator
NAME CHART VERSION APP VERSION DESCRIPTION
incubator/sparkoperator 0.8.6 v1beta2-1.2.0-3.0.0 DEPRECATED A Helm chart for Spark on Kubernetes...
[root@openeo-cluster-k8s-master-nf-1 sparkapplication]# helm search repo spark-operator/spark-operator
NAME CHART VERSION APP VERSION DESCRIPTION
spark-operator/spark-operator 1.1.6 v1beta2-1.2.3-3.1.1 A Helm chart for Spark on Kubernetes operator
What repo are you using, can you provide us with the url?
I can't find the 0.8.4 version anymore in any upstream repo either. But we've mirrored it to our Artifactory, so you should be able to find 0.8.4 in https://artifactory.vgt.vito.be/helm-charts. The chart itself is named sparkoperator.
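To confirm what's available there (a sketch; the repo alias is the one you used earlier):

# Add our mirror (if not already added) and list all sparkoperator
# chart versions it serves.
helm repo add sparkapp https://artifactory.vgt.vito.be/helm-charts
helm repo update
helm search repo sparkapp/sparkoperator --versions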
Thanks @tcassaert and @jdries for your support.
Here goes another long one :smile:
We have taken a step forward, but the pods still won't run.
We've installed the chart versions as per your recommendations.
The sparkoperator version 0.8.4
helm install myoperator sparkapp/sparkoperator --create-namespace --namespace spark-operator --set sparkJobNamespace=spark-jobs --set enableWebhook=true --version=0.8.4
The sparkapplication version 0.3.6
helm install myspark sparkapp/sparkapplication --namespace spark-jobs -f values_2.yaml --version=0.3.6
The image we are using is vito-docker.artifactory.vgt.vito.be/openeo-geotrellis with the latest tag.
Here is some more information regarding our deployment of the sparkapplication; we hope this helps:
[root@openeo-cluster-k8s-master-nf-1 sparkapplication]# kubectl -n spark-jobs describe sparkapplications myspark
Name: myspark
Namespace: spark-jobs
Labels: app.kubernetes.io/managed-by=Helm
chartname=sparkapplication
release=myspark
revision=1
sparkVersion=2.4.5
version=0.3.6
Annotations: meta.helm.sh/release-name: myspark
meta.helm.sh/release-namespace: spark-jobs
API Version: sparkoperator.k8s.io/v1beta2
Kind: SparkApplication
Metadata:
Creation Timestamp: 2021-08-06T09:08:57Z
Generation: 1
Managed Fields:
API Version: sparkoperator.k8s.io/v1beta2
Fields Type: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
.:
f:meta.helm.sh/release-name:
f:meta.helm.sh/release-namespace:
f:labels:
.:
f:app.kubernetes.io/managed-by:
f:chartname:
f:release:
f:revision:
f:sparkVersion:
f:version:
f:spec:
.:
f:deps:
.:
f:files:
f:jars:
f:driver:
.:
f:cores:
f:envVars:
.:
f:DRIVER_IMPLEMENTATION_PACKAGE:
f:IMAGE_NAME:
f:KUBE:
f:KUBE_OPENEO_API_PORT:
f:OPENEO_CATALOG_FILES:
f:OPENEO_S1BACKSCATTER_ELEV_GEOID:
f:OTB_APPLICATION_PATH:
f:OTB_HOME:
f:TRAVIS:
f:hostNetwork:
f:javaOptions:
f:labels:
.:
f:app.kubernetes.io/name:
f:release:
f:revision:
f:sparkVersion:
f:version:
f:memory:
f:serviceAccount:
f:volumeMounts:
f:executor:
.:
f:cores:
f:envVars:
.:
f:GDAL_NUM_THREADS:
f:KUBE:
f:OPENEO_CATALOG_FILES:
f:OPENEO_S1BACKSCATTER_ELEV_GEOID:
f:OTB_APPLICATION_PATH:
f:OTB_HOME:
f:hostNetwork:
f:instances:
f:javaOptions:
f:labels:
.:
f:release:
f:revision:
f:sparkVersion:
f:version:
f:memory:
f:serviceAccount:
f:volumeMounts:
f:image:
f:imagePullPolicy:
f:mainApplicationFile:
f:mode:
f:pythonVersion:
f:restartPolicy:
.:
f:onFailureRetries:
f:onFailureRetryInterval:
f:onSubmissionFailureRetries:
f:onSubmissionFailureRetryInterval:
f:type:
f:sparkConf:
.:
f:spark.appMasterEnv.DRIVER_IMPLEMENTATION_PACKAGE:
f:spark.executorEnv.DRIVER_IMPLEMENTATION_PACKAGE:
f:spark.executorEnv.GDAL_DISABLE_READDIR_ON_OPEN:
f:spark.executorEnv.GDAL_NUM_THREADS:
f:spark.extraListeners:
f:sparkVersion:
f:type:
f:volumes:
Manager: helm
Operation: Update
Time: 2021-08-06T09:08:57Z
API Version: sparkoperator.k8s.io/v1beta2
Fields Type: FieldsV1
fieldsV1:
f:status:
.:
f:applicationState:
.:
f:errorMessage:
f:state:
f:driverInfo:
.:
f:podName:
f:webUIAddress:
f:webUIPort:
f:webUIServiceName:
f:executionAttempts:
f:executorState:
.:
f:myspark-1628241205030-exec-1:
f:lastSubmissionAttemptTime:
f:sparkApplicationId:
f:submissionAttempts:
f:submissionID:
f:terminationTime:
Manager: spark-operator
Operation: Update
Time: 2021-08-06T09:13:56Z
Resource Version: 371613
UID: 3f08781d-9286-43ce-a1f0-3df5d5f8cec9
Spec:
Deps:
Files:
local:///opt/layercatalog.json
Jars:
local:///opt/geotrellis-extensions-2.2.0-SNAPSHOT.jar
local:///opt/geotrellis-backend-assembly-0.4.6-openeo.jar
Driver:
Cores: 2
Env Vars:
DRIVER_IMPLEMENTATION_PACKAGE: openeogeotrellis
IMAGE_NAME: vito-docker.artifactory.vgt.vito.be/openeo-geotrellis:latest
KUBE: true
KUBE_OPENEO_API_PORT: 50001
OPENEO_CATALOG_FILES: /opt/layercatalog.json
OPENEO_S1BACKSCATTER_ELEV_GEOID: /opt/openeo-vito-aux-data/egm96.grd
OTB_APPLICATION_PATH: /opt/orfeo-toolbox/lib/otb/applications
OTB_HOME: /opt/orfeo-toolbox
TRAVIS: 1
Host Network: false
Java Options: -Dlog4j.configuration=log4j.properties -Dscala.concurrent.context.numThreads=6 -Dpixels.treshold=1000000
Labels:
app.kubernetes.io/name: myspark-driver
Release: myspark
Revision: 1
Spark Version: 2.4.5
Version: 0.3.6
Memory: 4096m
Service Account: myoperator-spark
Volume Mounts:
Mount Path: /eodata
Name: eodata
Executor:
Cores: 2
Env Vars:
GDAL_NUM_THREADS: 2
KUBE: true
OPENEO_CATALOG_FILES: /opt/layercatalog.json
OPENEO_S1BACKSCATTER_ELEV_GEOID: /opt/openeo-vito-aux-data/egm96.grd
OTB_APPLICATION_PATH: /opt/orfeo-toolbox/lib/otb/applications
OTB_HOME: /opt/orfeo-toolbox
Host Network: false
Instances: 1
Java Options: -Dlog4j.configuration=log4j.properties -Dscala.concurrent.context.numThreads=4 -Dscala.concurrent.context.maxThreads=4
Labels:
Release: myspark
Revision: 1
Spark Version: 2.4.5
Version: 0.3.6
Memory: 4096m
Service Account: myoperator-spark
Volume Mounts:
Mount Path: /eodata
Name: eodata
Image: vito-docker.artifactory.vgt.vito.be/openeo-geotrellis:latest
Image Pull Policy: IfNotPresent
Main Application File: local:///usr/local/lib/python3.7/dist-packages/openeogeotrellis/deploy/kube.py
Mode: cluster
Python Version: 3
Restart Policy:
On Failure Retries: 3
On Failure Retry Interval: 10
On Submission Failure Retries: 5
On Submission Failure Retry Interval: 20
Type: OnFailure
Spark Conf:
spark.appMasterEnv.DRIVER_IMPLEMENTATION_PACKAGE: openeogeotrellis
spark.executorEnv.DRIVER_IMPLEMENTATION_PACKAGE: openeogeotrellis
spark.executorEnv.GDAL_DISABLE_READDIR_ON_OPEN: EMPTY_DIR
spark.executorEnv.GDAL_NUM_THREADS: 2
spark.extraListeners: org.openeo.sparklisteners.CancelRunawayJobListener
Spark Version: 2.4.5
Type: Python
Volumes:
Host Path:
Path: /eodata
Type: DirectoryOrCreate
Name: eodata
Status:
Application State:
Error Message: driver container failed with ExitCode: 1, Reason: Error
State: FAILED
Driver Info:
Pod Name: myspark-driver
Web UI Address: 10.233.33.186:4040
Web UI Port: 4040
Web UI Service Name: myspark-ui-svc
Execution Attempts: 4
Executor State:
myspark-1628241205030-exec-1: FAILED
Last Submission Attempt Time: 2021-08-06T09:13:15Z
Spark Application Id: spark-fb1aeabd18bf492a894bc90788341642
Submission Attempts: 1
Submission ID: 0c8e66a8-ea83-4120-a0e2-431becbd31e0
Termination Time: 2021-08-06T09:13:54Z
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SparkApplicationAdded 18m spark-operator SparkApplication myspark was added, enqueuing it for submission
Normal SparkExecutorPending 18m spark-operator Executor myspark-1628240951293-exec-1 is pending
Normal SparkExecutorRunning 18m spark-operator Executor myspark-1628240951293-exec-1 is running
Normal SparkExecutorPending 17m (x2 over 17m) spark-operator Executor myspark-1628241024502-exec-1 is pending
Normal SparkExecutorRunning 17m spark-operator Executor myspark-1628241024502-exec-1 is running
Normal SparkExecutorPending 15m spark-operator Executor myspark-1628241115447-exec-1 is pending
Normal SparkExecutorRunning 15m spark-operator Executor myspark-1628241115447-exec-1 is running
Warning SparkApplicationPendingRerun 14m (x3 over 17m) spark-operator SparkApplication myspark is pending rerun
Normal SparkApplicationSubmitted 14m (x4 over 18m) spark-operator SparkApplication myspark was submitted successfully
Normal SparkDriverRunning 14m (x4 over 18m) spark-operator Driver myspark-driver is running
Normal SparkExecutorPending 14m spark-operator Executor myspark-1628241205030-exec-1 is pending
Normal SparkExecutorRunning 14m spark-operator Executor myspark-1628241205030-exec-1 is running
Warning SparkDriverFailed 13m (x4 over 17m) spark-operator Driver myspark-driver failed
The pods both start (this is new for us); they stay up for about 25 seconds, but then they get destroyed.
The log we get is the following; we hope this will also help:
[root@openeo-cluster-k8s-master-nf-1 sparkapplication]# kubectl -n spark-jobs logs -f myspark-driver
++ id -u
+ myuid=0
++ id -g
+ mygid=0
+ set +e
++ getent passwd 0
+ uidentry=root:x:0:0:root:/root:/bin/bash
+ set -e
+ '[' -z root:x:0:0:root:/root:/bin/bash ']'
+ SPARK_K8S_CMD=driver
+ case "$SPARK_K8S_CMD" in
+ shift 1
+ SPARK_CLASSPATH=':/opt/spark/jars/*'
+ env
+ grep SPARK_JAVA_OPT_
+ sort -t_ -k4 -n
+ sed 's/[^=]*=\(.*\)/\1/g'
+ readarray -t SPARK_EXECUTOR_JAVA_OPTS
+ '[' -n '' ']'
+ '[' -n '' ']'
+ PYSPARK_ARGS=
+ '[' -n '' ']'
+ R_ARGS=
+ '[' -n '' ']'
+ '[' 3 == 2 ']'
+ '[' 3 == 3 ']'
++ python3 -V
+ pyv3='Python 3.7.3'
+ export PYTHON_VERSION=3.7.3
+ PYTHON_VERSION=3.7.3
+ export PYSPARK_PYTHON=python3
+ PYSPARK_PYTHON=python3
+ export PYSPARK_DRIVER_PYTHON=python3
+ PYSPARK_DRIVER_PYTHON=python3
+ case "$SPARK_K8S_CMD" in
+ CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=10.233.120.51 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner local:///usr/local/lib/python3.7/dist-packages/openeogeotrellis/deploy/kube.py
21/08/06 09:13:21 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Adding process 'e' without implementation
Adding process 'pi' without implementation
starting spark context
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/08/06 09:13:22 INFO SparkContext: Running Spark version 2.4.5
21/08/06 09:13:22 INFO SparkContext: Submitted application: myspark
21/08/06 09:13:23 INFO SecurityManager: Changing view acls to: root
21/08/06 09:13:23 INFO SecurityManager: Changing modify acls to: root
21/08/06 09:13:23 INFO SecurityManager: Changing view acls groups to:
21/08/06 09:13:23 INFO SecurityManager: Changing modify acls groups to:
21/08/06 09:13:23 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
21/08/06 09:13:23 INFO Utils: Successfully started service 'sparkDriver' on port 7078.
21/08/06 09:13:23 INFO SparkEnv: Registering MapOutputTracker
21/08/06 09:13:23 INFO SparkEnv: Registering BlockManagerMaster
21/08/06 09:13:23 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
21/08/06 09:13:23 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
21/08/06 09:13:23 INFO DiskBlockManager: Created local directory at /var/data/spark-7f813770-0240-4171-8149-2e472bb9d989/blockmgr-05e37d14-5ef9-4ffc-aff3-948117d3b1ac
21/08/06 09:13:23 INFO MemoryStore: MemoryStore started with capacity 2004.6 MB
21/08/06 09:13:23 INFO SparkEnv: Registering OutputCommitCoordinator
21/08/06 09:13:23 INFO Utils: Successfully started service 'SparkUI' on port 4040.
21/08/06 09:13:23 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://myspark-1620e47b1abce7d6-driver-svc.spark-jobs.svc:4040
21/08/06 09:13:23 INFO SparkContext: Added JAR local:///opt/geotrellis-extensions-2.2.0-SNAPSHOT.jar at file:/opt/geotrellis-extensions-2.2.0-SNAPSHOT.jar with timestamp 1628241203791
21/08/06 09:13:23 INFO SparkContext: Added JAR local:///opt/geotrellis-backend-assembly-0.4.6-openeo.jar at file:/opt/geotrellis-backend-assembly-0.4.6-openeo.jar with timestamp 1628241203792
21/08/06 09:13:23 WARN SparkContext: File with 'local' scheme is not supported to add to file server, since it is already available on every node.
21/08/06 09:13:25 INFO ExecutorPodsAllocator: Going to request 1 executors from Kubernetes.
21/08/06 09:13:25 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 7079.
21/08/06 09:13:25 INFO NettyBlockTransferService: Server created on myspark-1620e47b1abce7d6-driver-svc.spark-jobs.svc:7079
21/08/06 09:13:25 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
21/08/06 09:13:25 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, myspark-1620e47b1abce7d6-driver-svc.spark-jobs.svc, 7079, None)
21/08/06 09:13:25 INFO BlockManagerMasterEndpoint: Registering block manager myspark-1620e47b1abce7d6-driver-svc.spark-jobs.svc:7079 with 2004.6 MB RAM, BlockManagerId(driver, myspark-1620e47b1abce7d6-driver-svc.spark-jobs.svc, 7079, None)
21/08/06 09:13:25 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, myspark-1620e47b1abce7d6-driver-svc.spark-jobs.svc, 7079, None)
21/08/06 09:13:25 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, myspark-1620e47b1abce7d6-driver-svc.spark-jobs.svc, 7079, None)
21/08/06 09:13:25 INFO CancelRunawayJobListener: initialized with timeout PT15M
21/08/06 09:13:25 INFO SparkContext: Registered listener org.openeo.sparklisteners.CancelRunawayJobListener
21/08/06 09:13:32 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.233.120.52:44494) with ID 1
21/08/06 09:13:32 INFO KubernetesClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
[2021-08-06 09:13:32,186] INFO in openeogeotrellis.service_registry: Creating new InMemoryServiceRegistry: <openeogeotrellis.service_registry.InMemoryServiceRegistry object at 0x7fcee900ce10>
[2021-08-06 09:13:32,187] INFO in openeogeotrellis.layercatalog: Reading layer catalog metadata from /opt/layercatalog.json
[2021-08-06 09:13:32,187] INFO in openeogeotrellis.layercatalog: Updating SENTINEL2_L1C metadata from https://finder.creodias.eu:Sentinel2
[2021-08-06 09:13:32,188] INFO in openeogeotrellis.opensearch: Getting collection metadata from https://finder.creodias.eu/resto/collections.json
21/08/06 09:13:32 INFO BlockManagerMasterEndpoint: Registering block manager 10.233.120.52:42675 with 2.1 GB RAM, BlockManagerId(1, 10.233.120.52, 42675, None)
[2021-08-06 09:13:32,671] INFO in openeogeotrellis.layercatalog: Updating SENTINEL2_L2A metadata from https://finder.creodias.eu:Sentinel2
[2021-08-06 09:13:32,672] INFO in openeogeotrellis.opensearch: Getting collection metadata from https://finder.creodias.eu/resto/collections.json
/usr/local/lib/python3.7/dist-packages/openeo_driver/views.py:208: UserWarning: The name 'openeo' is already registered for this blueprint. Use 'name=' to provide a unique name. This will become an error in Flask 2.1.
app.register_blueprint(bp, url_prefix='/openeo/<version>')
[2021-08-06 09:13:33,254] INFO in openeo_driver.views: App info logging enabled!
[2021-08-06 09:13:33,255] DEBUG in openeo_driver.views: App debug logging enabled!
[2021-08-06 09:13:33,255] INFO in openeo_driver.server: StandaloneApplication options: {'bind': '10.233.120.51:50001', 'workers': 1, 'threads': 10, 'worker_class': 'gthread', 'timeout': 1000, 'loglevel': 'DEBUG', 'accesslog': '-', 'errorlog': '-'}
[2021-08-06 09:13:33,255] INFO in openeo_driver.server: Creating StandaloneApplication
[2021-08-06 09:13:33,257] INFO in openeo_driver.server: Running StandaloneApplication
[2021-08-06 09:13:33 +0000] [48] [DEBUG] Current configuration:
config: ./gunicorn.conf.py
wsgi_app: None
bind: ['10.233.120.51:50001']
backlog: 2048
workers: 1
worker_class: gthread
threads: 10
worker_connections: 1000
max_requests: 0
max_requests_jitter: 0
timeout: 1000
graceful_timeout: 30
keepalive: 2
limit_request_line: 4094
limit_request_fields: 100
limit_request_field_size: 8190
reload: False
reload_engine: auto
reload_extra_files: []
spew: False
check_config: False
print_config: False
preload_app: False
sendfile: None
reuse_port: False
chdir: /opt/spark/work-dir
daemon: False
raw_env: []
pidfile: None
worker_tmp_dir: None
user: 0
group: 0
umask: 0
initgroups: False
tmp_upload_dir: None
secure_scheme_headers: {'X-FORWARDED-PROTOCOL': 'ssl', 'X-FORWARDED-PROTO': 'https', 'X-FORWARDED-SSL': 'on'}
forwarded_allow_ips: ['127.0.0.1']
accesslog: -
disable_redirect_access_to_syslog: False
access_log_format: %(h)s %(l)s %(u)s %(t)s "%(r)s" %(s)s %(b)s "%(f)s" "%(a)s"
errorlog: -
loglevel: DEBUG
capture_output: False
logger_class: gunicorn.glogging.Logger
logconfig: None
logconfig_dict: {}
syslog_addr: udp://localhost:514
syslog: False
syslog_prefix: None
syslog_facility: user
enable_stdio_inheritance: False
statsd_host: None
dogstatsd_tags:
statsd_prefix:
proc_name: None
default_proc_name: gunicorn
pythonpath: None
paste: None
on_starting: <function OnStarting.on_starting at 0x7fcf25436620>
on_reload: <function OnReload.on_reload at 0x7fcf25436730>
when_ready: <function run_gunicorn.<locals>.when_ready at 0x7fcee9049e18>
pre_fork: <function Prefork.pre_fork at 0x7fcf25436950>
post_fork: <function Postfork.post_fork at 0x7fcf25436a60>
post_worker_init: <function PostWorkerInit.post_worker_init at 0x7fcf25436b70>
worker_int: <function WorkerInt.worker_int at 0x7fcf25436c80>
worker_abort: <function WorkerAbort.worker_abort at 0x7fcf25436d90>
pre_exec: <function PreExec.pre_exec at 0x7fcf25436ea0>
pre_request: <function PreRequest.pre_request at 0x7fcf253ce048>
post_request: <function PostRequest.post_request at 0x7fcf253ce0d0>
child_exit: <function ChildExit.child_exit at 0x7fcf253ce1e0>
worker_exit: <function WorkerExit.worker_exit at 0x7fcf253ce2f0>
nworkers_changed: <function NumWorkersChanged.nworkers_changed at 0x7fcf253ce400>
on_exit: <function OnExit.on_exit at 0x7fcf253ce510>
proxy_protocol: False
proxy_allow_ips: ['127.0.0.1']
keyfile: None
certfile: None
ssl_version: 2
cert_reqs: 0
ca_certs: None
suppress_ragged_eofs: True
do_handshake_on_connect: False
ciphers: None
raw_paste_global_conf: []
strip_header_spaces: False
[2021-08-06 09:13:33 +0000] [48] [INFO] Starting gunicorn 20.1.0
[2021-08-06 09:13:33 +0000] [48] [DEBUG] Arbiter booted
[2021-08-06 09:13:33 +0000] [48] [INFO] Listening at: http://10.233.120.51:50001 (48)
[2021-08-06 09:13:33 +0000] [48] [INFO] Using worker: gthread
[2021-08-06 09:13:33,266] INFO in openeo_driver.server: when_ready: <gunicorn.arbiter.Arbiter object at 0x7fcee87360b8>
[2021-08-06 09:13:33 +0000] [48] [INFO] Gunicorn info logging enabled!
[2021-08-06 09:13:33,266] INFO in flask: Flask info logging enabled!
[2021-08-06 09:13:33,266] INFO in openeogeotrellis.deploy: Trying to load 'custom_processes' with PYTHONPATH ['/usr/local/lib/python3.7/dist-packages/openeogeotrellis/deploy', '/var/data/spark-7f813770-0240-4171-8149-2e472bb9d989/spark-275d7baf-5ae5-416d-aab7-b3335518b688/userFiles-db26d671-239b-47ac-a000-fd2dfda9158b', '/opt/spark/python/lib/pyspark.zip', '/opt/spark/python/lib/py4j-0.10.7-src.zip', '/opt/spark/jars/spark-core_2.11-2.4.5.jar', '/opt/spark/python/lib/py4j-*.zip', '/usr/lib/python37.zip', '/usr/lib/python3.7', '/usr/lib/python3.7/lib-dynload', '/usr/local/lib/python3.7/dist-packages', '/usr/lib/python3/dist-packages']
[2021-08-06 09:13:33,267] INFO in openeogeotrellis.deploy: 'custom_processes' not loaded: ModuleNotFoundError("No module named 'custom_processes'").
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/openeogeotrellis/deploy/kube.py", line 89, in <module>
main()
File "/usr/local/lib/python3.7/dist-packages/openeogeotrellis/deploy/kube.py", line 84, in main
on_started=on_started
File "/usr/local/lib/python3.7/dist-packages/openeo_driver/server.py", line 119, in run_gunicorn
application.run()
File "/usr/local/lib/python3.7/dist-packages/gunicorn/app/base.py", line 72, in run
Arbiter(self).run()
File "/usr/local/lib/python3.7/dist-packages/gunicorn/arbiter.py", line 198, in run
self.start()
File "/usr/local/lib/python3.7/dist-packages/gunicorn/arbiter.py", line 167, in start
self.cfg.when_ready(self)
File "/usr/local/lib/python3.7/dist-packages/openeo_driver/server.py", line 113, in when_ready
on_started()
File "/usr/local/lib/python3.7/dist-packages/openeogeotrellis/deploy/kube.py", line 60, in on_started
setup_batch_jobs()
File "/usr/local/lib/python3.7/dist-packages/openeogeotrellis/deploy/kube.py", line 51, in setup_batch_jobs
with JobRegistry() as job_registry:
File "/usr/local/lib/python3.7/dist-packages/openeogeotrellis/job_registry.py", line 183, in __enter__
self._zk.start()
File "/usr/local/lib/python3.7/dist-packages/kazoo/client.py", line 635, in start
raise self.handler.timeout_exception("Connection time-out")
kazoo.handlers.threading.KazooTimeoutError: Connection time-out
21/08/06 09:13:53 INFO SparkContext: Invoking stop() from shutdown hook
21/08/06 09:13:53 INFO SparkUI: Stopped Spark web UI at http://myspark-1620e47b1abce7d6-driver-svc.spark-jobs.svc:4040
21/08/06 09:13:53 INFO KubernetesClusterSchedulerBackend: Shutting down all executors
21/08/06 09:13:53 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each executor to shut down
21/08/06 09:13:53 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed (this is expected if the application is shutting down.)
21/08/06 09:13:53 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
21/08/06 09:13:53 INFO MemoryStore: MemoryStore cleared
21/08/06 09:13:53 INFO BlockManager: BlockManager stopped
21/08/06 09:13:53 INFO BlockManagerMaster: BlockManagerMaster stopped
21/08/06 09:13:53 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
21/08/06 09:13:53 INFO SparkContext: Successfully stopped SparkContext
21/08/06 09:13:53 INFO ShutdownHookManager: Shutdown hook called
21/08/06 09:13:53 INFO ShutdownHookManager: Deleting directory /tmp/spark-4f3d9b89-3e4b-44f8-bbf9-9a71bd3e859a
21/08/06 09:13:53 INFO ShutdownHookManager: Deleting directory /var/data/spark-7f813770-0240-4171-8149-2e472bb9d989/spark-275d7baf-5ae5-416d-aab7-b3335518b688/pyspark-964de7f9-6a8c-43c9-9e4b-0809e34d5d93
21/08/06 09:13:53 INFO ShutdownHookManager: Deleting directory /var/data/spark-7f813770-0240-4171-8149-2e472bb9d989/spark-275d7baf-5ae5-416d-aab7-b3335518b688
The final error above is similar to the one we get when we run the docker image on our local machine and try to start kube.py manually:
root@ab697e5f3521:/usr/local/lib/python3.7/dist-packages/openeogeotrellis/deploy# python3
python3 python3-config python3.7 python3.7-config python3.7m python3.7m-config python3m python3m-config
root@ab697e5f3521:/usr/local/lib/python3.7/dist-packages/openeogeotrellis/deploy# python3.7 kube.py
Adding process 'e' without implementation
Adding process 'pi' without implementation
starting spark context
21/08/06 09:14:08 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Traceback (most recent call last):
File "kube.py", line 89, in <module>
main()
File "kube.py", line 63, in main
app = build_app(backend_implementation=GeoPySparkBackendImplementation())
File "/usr/local/lib/python3.7/dist-packages/openeogeotrellis/backend.py", line 257, in __init__
else ZooKeeperServiceRegistry()
File "/usr/local/lib/python3.7/dist-packages/openeogeotrellis/service_registry.py", line 121, in __init__
with self._zk_client() as zk:
File "/usr/lib/python3.7/contextlib.py", line 112, in __enter__
return next(self.gen)
File "/usr/local/lib/python3.7/dist-packages/openeogeotrellis/service_registry.py", line 201, in _zk_client
zk.start()
File "/usr/local/lib/python3.7/dist-packages/kazoo/client.py", line 635, in start
raise self.handler.timeout_exception("Connection time-out")
kazoo.handlers.threading.KazooTimeoutError: Connection time-out
The KazooTimeoutError shows that it is still trying to connect to zookeeper nodes. Have you added the TRAVIS=1 environment variable?
Yes, we've added it in the driver section of our yaml file:
driver:
  memory: "4096m"
  cpu: 5
  envVars:
    TRAVIS: "1"
    KUBE: "true"
I had a look at the code; it seems that there was still one service depending on zookeeper that does not have this check in place. The next image build should contain that fix. The other solution is to also start zookeeper on K8S; Thomas can explain how we do that (maybe another Helm chart?).
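For the zookeeper-on-K8S route, something like the Bitnami chart would probably do (a sketch; we haven't verified this exact chart against openEO, and the connection string openEO expects would still need to be configured):

# Run a single-node zookeeper next to the spark jobs.
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install zookeeper bitnami/zookeeper \
  --namespace spark-jobs \
  --set replicaCount=1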
On a side note, I'm about to leave on holiday for 3 weeks myself, but Thomas can still provide assistance.
What's also important here is that openEO will eventually have to connect to certain datasets. So as a minimum, this data needs to be available in object storage or on a shared disk. If on disk, openEO can discover it with some glob patterns. If in object storage, a STAC catalog is needed so openEO can find the data. This last solution is the most future-proof.
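As a quick sanity check of the shared-disk option, one can verify that the mounted /eodata hostPath actually contains data from inside a running pod (a sketch; the pod name is from the logs above):

# Confirm the CREODIAS /eodata repository is visible inside the driver.
kubectl -n spark-jobs exec myspark-driver -- ls /eodata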
We already tested with the latest image, but it fails again at the same step :/
Indeed, I double-checked the code and I made a mistake in that commit; hope it's better on the next build.
Good morning,
Hope you all had a nice weekend. @jdries, thanks for the last effort and the release you made on Saturday; unfortunately we have the same issue:
[root@openeo-cluster-k8s-master-nf-1 sparkapplication]# kubectl -n spark-jobs logs -f myspark-driver
++ id -u
+ myuid=0
++ id -g
+ mygid=0
+ set +e
++ getent passwd 0
+ uidentry=root:x:0:0:root:/root:/bin/bash
+ set -e
+ '[' -z root:x:0:0:root:/root:/bin/bash ']'
+ SPARK_K8S_CMD=driver
+ case "$SPARK_K8S_CMD" in
+ shift 1
+ SPARK_CLASSPATH=':/opt/spark/jars/*'
+ env
+ grep SPARK_JAVA_OPT_
+ sort -t_ -k4 -n
+ sed 's/[^=]*=\(.*\)/\1/g'
+ readarray -t SPARK_EXECUTOR_JAVA_OPTS
+ '[' -n '' ']'
+ '[' -n '' ']'
+ PYSPARK_ARGS=
+ '[' -n '' ']'
+ R_ARGS=
+ '[' -n '' ']'
+ '[' 3 == 2 ']'
+ '[' 3 == 3 ']'
++ python3 -V
+ pyv3='Python 3.7.3'
+ export PYTHON_VERSION=3.7.3
+ PYTHON_VERSION=3.7.3
+ export PYSPARK_PYTHON=python3
+ PYSPARK_PYTHON=python3
+ export PYSPARK_DRIVER_PYTHON=python3
+ PYSPARK_DRIVER_PYTHON=python3
+ case "$SPARK_K8S_CMD" in
+ CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=10.233.120.95 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner local:///usr/local/lib/python3.7/dist-packages/openeogeotrellis/deploy/kube.py
21/08/09 07:33:59 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Adding process 'e' without implementation
Adding process 'pi' without implementation
starting spark context
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/08/09 07:34:01 INFO SparkContext: Running Spark version 2.4.5
21/08/09 07:34:01 INFO SparkContext: Submitted application: myspark
21/08/09 07:34:01 INFO SecurityManager: Changing view acls to: root
21/08/09 07:34:01 INFO SecurityManager: Changing modify acls to: root
21/08/09 07:34:01 INFO SecurityManager: Changing view acls groups to:
21/08/09 07:34:01 INFO SecurityManager: Changing modify acls groups to:
21/08/09 07:34:01 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
21/08/09 07:34:01 INFO Utils: Successfully started service 'sparkDriver' on port 7078.
21/08/09 07:34:01 INFO SparkEnv: Registering MapOutputTracker
21/08/09 07:34:01 INFO SparkEnv: Registering BlockManagerMaster
21/08/09 07:34:01 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
21/08/09 07:34:01 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
21/08/09 07:34:01 INFO DiskBlockManager: Created local directory at /var/data/spark-63a2105b-0616-4052-ad19-a2087d88988c/blockmgr-e72e24dc-b1a2-49f2-8523-555d7c40f06c
21/08/09 07:34:01 INFO MemoryStore: MemoryStore started with capacity 2004.6 MB
21/08/09 07:34:01 INFO SparkEnv: Registering OutputCommitCoordinator
21/08/09 07:34:01 INFO Utils: Successfully started service 'SparkUI' on port 4040.
21/08/09 07:34:02 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://myspark-6893917b29d50186-driver-svc.spark-jobs.svc:4040
21/08/09 07:34:02 INFO SparkContext: Added JAR local:///opt/geotrellis-extensions-2.2.0-SNAPSHOT.jar at file:/opt/geotrellis-extensions-2.2.0-SNAPSHOT.jar with timestamp 1628494442037
21/08/09 07:34:02 INFO SparkContext: Added JAR local:///opt/geotrellis-backend-assembly-0.4.6-openeo.jar at file:/opt/geotrellis-backend-assembly-0.4.6-openeo.jar with timestamp 1628494442037
21/08/09 07:34:02 WARN SparkContext: File with 'local' scheme is not supported to add to file server, since it is already available on every node.
21/08/09 07:34:03 INFO ExecutorPodsAllocator: Going to request 1 executors from Kubernetes.
21/08/09 07:34:03 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 7079.
21/08/09 07:34:03 INFO NettyBlockTransferService: Server created on myspark-6893917b29d50186-driver-svc.spark-jobs.svc:7079
21/08/09 07:34:03 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
21/08/09 07:34:03 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, myspark-6893917b29d50186-driver-svc.spark-jobs.svc, 7079, None)
21/08/09 07:34:03 INFO BlockManagerMasterEndpoint: Registering block manager myspark-6893917b29d50186-driver-svc.spark-jobs.svc:7079 with 2004.6 MB RAM, BlockManagerId(driver, myspark-6893917b29d50186-driver-svc.spark-jobs.svc, 7079, None)
21/08/09 07:34:03 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, myspark-6893917b29d50186-driver-svc.spark-jobs.svc, 7079, None)
21/08/09 07:34:03 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, myspark-6893917b29d50186-driver-svc.spark-jobs.svc, 7079, None)
21/08/09 07:34:03 INFO CancelRunawayJobListener: initialized with timeout PT15M
21/08/09 07:34:03 INFO SparkContext: Registered listener org.openeo.sparklisteners.CancelRunawayJobListener
21/08/09 07:34:09 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.233.120.96:56670) with ID 1
21/08/09 07:34:09 INFO KubernetesClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
[2021-08-09 07:34:09,363] INFO in openeogeotrellis.service_registry: Creating new InMemoryServiceRegistry: <openeogeotrellis.service_registry.InMemoryServiceRegistry object at 0x7f5e034f7e10>
[2021-08-09 07:34:09,363] INFO in openeogeotrellis.layercatalog: Reading layer catalog metadata from /opt/layercatalog.json
[2021-08-09 07:34:09,364] INFO in openeogeotrellis.layercatalog: Updating SENTINEL2_L1C metadata from https://finder.creodias.eu:Sentinel2
[2021-08-09 07:34:09,364] INFO in openeogeotrellis.opensearch: Getting collection metadata from https://finder.creodias.eu/resto/collections.json
21/08/09 07:34:09 INFO BlockManagerMasterEndpoint: Registering block manager 10.233.120.96:45609 with 2.1 GB RAM, BlockManagerId(1, 10.233.120.96, 45609, None)
[2021-08-09 07:34:09,783] INFO in openeogeotrellis.layercatalog: Updating SENTINEL2_L2A metadata from https://finder.creodias.eu:Sentinel2
[2021-08-09 07:34:09,783] INFO in openeogeotrellis.opensearch: Getting collection metadata from https://finder.creodias.eu/resto/collections.json
/usr/local/lib/python3.7/dist-packages/openeo_driver/views.py:208: UserWarning: The name 'openeo' is already registered for this blueprint. Use 'name=' to provide a unique name. This will become an error in Flask 2.1.
app.register_blueprint(bp, url_prefix='/openeo/<version>')
[2021-08-09 07:34:10,375] INFO in openeo_driver.views: App info logging enabled!
[2021-08-09 07:34:10,375] DEBUG in openeo_driver.views: App debug logging enabled!
[2021-08-09 07:34:10,375] INFO in openeo_driver.server: StandaloneApplication options: {'bind': '10.233.120.95:50001', 'workers': 1, 'threads': 10, 'worker_class': 'gthread', 'timeout': 1000, 'loglevel': 'DEBUG', 'accesslog': '-', 'errorlog': '-'}
[2021-08-09 07:34:10,376] INFO in openeo_driver.server: Creating StandaloneApplication
[2021-08-09 07:34:10,378] INFO in openeo_driver.server: Running StandaloneApplication
[2021-08-09 07:34:10 +0000] [47] [DEBUG] Current configuration:
config: ./gunicorn.conf.py
wsgi_app: None
bind: ['10.233.120.95:50001']
backlog: 2048
workers: 1
worker_class: gthread
threads: 10
worker_connections: 1000
max_requests: 0
max_requests_jitter: 0
timeout: 1000
graceful_timeout: 30
keepalive: 2
limit_request_line: 4094
limit_request_fields: 100
limit_request_field_size: 8190
reload: False
reload_engine: auto
reload_extra_files: []
spew: False
check_config: False
print_config: False
preload_app: False
sendfile: None
reuse_port: False
chdir: /opt/spark/work-dir
daemon: False
raw_env: []
pidfile: None
worker_tmp_dir: None
user: 0
group: 0
umask: 0
initgroups: False
tmp_upload_dir: None
secure_scheme_headers: {'X-FORWARDED-PROTOCOL': 'ssl', 'X-FORWARDED-PROTO': 'https', 'X-FORWARDED-SSL': 'on'}
forwarded_allow_ips: ['127.0.0.1']
accesslog: -
disable_redirect_access_to_syslog: False
access_log_format: %(h)s %(l)s %(u)s %(t)s "%(r)s" %(s)s %(b)s "%(f)s" "%(a)s"
errorlog: -
loglevel: DEBUG
capture_output: False
logger_class: gunicorn.glogging.Logger
logconfig: None
logconfig_dict: {}
syslog_addr: udp://localhost:514
syslog: False
syslog_prefix: None
syslog_facility: user
enable_stdio_inheritance: False
statsd_host: None
dogstatsd_tags:
statsd_prefix:
proc_name: None
default_proc_name: gunicorn
pythonpath: None
paste: None
on_starting: <function OnStarting.on_starting at 0x7f5e3f921620>
on_reload: <function OnReload.on_reload at 0x7f5e3f921730>
when_ready: <function run_gunicorn.<locals>.when_ready at 0x7f5e03534e18>
pre_fork: <function Prefork.pre_fork at 0x7f5e3f921950>
post_fork: <function Postfork.post_fork at 0x7f5e3f921a60>
post_worker_init: <function PostWorkerInit.post_worker_init at 0x7f5e3f921b70>
worker_int: <function WorkerInt.worker_int at 0x7f5e3f921c80>
worker_abort: <function WorkerAbort.worker_abort at 0x7f5e3f921d90>
pre_exec: <function PreExec.pre_exec at 0x7f5e3f921ea0>
pre_request: <function PreRequest.pre_request at 0x7f5e3f8b9048>
post_request: <function PostRequest.post_request at 0x7f5e3f8b90d0>
child_exit: <function ChildExit.child_exit at 0x7f5e3f8b91e0>
worker_exit: <function WorkerExit.worker_exit at 0x7f5e3f8b92f0>
nworkers_changed: <function NumWorkersChanged.nworkers_changed at 0x7f5e3f8b9400>
on_exit: <function OnExit.on_exit at 0x7f5e3f8b9510>
proxy_protocol: False
proxy_allow_ips: ['127.0.0.1']
keyfile: None
certfile: None
ssl_version: 2
cert_reqs: 0
ca_certs: None
suppress_ragged_eofs: True
do_handshake_on_connect: False
ciphers: None
raw_paste_global_conf: []
strip_header_spaces: False
[2021-08-09 07:34:10 +0000] [47] [INFO] Starting gunicorn 20.1.0
[2021-08-09 07:34:10 +0000] [47] [DEBUG] Arbiter booted
[2021-08-09 07:34:10 +0000] [47] [INFO] Listening at: http://10.233.120.95:50001 (47)
[2021-08-09 07:34:10 +0000] [47] [INFO] Using worker: gthread
[2021-08-09 07:34:10,388] INFO in openeo_driver.server: when_ready: <gunicorn.arbiter.Arbiter object at 0x7f5e02c200b8>
[2021-08-09 07:34:10 +0000] [47] [INFO] Gunicorn info logging enabled!
[2021-08-09 07:34:10,388] INFO in flask: Flask info logging enabled!
[2021-08-09 07:34:10,388] INFO in openeogeotrellis.deploy: Trying to load 'custom_processes' with PYTHONPATH ['/usr/local/lib/python3.7/dist-packages/openeogeotrellis/deploy', '/var/data/spark-63a2105b-0616-4052-ad19-a2087d88988c/spark-55f80d6a-1a98-44c5-92dc-012de6f1c7ae/userFiles-d7c80393-a197-428b-9327-ba184d619ecf', '/opt/spark/python/lib/pyspark.zip', '/opt/spark/python/lib/py4j-0.10.7-src.zip', '/opt/spark/jars/spark-core_2.11-2.4.5.jar', '/opt/spark/python/lib/py4j-*.zip', '/usr/lib/python37.zip', '/usr/lib/python3.7', '/usr/lib/python3.7/lib-dynload', '/usr/local/lib/python3.7/dist-packages', '/usr/lib/python3/dist-packages']
[2021-08-09 07:34:10,389] INFO in openeogeotrellis.deploy: 'custom_processes' not loaded: ModuleNotFoundError("No module named 'custom_processes'").
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/openeogeotrellis/deploy/kube.py", line 89, in <module>
    main()
  File "/usr/local/lib/python3.7/dist-packages/openeogeotrellis/deploy/kube.py", line 84, in main
    on_started=on_started
  File "/usr/local/lib/python3.7/dist-packages/openeo_driver/server.py", line 119, in run_gunicorn
    application.run()
  File "/usr/local/lib/python3.7/dist-packages/gunicorn/app/base.py", line 72, in run
    Arbiter(self).run()
  File "/usr/local/lib/python3.7/dist-packages/gunicorn/arbiter.py", line 198, in run
    self.start()
  File "/usr/local/lib/python3.7/dist-packages/gunicorn/arbiter.py", line 167, in start
    self.cfg.when_ready(self)
  File "/usr/local/lib/python3.7/dist-packages/openeo_driver/server.py", line 113, in when_ready
    on_started()
  File "/usr/local/lib/python3.7/dist-packages/openeogeotrellis/deploy/kube.py", line 60, in on_started
    setup_batch_jobs()
  File "/usr/local/lib/python3.7/dist-packages/openeogeotrellis/deploy/kube.py", line 51, in setup_batch_jobs
    with JobRegistry() as job_registry:
  File "/usr/local/lib/python3.7/dist-packages/openeogeotrellis/job_registry.py", line 183, in __enter__
    self._zk.start()
  File "/usr/local/lib/python3.7/dist-packages/kazoo/client.py", line 635, in start
    raise self.handler.timeout_exception("Connection time-out")
kazoo.handlers.threading.KazooTimeoutError: Connection time-out
21/08/09 07:34:30 INFO SparkContext: Invoking stop() from shutdown hook
21/08/09 07:34:30 INFO SparkUI: Stopped Spark web UI at http://myspark-6893917b29d50186-driver-svc.spark-jobs.svc:4040
21/08/09 07:34:30 INFO KubernetesClusterSchedulerBackend: Shutting down all executors
21/08/09 07:34:30 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each executor to shut down
21/08/09 07:34:30 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed (this is expected if the application is shutting down.)
21/08/09 07:34:31 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
21/08/09 07:34:31 INFO MemoryStore: MemoryStore cleared
21/08/09 07:34:31 INFO BlockManager: BlockManager stopped
21/08/09 07:34:31 INFO BlockManagerMaster: BlockManagerMaster stopped
21/08/09 07:34:31 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
21/08/09 07:34:31 INFO SparkContext: Successfully stopped SparkContext
21/08/09 07:34:31 INFO ShutdownHookManager: Shutdown hook called
21/08/09 07:34:31 INFO ShutdownHookManager: Deleting directory /var/data/spark-63a2105b-0616-4052-ad19-a2087d88988c/spark-55f80d6a-1a98-44c5-92dc-012de6f1c7ae
21/08/09 07:34:31 INFO ShutdownHookManager: Deleting directory /tmp/spark-d164ad5f-f25e-4169-98c9-bc8228211600
21/08/09 07:34:31 INFO ShutdownHookManager: Deleting directory /var/data/spark-63a2105b-0616-4052-ad19-a2087d88988c/spark-55f80d6a-1a98-44c5-92dc-012de6f1c7ae/pyspark-419ee304-e4ed-4291-b365-6b15ca39aaa9
It's really weird that it's still trying to access Zookeeper. When I set TRAVIS: "1" in the driver envVars, it does skip the connection to Zookeeper.
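As a sketch of that workaround, assuming the spark-on-k8s-operator SparkApplication CRD (the metadata below is illustrative, not copied from this deployment):

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: myspark          # illustrative; use your application's name
  namespace: spark-jobs
spec:
  driver:
    envVars:
      TRAVIS: "1"        # per the comment above: makes the openEO driver skip the Zookeeper connection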
We currently have a Zookeeper deployed with this chart: https://github.com/bitnami/charts/tree/master/bitnami/zookeeper
The values.yaml is as follows:
---
global:
  storageClass: "default"
replicaCount: 3
This is very minimal. You just need to make sure the storageClass is set up.
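For reference, deploying the chart with that values file would look something like this (the release name and namespace are assumptions, not taken from this thread):

helm repo add bitnami https://charts.bitnami.com/bitnami
helm install zookeeper bitnami/zookeeper -f values.yaml --namespace spark-jobs --create-namespace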
Hello, we finally have something working. We had to downgrade the Kubernetes version from v1.21.3 to v1.20.6. Since we don't have experience with Spark, we don't know if it works as it's supposed to, but the driver and executor in the spark-jobs namespace stay running, and it's possible to see the user interface via the service on port 4040.
It was not necessary to use Zookeeper.
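If the cluster is managed with kubespray (as in the CreoDIAS docs linked earlier in this thread), the downgrade amounts to pinning the version in the cluster group vars for a fresh deployment; a minimal sketch, assuming the default kubespray inventory layout (the exact path may differ between kubespray releases):

# inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
kube_version: v1.20.6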
Excellent work @tiagofglip, @zbenta, @tcassaert and @jdries ❗ Thank you very much for your efforts 🙏 If you could give @Jaapel and @avgils access to the VM, we could test it based on #3.
We just need the SSH public keys for the users that need access. @Jaapel and @avgils, could you guys send them to us at zacarias@lip.pt or tiagofg@lip.pt?
Could we schedule a meeting next week, or the week after, to discuss next steps?
I know @jdries is on holiday until the end of the month. @tcassaert, would you feel comfortable fielding questions in his stead? It would be nice to be able to put something together in time for the User Forum Kick Off Meeting on 10 Sept.
@zbenta @tiagofglip please let me know if next week suits, then I will set up a doodle to find a good date and time.
cc @mariojmdavid @Jaapel @gena @avgils
P.S. I am on leave next week, but I am not critical to the discussions.
I believe that for us it is fine; we usually have two meetings scheduled every week, one on Monday and another on Tuesday, both after lunch. For the time being, there are no other meetings scheduled for next week. @mariojmdavid, are you still on holiday next week?
Ok great! @Jaapel, could you please schedule a meeting for next week and include @zbenta, @tiagofglip, @avgils and @tcassaert? Could you all please share your email addresses so @Jaapel can schedule the meeting.
One thing I was wondering: ❓ is it worthwhile for us to work with the Google Earth Engine backend to replicate the Aquamonitor functionality (#3) until the OpenEO backend on INCD has been fully implemented (i.e. including processes, data collections, etc.), and then just switch to the INCD backend?
We should also include @jopina, since both @tiagofglip and I are new to the project. @mariojmdavid will not be available next week.
Thanks @zbenta, could you please share your and @tiagofglip's email addresses?
Use the simple Spark batch job script:
https://github.com/Open-EO/openeo-geopyspark-driver/blob/master/openeogeotrellis/deploy/submit_batch_job.sh
VITO has Docker images available with all dependencies: https://github.com/Open-EO/openeo-geotrellis-kubernetes/blob/master/docker/CentOS/Dockerfile.centos8-openeo
https://artifactory.vgt.vito.be/webapp/#/artifacts/browse/simple/General/vito-docker/centos8-openeo/latest
Full K8s deployment info:
https://github.com/Open-EO/openeo-geotrellis-kubernetes