JahstreetOrg / spark-on-kubernetes-docker

Spark on Kubernetes infrastructure Docker images repo
Apache License 2.0

Do 187/spark 3.1.1 #17

Closed. jeromebanks closed this 2 years ago.

jeromebanks commented 3 years ago

Add a build for Spark 3.1.1

jahstreet commented 3 years ago

Thx @jeromebanks for the contribution. Just to be clear about expectations: I do not have enough time to constantly maintain this project at the moment, but I will try to find time to go over the existing PRs and issues and make them part of master. Best.

xiaomao23zhi commented 2 years ago

I have tried to deploy with this PR, but could not create a Livy session successfully. It seems Livy tried to upload /opt/spark/python/lib/pyspark.zip and /opt/spark/python/lib/py4j-0.10.9-src.zip to spark.kubernetes.file.upload.path. I tried adding these 2 files to LIVY_LIVY_RSC_JARS, but still got the same error.

WARNING: All illegal access operations will be denied in a future release
22/01/19 04:06:28 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/01/19 04:06:28 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
22/01/19 04:06:29 INFO KerberosConfDriverFeatureStep: You have not specified a krb5.conf file locally or via a ConfigMap. Make sure that you have the krb5.conf locally on the driver image.
Exception in thread "main" org.apache.spark.SparkException: Please specify spark.kubernetes.file.upload.path property.
    at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:299)
    at org.apache.spark.deploy.k8s.KubernetesUtils$.$anonfun$uploadAndTransformFileUris$1(KubernetesUtils.scala:248)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at scala.collection.TraversableLike.map(TraversableLike.scala:238)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
    at scala.collection.AbstractTraversable.map(Traversable.scala:108)

Kubernetes: v1.18.8
Images:

  • jeromebanks/livy:0.8.0-incubating-spark_3.1.1_2.12-hadoop_3.2.0_cloud
  • jeromebanks/livy-spark:0.8.0-incubating-spark_3.1.1_2.12-hadoop_3.2.0_cloud

It seems that since Spark 3.1.1, user resources get uploaded when creating a Python container, as introduced by this PR: https://github.com/apache/spark/pull/25870.

There are 2 ways to resolve this (a config sketch for both follows the list):

  1. Set LIVY_SPARK_KUBERNETES_FILE_UPLOAD_PATH.
  2. Apply the patch from this PR: https://github.com/apache/incubator-livy/pull/281, rebuild the livy-spark and livy images, then set LIVY_SPARK_SUBMIT_PY1FILES: {value: "local:///opt/spark/python/lib/pyspark.zip,local:///opt/spark/python/lib/py4j-0.10.9-src.zip"}; with the local:// scheme these PySpark archives won't be uploaded.
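
A minimal sketch of how either option could be wired up, assuming the chart exposes Livy configuration through env entries of the form shown above (dots in a Spark conf key become underscores and a "1" marks an uppercase letter, so LIVY_SPARK_SUBMIT_PY1FILES maps to spark.submit.pyFiles); the s3a:// upload path is a placeholder to adapt to your setup:

    # values.yaml fragment for the Livy deployment (structure assumed from the env vars above)
    env:
      # Option 1: give spark-submit a writable location for uploading client-side files
      LIVY_SPARK_KUBERNETES_FILE_UPLOAD_PATH:
        value: "s3a://my-bucket/livy-uploads"   # placeholder bucket/path
      # Option 2 (needs images rebuilt with apache/incubator-livy PR 281):
      # local:// marks the archives as already present in the image, so
      # spark-submit skips the upload entirely
      LIVY_SPARK_SUBMIT_PY1FILES:
        value: "local:///opt/spark/python/lib/pyspark.zip,local:///opt/spark/python/lib/py4j-0.10.9-src.zip"

With either setting in place, the "Please specify spark.kubernetes.file.upload.path property" exception from the trace above should no longer be triggered when the session starts.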

Thanks for this PR.