apache-spark-on-k8s / spark

Apache Spark enhanced with native Kubernetes scheduler back-end: NOTE this repository is being ARCHIVED as all new development for the kubernetes scheduler back-end is now on https://github.com/apache/spark/
https://spark.apache.org/
Apache License 2.0

K8S Spark Init Container does not work with Secure HDFS #619

Closed: rvesse closed this issue 5 years ago

rvesse commented 6 years ago

When I run a job that uses the --files flag to pre-load files into the container, the init container appears not to include the Kerberos login logic. As a result it fails to download the dependencies, and the entire job fails.

Looking at the PR that added Secure HDFS support (#540), I don't see any sign that the init container logic was modified, so it appears this case was not covered.

Submission Line

spark-submit --deploy-mode cluster \
  --master k8s://https://192.168.0.7:6443 \
  --kubernetes-namespace rvesse \
  --conf spark.executor.instances=5 \
  --conf spark.app.name=spark-test \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.kubernetes.kerberos.principal=rvesse@local \
  --conf spark.kubernetes.kerberos.keytab=/security/secrets/rvesse.keytab \
  --conf spark.kubernetes.kerberos.enabled=true \
  --files hdfs://192.168.0.1:8020/user/rvesse/test2.py \
  local:///var/spark-data/spark-files/test2.py

test2.py is just a toy Spark job. Its contents are irrelevant here because the job fails before they are ever consumed, but note that the same job runs fine against an unsecured HDFS cluster.
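To make the failing path concrete, here is a minimal Python sketch (not Spark code) that parses a spark-submit argument list like the submission line above and lists the --files entries that would need an authenticated HDFS fetch. The rule it applies, "Kerberos is enabled and the URI scheme is hdfs", is an illustrative assumption, not the init container's actual logic. The local:// primary resource is already inside the image, so it is not a download.

```python
import shlex
from urllib.parse import urlparse


def files_needing_kerberos(argv):
    """Return the --files URIs that would have to be fetched from HDFS with Kerberos.

    Illustrative only: the decision rule is an assumption for this sketch.
    """
    confs = {}
    files = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg == "--conf" and i + 1 < len(argv):
            # --conf key=value
            key, _, value = argv[i + 1].partition("=")
            confs[key] = value
            i += 2
        elif arg == "--files" and i + 1 < len(argv):
            # --files takes a comma-separated list of URIs
            files.extend(f for f in argv[i + 1].split(",") if f)
            i += 2
        else:
            i += 1
    if confs.get("spark.kubernetes.kerberos.enabled", "false").lower() != "true":
        return []
    return [f for f in files if urlparse(f).scheme == "hdfs"]


cmd = ("spark-submit --deploy-mode cluster "
       "--conf spark.kubernetes.kerberos.enabled=true "
       "--files hdfs://192.168.0.1:8020/user/rvesse/test2.py "
       "local:///var/spark-data/spark-files/test2.py")
print(files_needing_kerberos(shlex.split(cmd)))
# ['hdfs://192.168.0.1:8020/user/rvesse/test2.py']
```

Any URI this returns is one the init container cannot fetch until it has either a Kerberos login or a delegation token.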

Resulting Logs

The job eventually fails. kubectl describe pods shows that the init container failed; these are the logs from that container:

kubectl logs spark-test-1519903287101-driver -c spark-init
++ id -u
+ myuid=0
++ id -g
+ mygid=0
++ getent passwd 0
+ uidentry=root:x:0:0:root:/root:/bin/ash
+ '[' -z root:x:0:0:root:/root:/bin/ash ']'
+ /sbin/tini -s -- /opt/spark/bin/spark-class org.apache.spark.deploy.rest.k8s.KubernetesSparkDependencyDownloadInitContainer /etc/spark-init/spark-init.properties
2018-03-01 11:21:32 INFO  KubernetesSparkDependencyDownloadInitContainer:54 - Starting init-container to download Spark application dependencies.
2018-03-01 11:21:33 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-03-01 11:21:33 INFO  SecurityManager:54 - Changing view acls to: root
2018-03-01 11:21:33 INFO  SecurityManager:54 - Changing modify acls to: root
2018-03-01 11:21:33 INFO  SecurityManager:54 - Changing view acls groups to: 
2018-03-01 11:21:33 INFO  SecurityManager:54 - Changing modify acls groups to: 
2018-03-01 11:21:33 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
2018-03-01 11:21:33 INFO  SecurityManager:54 - Changing view acls to: root
2018-03-01 11:21:33 INFO  SecurityManager:54 - Changing modify acls to: root
2018-03-01 11:21:33 INFO  SecurityManager:54 - Changing view acls groups to: 
2018-03-01 11:21:33 INFO  SecurityManager:54 - Changing modify acls groups to: 
2018-03-01 11:21:33 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
Exception in thread "main" org.apache.spark.SparkException: Exception thrown in awaitResult: 
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
    at org.apache.spark.deploy.rest.k8s.KubernetesSparkDependencyDownloadInitContainer$$anonfun$waitForFutures$1.apply(KubernetesSparkDependencyDownloadInitContainer.scala:187)
    at org.apache.spark.deploy.rest.k8s.KubernetesSparkDependencyDownloadInitContainer$$anonfun$waitForFutures$1.apply(KubernetesSparkDependencyDownloadInitContainer.scala:187)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
    at org.apache.spark.deploy.rest.k8s.KubernetesSparkDependencyDownloadInitContainer.waitForFutures(KubernetesSparkDependencyDownloadInitContainer.scala:186)
    at org.apache.spark.deploy.rest.k8s.KubernetesSparkDependencyDownloadInitContainer.run(KubernetesSparkDependencyDownloadInitContainer.scala:140)
    at org.apache.spark.deploy.rest.k8s.KubernetesSparkDependencyDownloadInitContainer$.main(KubernetesSparkDependencyDownloadInitContainer.scala:222)
    at org.apache.spark.deploy.rest.k8s.KubernetesSparkDependencyDownloadInitContainer.main(KubernetesSparkDependencyDownloadInitContainer.scala)
Caused by: org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
    at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
    at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2110)
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
    at org.apache.hadoop.fs.FileSystem.isFile(FileSystem.java:1452)
    at org.apache.spark.util.Utils$.fetchHcfsFile(Utils.scala:707)
    at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:685)
    at org.apache.spark.util.Utils$.fetchFile(Utils.scala:480)
    at org.apache.spark.deploy.rest.k8s.FileFetcherImpl.fetchFile(KubernetesSparkDependencyDownloadInitContainer.scala:195)
    at org.apache.spark.deploy.rest.k8s.KubernetesSparkDependencyDownloadInitContainer$$anonfun$org$apache$spark$deploy$rest$k8s$KubernetesSparkDependencyDownloadInitContainer$$downloadFiles$4.apply(KubernetesSparkDependencyDownloadInitContainer.scala:181)
    at org.apache.spark.deploy.rest.k8s.KubernetesSparkDependencyDownloadInitContainer$$anonfun$org$apache$spark$deploy$rest$k8s$KubernetesSparkDependencyDownloadInitContainer$$downloadFiles$4.apply(KubernetesSparkDependencyDownloadInitContainer.scala:180)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at org.apache.spark.deploy.rest.k8s.KubernetesSparkDependencyDownloadInitContainer.org$apache$spark$deploy$rest$k8s$KubernetesSparkDependencyDownloadInitContainer$$downloadFiles(KubernetesSparkDependencyDownloadInitContainer.scala:180)
    at org.apache.spark.deploy.rest.k8s.KubernetesSparkDependencyDownloadInitContainer$$anonfun$4.apply$mcV$sp(KubernetesSparkDependencyDownloadInitContainer.scala:135)
    at org.apache.spark.deploy.rest.k8s.KubernetesSparkDependencyDownloadInitContainer$$anonfun$4.apply(KubernetesSparkDependencyDownloadInitContainer.scala:135)
    at org.apache.spark.deploy.rest.k8s.KubernetesSparkDependencyDownloadInitContainer$$anonfun$4.apply(KubernetesSparkDependencyDownloadInitContainer.scala:135)
    at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
    at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]
    at org.apache.hadoop.ipc.Client.call(Client.java:1475)
    at org.apache.hadoop.ipc.Client.call(Client.java:1412)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2108)
    ... 21 more

So it looks like the init container doesn't recognise that it should be using Kerberos login for HDFS.
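The key line in the trace is the AccessControlException: the init container's HDFS client attempted SIMPLE (unauthenticated) access, while the NameNode only accepts TOKEN (a delegation token) or KERBEROS. A small, hypothetical triage helper that pulls this mismatch out of a log line is sketched below. The regex is written against the message format in these logs and may not match other Hadoop versions.

```python
import re

# Matches e.g. "SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]"
_AUTH_RE = re.compile(
    r"(?P<client>[A-Z]+) authentication is not enabled\.\s+"
    r"Available:\[(?P<server>[^\]]*)\]"
)


def auth_mismatch(log_line):
    """Return (client_method, server_methods) if the line reports an auth mismatch, else None."""
    m = _AUTH_RE.search(log_line)
    if m is None:
        return None
    server = [s.strip() for s in m.group("server").split(",") if s.strip()]
    return m.group("client"), server


line = ("Caused by: org.apache.hadoop.security.AccessControlException: "
        "SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]")
client, server = auth_mismatch(line)
print(client, server)
# SIMPLE ['TOKEN', 'KERBEROS']
```

The two methods the server offers correspond to the two ways out: a Kerberos login from a keytab inside the init container, or a delegation token that the init container can see.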

liyinan926 commented 6 years ago

Ah, yes, Kerberos support was not added to the init-container. @ifilonenko

ifilonenko commented 6 years ago

The secret holding the appropriate Kerberos credentials is mounted after the init-container is launched, so you would need to pre-populate the secret with your job user's delegation token for the init-container to see it. In our upstreaming process we are removing the init-container and launching spark-submit from the driver, so the init-container will soon be deprecated.
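Pre-populating a secret with the job user's delegation token could be sketched as follows. This assumes the token file was already written on a machine with a valid Kerberos ticket (for example with the hdfs fetchdt command). The secret name hdfs-delegation-token and item key hadoop-token are hypothetical placeholders, so check which secret name and key your Spark build actually reads. The code builds a standard Kubernetes v1 Secret manifest whose data values are base64-encoded; the printed JSON can be applied with kubectl apply -f.

```python
import base64
import json


def delegation_token_secret(token_bytes, name="hdfs-delegation-token",
                            key="hadoop-token", namespace="default"):
    """Build a Kubernetes Secret manifest (as a dict) holding a Hadoop delegation token.

    The default name and key are hypothetical placeholders.
    """
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        # Secret.data values must be base64-encoded strings.
        "data": {key: base64.b64encode(token_bytes).decode("ascii")},
    }


# In practice token_bytes would be read (in binary mode) from the token file.
manifest = delegation_token_secret(b"example-token-bytes", namespace="rvesse")
print(json.dumps(manifest, indent=2))
```

Delegation tokens expire, so a secret populated this way only helps for jobs that start and finish within the token's lifetime unless something renews or replaces it.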

rvesse commented 6 years ago

@ifilonenko Which branch/repo is that in? We need to make Kerberos support usable for our customers ASAP, so we are happy to use a cutting-edge branch if necessary.

ifilonenko commented 6 years ago

Kerberos support should work on branch-2.2-kubernetes; I have tested it myself. It just doesn't support access from the init-container, as that wasn't a use case we thought was necessary at the time.

liyinan926 commented 6 years ago

@ifilonenko I think we should mount the same secret that stores the delegation token into the init-container. This is what we do for general secrets: we always mount each user-specified secret into both the init-container and the main container.
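Mounting one secret into both containers could look roughly like this Python sketch, which patches a pod spec held as a plain dict. It assumes the token is exposed to Hadoop through the HADOOP_TOKEN_FILE_LOCATION environment variable, which Hadoop's UserGroupInformation reads at login. The volume name, mount path, secret name, key and the "driver" container name are hypothetical (only spark-init comes from the logs above), and this is not how the Spark submission code actually builds the driver pod.

```python
import copy

TOKEN_VOLUME = "hadoop-token"             # hypothetical volume name
TOKEN_MOUNT_PATH = "/mnt/secrets/hadoop"  # hypothetical mount path


def mount_token_secret(pod, secret_name, item_key):
    """Return a copy of pod with the token secret mounted into every init and main container."""
    pod = copy.deepcopy(pod)
    spec = pod.setdefault("spec", {})
    spec.setdefault("volumes", []).append(
        {"name": TOKEN_VOLUME, "secret": {"secretName": secret_name}}
    )
    # Each key of a secret volume appears as a file named after the key.
    token_path = f"{TOKEN_MOUNT_PATH}/{item_key}"
    for group in ("initContainers", "containers"):
        for container in spec.get(group, []):
            container.setdefault("volumeMounts", []).append(
                {"name": TOKEN_VOLUME, "mountPath": TOKEN_MOUNT_PATH, "readOnly": True}
            )
            # Point Hadoop's UserGroupInformation at the mounted token file.
            container.setdefault("env", []).append(
                {"name": "HADOOP_TOKEN_FILE_LOCATION", "value": token_path}
            )
    return pod


driver_pod = {"spec": {"initContainers": [{"name": "spark-init"}],
                       "containers": [{"name": "driver"}]}}
patched = mount_token_secret(driver_pod, "hdfs-delegation-token", "hadoop-token")
print(patched["spec"]["initContainers"][0]["env"])
```

Treating init containers and main containers identically is the point of the suggestion: whatever credentials the driver sees, the dependency download sees too.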

rvesse commented 6 years ago

I'm working on a fix for this internally and will post a PR once I have validated it.

ifilonenko commented 6 years ago

Thanks @rvesse :) I will review it.