apache / kyuubi

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
https://kyuubi.apache.org/
Apache License 2.0

[Bug]: In K8s, Spark Session launch failing with error #6393

Open avishnus opened 1 month ago

avishnus commented 1 month ago


Describe the bug

Error: org.apache.kyuubi.KyuubiSQLException: org.apache.kyuubi.KyuubiSQLException: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]

Affects Version(s)

1.8.2

Kyuubi Server Log Output

2024-05-20 09:35:10.526 ERROR KyuubiTBinaryFrontendHandler-Pool: Thread-57 org.apache.kyuubi.server.KyuubiTBinaryFrontendService: Error getting info:
java.util.concurrent.ExecutionException: org.apache.kyuubi.KyuubiSQLException: org.apache.kyuubi.KyuubiSQLException: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]
    at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1604)
    at org.apache.hadoop.ipc.Client.call(Client.java:1550)
    at org.apache.hadoop.ipc.Client.call(Client.java:1447)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
    at com.sun.proxy.$Proxy22.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:910)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
    at com.sun.proxy.$Proxy23.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1671)
    at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1603)
    at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1600)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1615)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1690)
    at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:325)
    at org.apache.spark.deploy.k8s.KubernetesUtils$.$anonfun$uploadAndTransformFileUris$1(KubernetesUtils.scala:276)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at scala.collection.TraversableLike.map(TraversableLike.scala:286)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
    at scala.collection.AbstractTraversable.map(Traversable.scala:108)
    at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadAndTransformFileUris(KubernetesUtils.scala:275)
    at org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.$anonfun$getAdditionalPodSystemProperties$1(BasicDriverFeatureStep.scala:174)
    at scala.collection.immutable.List.foreach(List.scala:431)
    at org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.getAdditionalPodSystemProperties(BasicDriverFeatureStep.scala:165)
    at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.$anonfun$buildFromFeatures$4(KubernetesDriverBuilder.scala:65)
    at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
    at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
    at scala.collection.immutable.List.foldLeft(List.scala:91)
    at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:63)
    at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:107)
    at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$4(KubernetesClientApplication.scala:223)
    at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$4$adapted(KubernetesClientApplication.scala:217)
    at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2742)
    at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:217)
    at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:189)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:984)
    at org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:172)
    at org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:170)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:170)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:211)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1072)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1081)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
09:35:09.404 [main] DEBUG org.apache.hadoop.security.UserGroupInformation - PrivilegedActionException as:avishnus (auth:PROXY) via kyuubi (auth:SIMPLE) cause:org.apache.spark.SparkException: Uploading file /opt/kyuubi/externals/engines/spark/kyuubi-spark-sql-engine_2.12-1.8.2-SNAPSHOT.jar failed...
 See more: /opt/kyuubi/work/avishnus/kyuubi-spark-sql-engine.log.0
    at org.apache.kyuubi.KyuubiSQLException$.apply(KyuubiSQLException.scala:70)
    at org.apache.kyuubi.engine.ProcBuilder.$anonfun$start$1(ProcBuilder.scala:232)
    at java.lang.Thread.run(Thread.java:750)
.
FYI: The last 10 line(s) of log are:
09:35:09.393 [IPC Client (371440613) connection to sl73dpihmnu0108.visa.com/10.207.184.24:8020 from avishnus] DEBUG org.apache.hadoop.ipc.Client - IPC Client (371440613) connection to sl73dpihmnu0108.visa.com/10.207.184.24:8020 from avishnus: stopped, remaining connections 0
09:35:09.399 [main] DEBUG org.apache.hadoop.io.retry.RetryInvocationHandler - Exception while invoking call #0 ClientNamenodeProtocolTranslatorPB.getFileInfo over sl73dpihmnu0108.visa.com/10.207.184.24:8020. Not retrying because try once and fail.
09:35:09.414 [shutdown-hook-0] INFO org.apache.spark.util.ShutdownHookManager - Deleting directory /tmp/spark-df9998e8-20b6-45b9-9c3d-a5c5f5444b6e
09:35:09.424 [shutdown-hook-0] INFO org.apache.spark.util.ShutdownHookManager - Deleting directory /tmp/spark-25cd417f-3c44-4890-9dca-fbfa816d0ca6
09:35:09.433 [shutdown-hook-0] DEBUG org.apache.hadoop.ipc.Client - stopping client from cache: Client-539fe933aad14d059e90457605f9693d
09:35:09.433 [shutdown-hook-0] DEBUG org.apache.hadoop.ipc.Client - removing client from cache: Client-539fe933aad14d059e90457605f9693d
09:35:09.434 [shutdown-hook-0] DEBUG org.apache.hadoop.ipc.Client - stopping actual client because no more references remain: Client-539fe933aad14d059e90457605f9693d
09:35:09.434 [shutdown-hook-0] DEBUG org.apache.hadoop.ipc.Client - Stopping client
09:35:09.436 [Thread-2] DEBUG org.apache.hadoop.util.ShutdownHookManager - Completed shutdown in 0.026 seconds; Timeouts: 0
09:35:09.448 [Thread-2] DEBUG org.apache.hadoop.util.ShutdownHookManager - ShutdownHookManger completed shutdown.
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.kyuubi.session.KyuubiSessionImpl.$anonfun$waitForEngineLaunched$1(KyuubiSessionImpl.scala:242)
    at org.apache.kyuubi.session.KyuubiSessionImpl.$anonfun$waitForEngineLaunched$1$adapted(KyuubiSessionImpl.scala:238)
    at scala.Option.foreach(Option.scala:407)
    at org.apache.kyuubi.session.KyuubiSessionImpl.waitForEngineLaunched(KyuubiSessionImpl.scala:238)
    at org.apache.kyuubi.session.KyuubiSessionImpl.$anonfun$getInfo$1(KyuubiSessionImpl.scala:285)
    at org.apache.kyuubi.session.AbstractSession.withAcquireRelease(AbstractSession.scala:82)
    at org.apache.kyuubi.session.KyuubiSessionImpl.getInfo(KyuubiSessionImpl.scala:284)
    at org.apache.kyuubi.service.AbstractBackendService.getInfo(AbstractBackendService.scala:54)
    at org.apache.kyuubi.server.KyuubiServer$$anon$1.org$apache$kyuubi$server$BackendServiceMetric$$super$getInfo(KyuubiServer.scala:171)
    at org.apache.kyuubi.server.BackendServiceMetric.$anonfun$getInfo$1(BackendServiceMetric.scala:51)
    at org.apache.kyuubi.metrics.MetricsSystem$.timerTracing(MetricsSystem.scala:112)
    at org.apache.kyuubi.server.BackendServiceMetric.getInfo(BackendServiceMetric.scala:51)
    at org.apache.kyuubi.server.BackendServiceMetric.getInfo$(BackendServiceMetric.scala:47)
    at org.apache.kyuubi.server.KyuubiServer$$anon$1.getInfo(KyuubiServer.scala:171)
    at org.apache.kyuubi.service.TFrontendService.GetInfo(TFrontendService.scala:226)
    at org.apache.hive.service.rpc.thrift.TCLIService$Processor$GetInfo.getResult(TCLIService.java:1537)
    at org.apache.hive.service.rpc.thrift.TCLIService$Processor$GetInfo.getResult(TCLIService.java:1522)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.kyuubi.service.authentication.HadoopThriftAuthBridgeServer$TUGIAssumingProcessor.process(HadoopThriftAuthBridgeServer.scala:163)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.kyuubi.KyuubiSQLException: org.apache.kyuubi.KyuubiSQLException: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]
    at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1604)
    at org.apache.hadoop.ipc.Client.call(Client.java:1550)
    at org.apache.hadoop.ipc.Client.call(Client.java:1447)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
    at com.sun.proxy.$Proxy22.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:910)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
    at com.sun.proxy.$Proxy23.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1671)
    at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1603)
    at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1600)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1615)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1690)
    at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:325)
    at org.apache.spark.deploy.k8s.KubernetesUtils$.$anonfun$uploadAndTransformFileUris$1(KubernetesUtils.scala:276)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at scala.collection.TraversableLike.map(TraversableLike.scala:286)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
    at scala.collection.AbstractTraversable.map(Traversable.scala:108)
    at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadAndTransformFileUris(KubernetesUtils.scala:275)
    at org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.$anonfun$getAdditionalPodSystemProperties$1(BasicDriverFeatureStep.scala:174)
    at scala.collection.immutable.List.foreach(List.scala:431)
    at org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.getAdditionalPodSystemProperties(BasicDriverFeatureStep.scala:165)
    at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.$anonfun$buildFromFeatures$4(KubernetesDriverBuilder.scala:65)
    at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
    at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
    at scala.collection.immutable.List.foldLeft(List.scala:91)
    at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:63)
    at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:107)
    at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$4(KubernetesClientApplication.scala:223)
    at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$4$adapted(KubernetesClientApplication.scala:217)
    at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2742)
    at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:217)
    at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:189)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:984)
    at org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:172)
    at org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:170)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:170)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:211)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1072)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1081)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
09:35:09.404 [main] DEBUG org.apache.hadoop.security.UserGroupInformation - PrivilegedActionException as:avishnus (auth:PROXY) via kyuubi (auth:SIMPLE) cause:org.apache.spark.SparkException: Uploading file /opt/kyuubi/externals/engines/spark/kyuubi-spark-sql-engine_2.12-1.8.2-SNAPSHOT.jar failed...

Kyuubi Engine Log Output

No response

Kyuubi Server Configurations

kyuubi.authentication=KERBEROS
kyuubi.frontend.bind.host=localhost
kyuubi.frontend.bind.port=10009
kyuubi.kinit.principal=hive/xxxx@org
kyuubi.kinit.keytab=hive.keytab
kyuubi.zookeeper.embedded.client.port=2181
spark.driver.port=7078
spark.kubernetes.namespace=default
kyuubi.kubernetes.master.address=k8s://https://abc
spark.master=k8s://https://abc
kyuubi.kubernetes.namespace=default
spark.submit.deployMode=cluster
spark.kubernetes.authenticate.driver.serviceAccountName=spark
spark.kubernetes.container.image=xxxx
spark.driver.extraJavaOptions=-Divy.home=/tmp
spark.kubernetes.kerberos.krb5.path=/etc/krb5.conf
spark.kubernetes.executor.deleteOnTermination=false
spark.hadoop.scaas.skipDeleteOnTerminationValidation=true
spark.kubernetes.file.upload.path=hdfs://xxxx:xxxx/tmp

Kyuubi Engine Configurations

No response

Additional context

No response

Are you willing to submit PR?

github-actions[bot] commented 1 month ago

Hello @avishnus, Thanks for finding the time to report the issue! We really appreciate the community's efforts to improve Apache Kyuubi.

pan3793 commented 1 month ago

Did you configure Hadoop properly? Specifically, core-site.xml and hdfs-site.xml. What's your exact Spark version?

avishnus commented 1 month ago

Did you configure Hadoop properly? Specifically, core-site.xml and hdfs-site.xml. What's your exact Spark version?

Yes, I have core-site.xml and hdfs-site.xml. Spark version is 3.2

avishnus commented 1 month ago

I also see this in the pod logs:

05:31:59.938 [main] DEBUG org.apache.hadoop.security.UserGroupInformation - PrivilegedActionException as:avishnus (auth:PROXY) via kyuubi (auth:SIMPLE) cause:org.apache.spark.SparkException: Uploading file /opt/kyuubi/externals/engines/spark/kyuubi-spark-sql-engine_2.12-1.8.2-SNAPSHOT.jar failed...
 See more: /opt/kyuubi/work/avishnus/kyuubi-spark-sql-engine.log.0

Why is it trying to upload via the kyuubi user here and not my ID?

pan3793 commented 1 month ago

Assuming you have set up the Hadoop configuration files properly, it's likely that HADOOP_CONF_DIR is not set correctly in your Kyuubi Server Pod.
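
For context, a minimal sketch of that setting; the directory path is an assumption and should point to wherever your core-site.xml and hdfs-site.xml actually live:

    # In $KYUUBI_HOME/conf/kyuubi-env.sh
    # /etc/hadoop/conf is a placeholder; use the directory holding core-site.xml and hdfs-site.xml
    export HADOOP_CONF_DIR=/etc/hadoop/conf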

Before using Kyuubi to launch the Spark engine, please try using vanilla spark-submit to submit a Spark Pi job from your Kyuubi Server Pod first, to make sure Spark is configured properly.
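
For reference, such a smoke test might look like the sketch below; the master URL, namespace, image, and service account are taken from the configuration above, and the examples jar path assumes the stock Spark image layout:

    $SPARK_HOME/bin/spark-submit \
        --master k8s://https://abc \
        --deploy-mode cluster \
        --name spark-pi \
        --class org.apache.spark.examples.SparkPi \
        --conf spark.kubernetes.namespace=default \
        --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
        --conf spark.kubernetes.container.image=xxxx \
        local:///opt/spark/examples/jars/spark-examples_2.12-3.2.0.jar

If this fails with the same AccessControlException, the problem is in the Spark/Hadoop setup rather than in Kyuubi itself.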

Spark version is 3.2

As you didn't provide the exact Spark version, SPARK-42785 (fixed in Spark 3.2.4) may affect your use case too.

avishnus commented 1 month ago

This is my spark submit from the pod logs

/opt/spark-3.2.0.3.2.2.kyuubi_test-1-bin-hadoop3.2/bin/spark-submit \
    --class org.apache.kyuubi.engine.spark.SparkSQLEngine \
    --conf spark.hive.server2.thrift.resultset.default.fetch.size=1000 \
    --conf spark.kyuubi.client.ipAddress=127.0.0.1 \
    --conf spark.kyuubi.engine.credentials= \
    --conf spark.kyuubi.engine.engineLog.path=/opt/kyuubi/work/hdfs/kyuubi-spark-sql-engine.log.0 \
    --conf spark.kyuubi.engine.submit.time=1716187852979 \
    --conf spark.kyuubi.ha.addresses=x.x.x.x:2181 \
    --conf spark.kyuubi.ha.engine.ref.id= \
    --conf spark.kyuubi.ha.namespace=/kyuubi_1.8.2-SNAPSHOT_USER_SPARK_SQL/hdfs/default \
    --conf spark.kyuubi.ha.zookeeper.auth.type=NONE \
    --conf spark.kyuubi.kubernetes.master.address= \
    --conf spark.kyuubi.kubernetes.namespace=scaas \
    --conf spark.kyuubi.server.ipAddress=127.0.0.1 \
    --conf spark.kyuubi.session.connection.url=localhost:10009 \
    --conf spark.kyuubi.session.real.user=hdfs \
    --conf spark.kyuubi.zookeeper.embedded.client.port=2181 \
    --conf spark.app.name=kyuubi_USER_SPARK_SQL_hdfs_default_26933464-8cbc-4202-aa45-75f5a4c91bff \
    --conf spark.driver.extraJavaOptions=-Divy.home=/tmp \
    --conf spark.driver.port=7078 \
    --conf spark.hadoop.scaas.skipDeleteOnTerminationValidation=true \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
    --conf spark.kubernetes.container.image=scaas/apache-spark:v3.2.0_hadoop3.2 \
    --conf spark.kubernetes.driver.label.kyuubi-unique-tag=26933464-8cbc-4202-aa45-75f5a4c91bff \
    --conf spark.kubernetes.driver.pod.name=kyuubi-user-spark-sql-hdfs-default-26933464-8cbc-4202-aa45-75f5a4c91bff-driver \
    --conf spark.kubernetes.driver.podTemplateFile=/opt/spark/spark-driver-pod-template.yaml \
    --conf spark.kubernetes.executor.deleteOnTermination=false \
    --conf spark.kubernetes.executor.podNamePrefix=kyuubi-user-spark-sql-hdfs-default-26933464-8cbc-4202-aa45-75f5a4c91bff \
    --conf spark.kubernetes.executor.podTemplateFile=/opt/spark/spark-executor-pod-template.yaml \
    --conf spark.kubernetes.file.upload.path=hdfs://tenent:8020/tmp \
    --conf spark.kubernetes.kerberos.krb5.path=/etc/krb5.conf \
    --conf spark.kubernetes.namespace=scaas \
    --conf spark.master=k8s://https://api.stage.pharos.visa.com \
    --conf spark.rpc.askTimeout=300 \
    --conf spark.submit.deployMode=cluster \
    --conf spark.kubernetes.driverEnv.SPARK_USER_NAME=hdfs \
    --conf spark.executorEnv.SPARK_USER_NAME=hdfs \
    --proxy-user hdfs /opt/kyuubi/externals/engines/spark/kyuubi-spark-sql-engine_2.12-1.8.2-SNAPSHOT.jar

pan3793 commented 1 month ago

Hadoop security is always a complex topic; let me briefly clarify how it works in the Kyuubi system.

I assume you have basic knowledge of Kerberos, Hadoop User Impersonation (Proxy User), and Hadoop Delegation Tokens (DT).

The basic pipeline of Kyuubi is:

Client => Kyuubi Server => Spark Driver

The first part, Client => Kyuubi Server, supports several authentication methods, including Kerberos and LDAP. It is responsible for ensuring the legitimacy of the connected user and providing a trusted username (the session user) to the next system.

Then the Kyuubi Server uses the session user to find or launch a proper Spark Driver. Assuming there is no existing Spark Driver, the Kyuubi Server assembles a spark-submit command and runs it in a sub-process to launch a Spark Driver with --proxy-user <session user>.

For Kerberized environments, there are typically two ways to launch a Spark application, sketched below.

  1. run kinit with a superuser's keytab first to generate a TGT cache, then perform spark-submit --proxy-user <session user> to generate and distribute DTs
  2. perform spark-submit --principal <session user> --keytab </path/of/session-user.keytab>
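
As a rough sketch (principal names, realms, and paths are placeholders):

    # Approach 1: superuser TGT cache plus proxy-user (Kyuubi's default for Kerberos)
    kinit -kt /path/of/superuser.keytab superuser@EXAMPLE.COM
    spark-submit --proxy-user <session user> ...

    # Approach 2: the session user's own principal and keytab
    spark-submit --principal <session-user>@EXAMPLE.COM --keytab /path/of/session-user.keytab ...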

The principle here is: we must NOT distribute the superuser's keytab to the Spark app's local cache due to security concerns, but it is safe to distribute the session user's keytab and transient DTs.

For case 1 (which is your case), we don't need to maintain keytabs for all session users, so it is Kyuubi's default approach for the Kerberos case. However, it requires that someone run kinit periodically to refresh the TGT cache. Kyuubi Server takes care of that if core-site.xml is configured properly (hadoop.security.authentication=KERBEROS); you can check the Kyuubi Server logs to see whether it's working, or run klist (as the OS user that runs the Kyuubi Server process) inside the Kyuubi Server Pod to check whether the TGT cache is available.
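
For example, assuming the Kyuubi Server Pod is named kyuubi-0 and the container runs as the same OS user as the Kyuubi Server process (both assumptions):

    # A valid krbtgt ticket for the kinit principal should be listed
    kubectl exec -it kyuubi-0 -- klist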

core-site.xml should also be visible to spark-submit, so that spark-submit knows it should request DTs and distribute them to the Spark Driver Pod. You can run kubectl describe pod <spark-driver-pod> to check whether there is a secret named *-delegation-tokens mounted to the pod, and an env var HADOOP_TOKEN_FILE_LOCATION pointing to /mnt/secrets/hadoop-credentials/hadoop-tokens.
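
For instance (the pod name is a placeholder):

    # Look for a *-delegation-tokens secret volume in the pod spec
    kubectl describe pod <spark-driver-pod> | grep -i delegation-tokens
    # Verify the env var; expected value: /mnt/secrets/hadoop-credentials/hadoop-tokens
    kubectl exec <spark-driver-pod> -- env | grep HADOOP_TOKEN_FILE_LOCATION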

Then the Spark Driver can pick up the DTs (via HADOOP_TOKEN_FILE_LOCATION) and use them to access HDFS, HMS, etc.

avishnus commented 1 month ago

I tried kinit using the superuser and then performed spark-submit, yet I got the same error. So core-site.xml has to be present in both the Kyuubi and Spark conf?

pan3793 commented 1 month ago

I believe I have already answered the Hadoop conf question.

Assuming you have set up the Hadoop configuration files properly, it's likely that HADOOP_CONF_DIR is not set correctly in your Kyuubi Server Pod.