avishnus opened this issue 1 month ago
Hello @avishnus, Thanks for finding the time to report the issue! We really appreciate the community's efforts to improve Apache Kyuubi.
Do you configure Hadoop properly? Specifically, `core-site.xml`, `hdfs-site.xml`. What's your exact Spark version?
> Do you configure Hadoop properly? Specifically, `core-site.xml`, `hdfs-site.xml`. What's your exact Spark version?
Yes, I have core-site.xml and hdfs-site.xml. Spark version is 3.2.
I also see these in the pod logs:

```
05:31:59.938 [main] DEBUG org.apache.hadoop.security.UserGroupInformation - PrivilegedActionException as:avishnus (auth:PROXY) via kyuubi (auth:SIMPLE) cause:org.apache.spark.SparkException: Uploading file /opt/kyuubi/externals/engines/spark/kyuubi-spark-sql-engine_2.12-1.8.2-SNAPSHOT.jar failed...
See more: /opt/kyuubi/work/avishnus/kyuubi-spark-sql-engine.log.0
```
Why is it trying to upload as the kyuubi user here and not my ID?
Assuming you have set up the Hadoop configuration files properly, it's likely that `HADOOP_CONF_DIR` is not set properly in your Kyuubi Server Pod.
Before using Kyuubi to launch the Spark engine, please try vanilla `spark-submit` to submit a Spark Pi job on your Kyuubi Server Pod first (see the sketch below) to make sure Spark is configured properly.
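For example, a smoke test could look like the sketch below. The image, master URL, namespace, and upload path are copied from your command above; the examples jar path and name are assumptions about your image layout and may differ.

```
# Minimal SparkPi smoke test, run from inside the Kyuubi Server Pod.
# All values are placeholders/assumptions taken from the environment above.
/opt/spark-3.2.0.3.2.2.kyuubi_test-1-bin-hadoop3.2/bin/spark-submit \
  --master k8s://https://api.stage.pharos.visa.com \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  --proxy-user avishnus \
  --conf spark.kubernetes.namespace=scaas \
  --conf spark.kubernetes.container.image=scaas/apache-spark:v3.2.0_hadoop3.2 \
  --conf spark.kubernetes.file.upload.path=hdfs://tenent:8020/tmp \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.2.0.jar 100
```

If this fails with the same AccessControlException, the problem is on the Spark/Hadoop side rather than in Kyuubi itself.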
> Spark version is 3.2
As you didn't provide the exact Spark version, SPARK-42785 (fixed in Spark 3.2.4) may affect your use case too.
This is my spark-submit from the pod logs:

```
/opt/spark-3.2.0.3.2.2.kyuubi_test-1-bin-hadoop3.2/bin/spark-submit \
--class org.apache.kyuubi.engine.spark.SparkSQLEngine \
--conf spark.hive.server2.thrift.resultset.default.fetch.size=1000 \
--conf spark.kyuubi.client.ipAddress=127.0.0.1 \
--conf spark.kyuubi.engine.credentials= \
--conf spark.kyuubi.engine.engineLog.path=/opt/kyuubi/work/hdfs/kyuubi-spark-sql-engine.log.0 \
--conf spark.kyuubi.engine.submit.time=1716187852979 \
--conf spark.kyuubi.ha.addresses=x.x.x.x:2181 \
--conf spark.kyuubi.ha.engine.ref.id= \
--conf spark.kyuubi.ha.namespace=/kyuubi_1.8.2-SNAPSHOT_USER_SPARK_SQL/hdfs/default \
--conf spark.kyuubi.ha.zookeeper.auth.type=NONE \
--conf spark.kyuubi.kubernetes.master.address= \
--conf spark.kyuubi.kubernetes.namespace=scaas \
--conf spark.kyuubi.server.ipAddress=127.0.0.1 \
--conf spark.kyuubi.session.connection.url=localhost:10009 \
--conf spark.kyuubi.session.real.user=hdfs \
--conf spark.kyuubi.zookeeper.embedded.client.port=2181 \
--conf spark.app.name=kyuubi_USER_SPARK_SQL_hdfs_default_26933464-8cbc-4202-aa45-75f5a4c91bff \
--conf spark.driver.extraJavaOptions=-Divy.home=/tmp \
--conf spark.driver.port=7078 \
--conf spark.hadoop.scaas.skipDeleteOnTerminationValidation=true \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.container.image=scaas/apache-spark:v3.2.0_hadoop3.2 \
--conf spark.kubernetes.driver.label.kyuubi-unique-tag=26933464-8cbc-4202-aa45-75f5a4c91bff \
--conf spark.kubernetes.driver.pod.name=kyuubi-user-spark-sql-hdfs-default-26933464-8cbc-4202-aa45-75f5a4c91bff-driver \
--conf spark.kubernetes.driver.podTemplateFile=/opt/spark/spark-driver-pod-template.yaml \
--conf spark.kubernetes.executor.deleteOnTermination=false \
--conf spark.kubernetes.executor.podNamePrefix=kyuubi-user-spark-sql-hdfs-default-26933464-8cbc-4202-aa45-75f5a4c91bff \
--conf spark.kubernetes.executor.podTemplateFile=/opt/spark/spark-executor-pod-template.yaml \
--conf spark.kubernetes.file.upload.path=hdfs://tenent:8020/tmp \
--conf spark.kubernetes.kerberos.krb5.path=/etc/krb5.conf \
--conf spark.kubernetes.namespace=scaas \
--conf spark.master=k8s://https://api.stage.pharos.visa.com \
--conf spark.rpc.askTimeout=300 \
--conf spark.submit.deployMode=cluster \
--conf spark.kubernetes.driverEnv.SPARK_USER_NAME=hdfs \
--conf spark.executorEnv.SPARK_USER_NAME=hdfs \
--proxy-user hdfs /opt/kyuubi/externals/engines/spark/kyuubi-spark-sql-engine_2.12-1.8.2-SNAPSHOT.jar
```
Hadoop security is always a complex topic; let me briefly clarify how it works in the Kyuubi system. I assume you have basic knowledge of Kerberos, Hadoop User Impersonation (Proxy User), and Hadoop Delegation Tokens (DT).
The basic pipeline of Kyuubi is:
Client => Kyuubi Server => Spark Driver
The first part, Client => Kyuubi Server, supports several authentication methods, including Kerberos, LDAP, etc. It is responsible for ensuring the legitimacy of the connecting user and for providing a trusted username (the session user) to the next system.
Then the Kyuubi Server uses the session user to find or launch a proper Spark Driver. Assuming there is no existing Spark Driver, the Kyuubi Server assembles a `spark-submit` command and runs it in a sub-process to launch a Spark Driver with `--proxy-user <session user>`.
For Kerberized environments, there are typically two ways to launch a Spark application:
1. `spark-submit --proxy-user <session user>`, which generates and distributes DTs
2. `spark-submit --principal <session user> --keytab </path/of/session-user.keytab>`
The principle here is: we must NOT distribute the superuser's keytab to the Spark app's local cache due to security concerns, but it is safe to distribute the session user's keytab and transient DTs.
For case 1 (your case), we don't need to maintain all session users' keytabs, so it's Kyuubi's default approach for the Kerberos case. However, it requires that someone run `kinit` periodically to refresh the TGT cache. The Kyuubi Server takes care of that if `core-site.xml` is configured properly (`hadoop.security.authentication=KERBEROS`). You can check the Kyuubi Server logs to see if it's working, or run `klist` (as the OS user that runs the Kyuubi Server process) inside the Kyuubi Server Pod to check whether a TGT cache is available.
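For example, a quick check could look like this sketch (the keytab path and principal in the last command are placeholders, not your actual values):

```
# Inside the Kyuubi Server Pod, as the OS user that runs the Kyuubi Server:
klist    # should print a valid, non-expired TGT

# Confirm Kerberos is actually enabled in the Hadoop client config:
grep -A1 hadoop.security.authentication "$HADOOP_CONF_DIR/core-site.xml"

# If no TGT cache exists, log in manually (placeholder principal/keytab):
kinit -kt /path/to/kyuubi.keytab kyuubi/host.example.com@EXAMPLE.COM
```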
`core-site.xml` should also be visible to `spark-submit`, so that `spark-submit` knows it should request DTs and distribute them to the Spark Driver Pod. You can run `kubectl describe pod <spark-driver-pod>` to check whether there is a secret named `*-delegation-tokens` mounted to the pod, and an env var `HADOOP_TOKEN_FILE_LOCATION` pointing to `/mnt/secrets/hadoop-credentials/hadoop-tokens`.
Then the Spark Driver can pick up the DTs (via `HADOOP_TOKEN_FILE_LOCATION`) and use them to access HDFS, HMS, etc.
I tried kinit using the superuser and then performed spark-submit, yet I got the same error. So core-site.xml has to be present in both the Kyuubi and Spark conf?
I believe I have answered the Hadoop conf question.

> Assuming you have set up the Hadoop configuration files properly, it's likely that `HADOOP_CONF_DIR` is not set properly in your Kyuubi Server Pod.
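For instance, a minimal sketch for the Kyuubi Server Pod; the conf directory below is an assumption, so point it at wherever your core-site.xml and hdfs-site.xml actually live:

```
# In $KYUUBI_HOME/conf/kyuubi-env.sh, sourced when the Kyuubi Server starts:
export HADOOP_CONF_DIR=/etc/hadoop/conf   # assumption: adjust to your layout

# Verify inside the Pod before restarting the Kyuubi Server:
ls "$HADOOP_CONF_DIR"/core-site.xml "$HADOOP_CONF_DIR"/hdfs-site.xml
```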
Describe the bug
```
Error: org.apache.kyuubi.KyuubiSQLException: org.apache.kyuubi.KyuubiSQLException: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]
```
Affects Version(s)
1.8.2