apache / kyuubi

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
https://kyuubi.apache.org/
Apache License 2.0

[Bug] hive is not allowed to impersonate hive #3777

Closed. coofive closed this issue 1 year ago

coofive commented 1 year ago

Describe the bug

# use Kyuubi Spark AuthZ Extension
build/mvn clean package -pl :kyuubi-spark-authz_2.12 -DskipTests -Dspark.version=3.3.0 -Dranger.version=2.1.0

I have two clusters: a Kerberos & Ranger cluster (hdfs://ppdcdpha) and a non-Kerberos cluster (hdfs://rtha).

  1. In the Kerberos & Ranger cluster, create an external table whose LOCATION points to the non-Kerberos cluster:
 CREATE EXTERNAL TABLE `ods`.`xxx`( 
   `id` bigint,                        
   `dw_cre_date` string)     
 ROW FORMAT SERDE                                   
   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'  
 STORED AS INPUTFORMAT                              
   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'  
 OUTPUTFORMAT                                       
   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' 
 LOCATION                                           
   'hdfs://rtha/user/hive/warehouse/ods.db/xxx'
 ;
  2. Query it through Kyuubi beeline (logged in via LDAP):
/opt/kyuubi/bin/beeline -u "" -n hive -p xxx

> select * from ods.rt_ppdai_train_tb_user_subject_log limit 1;

The error is:

Error: Error operating EXECUTE_STATEMENT: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: hive/hadoopcbd011051.ppdgdsl.com@PPDHDP.COM is not allowed to impersonate hive
        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1612)
        at org.apache.hadoop.ipc.Client.call(Client.java:1558)
        at org.apache.hadoop.ipc.Client.call(Client.java:1455)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:242)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:129)
        at com.sun.proxy.$Proxy32.getDelegationToken(Unknown Source)
        ...
  3. When I use plain spark-sql, the same query works:
spark-sql> select * from ods.rt_ppdai_train_tb_user_subject_log limit 1;
1724407 2021-07-18 05:43:01.231 2021-07-18 05:43:01.231 1       2021-07-17 18:04:16.0   2021-07-17 18:04:16.0   true    1575483 1106    xxx        0       xxx    xxx   ["dissent"]     ["collection"]  ["normal"]      ["M0"]  normal  62485   after   next exam     71491
Time taken: 28.846 seconds, Fetched 1 row(s)

Affects Version(s)

1.5.2-incubating

Kyuubi Server Log Output

09:15:35.492 INFO org.apache.kyuubi.operation.log.OperationLog: Creating operation log file /opt/apache-kyuubi-1.5.2-incubating-bin/work/server_operation_logs/69bf5f36-7036-4d24-a121-8bd3e42d1df1/f9c7494e-9ef9-4fdb-a76c-66f98554f8f7
09:15:35.492 INFO org.apache.kyuubi.session.KyuubiSessionImpl: [hive:10.9.11.88] SessionHandle [69bf5f36-7036-4d24-a121-8bd3e42d1df1] - Starting to wait the launch engine operation finished
09:15:35.492 INFO org.apache.kyuubi.session.KyuubiSessionImpl: [hive:10.9.11.88] SessionHandle [69bf5f36-7036-4d24-a121-8bd3e42d1df1] - Engine has been launched, elapsed time: 0 s
09:15:35.492 INFO org.apache.kyuubi.operation.ExecuteStatement: Processing hive's query[f9c7494e-9ef9-4fdb-a76c-66f98554f8f7]: INITIALIZED_STATE -> PENDING_STATE, statement: select * from ods.rt_ppdai_train_tb_user_subject_log limit 1
09:15:35.582 INFO org.apache.kyuubi.operation.ExecuteStatement: Processing hive's query[f9c7494e-9ef9-4fdb-a76c-66f98554f8f7]: PENDING_STATE -> RUNNING_STATE, statement: select * from ods.rt_ppdai_train_tb_user_subject_log limit 1
09:15:36.442 INFO org.apache.kyuubi.operation.ExecuteStatement: Query[f9c7494e-9ef9-4fdb-a76c-66f98554f8f7] in ERROR_STATE
09:15:36.442 INFO org.apache.kyuubi.operation.ExecuteStatement: Processing hive's query[f9c7494e-9ef9-4fdb-a76c-66f98554f8f7]: RUNNING_STATE -> ERROR_STATE, statement: select * from ods.rt_ppdai_train_tb_user_subject_log limit 1, time taken: 0.86 seconds
09:15:36.452 INFO org.apache.kyuubi.operation.ExecuteStatement: Processing hive's query[f9c7494e-9ef9-4fdb-a76c-66f98554f8f7]: ERROR_STATE -> CLOSED_STATE, statement: select * from ods.rt_ppdai_train_tb_user_subject_log limit 1
09:15:36.472 INFO org.apache.kyuubi.client.KyuubiSyncThriftClient: TCloseOperationReq(operationHandle:TOperationHandle(operationId:THandleIdentifier(guid:06 BD D3 15 E3 47 42 7C AC 0D 71 74 C0 A0 2D 4C, secret:8F 66 31 FA 20 5F 4B 27 88 68 D6 10 77 C9 5D A9), operationType:EXECUTE_STATEMENT, hasResultSet:true)) succeed on engine side

Kyuubi Engine Log Output

2022-11-08 09:15:36[ERROR][169][org.apache.kyuubi.engine.spark.operation.ExecuteStatement.error:74] Error operating EXECUTE_STATEMENT: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: hive/hadoopcbd011051.ppdgdsl.com@PPDHDP.COM is not allowed to impersonate hive
    at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1612)
    at org.apache.hadoop.ipc.Client.call(Client.java:1558)
    at org.apache.hadoop.ipc.Client.call(Client.java:1455)
    at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:242)
    at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:129)
    at com.sun.proxy.$Proxy32.getDelegationToken(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getDelegationToken(ClientNamenodeProtocolTranslatorPB.java:1134)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
    at com.sun.proxy.$Proxy33.getDelegationToken(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:734)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:1996)
    at org.apache.hadoop.security.token.DelegationTokenIssuer.collectDelegationTokens(DelegationTokenIssuer.java:95)
    at org.apache.hadoop.security.token.DelegationTokenIssuer.addDelegationTokens(DelegationTokenIssuer.java:76)
    at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:143)
    at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:102)
    at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:81)
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:221)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:332)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:208)
    at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
    at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
    at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
    at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
    at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:288)
    at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:476)
    at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:459)
    at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:48)
    at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:3868)
    at org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:3120)
    at org.apache.spark.sql.Dataset.$anonfun$withAction$2(Dataset.scala:3858)
    at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:510)
    at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3856)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:109)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:169)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:95)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3856)
    at org.apache.spark.sql.Dataset.collect(Dataset.scala:3120)
    at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.$anonfun$executeStatement$1(ExecuteStatement.scala:90)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at org.apache.kyuubi.engine.spark.operation.SparkOperation.withLocalProperties(SparkOperation.scala:88)
    at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.org$apache$kyuubi$engine$spark$operation$ExecuteStatement$$executeStatement(ExecuteStatement.scala:74)
    at org.apache.kyuubi.engine.spark.operation.ExecuteStatement$$anon$1.run(ExecuteStatement.scala:106)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
org.apache.hadoop.ipc.RemoteException: User: hive/hadoopcbd011051.ppdgdsl.com@PPDHDP.COM is not allowed to impersonate hive
    at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1612) ~[hadoop-client-api-3.3.2.jar:?]
    at org.apache.hadoop.ipc.Client.call(Client.java:1558) ~[hadoop-client-api-3.3.2.jar:?]
    at org.apache.hadoop.ipc.Client.call(Client.java:1455) ~[hadoop-client-api-3.3.2.jar:?]
    at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:242) ~[hadoop-client-api-3.3.2.jar:?]
    at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:129) ~[hadoop-client-api-3.3.2.jar:?]
    at com.sun.proxy.$Proxy32.getDelegationToken(Unknown Source) ~[?:?]
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getDelegationToken(ClientNamenodeProtocolTranslatorPB.java:1134) ~[hadoop-client-api-3.3.2.jar:?]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_232]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_232]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_232]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_232]
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422) ~[hadoop-client-api-3.3.2.jar:?]
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165) ~[hadoop-client-api-3.3.2.jar:?]
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157) ~[hadoop-client-api-3.3.2.jar:?]
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) ~[hadoop-client-api-3.3.2.jar:?]
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359) ~[hadoop-client-api-3.3.2.jar:?]
    at com.sun.proxy.$Proxy33.getDelegationToken(Unknown Source) ~[?:?]
    at org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:734) ~[hadoop-client-api-3.3.2.jar:?]
    at org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:1996) ~[hadoop-client-api-3.3.2.jar:?]
    at org.apache.hadoop.security.token.DelegationTokenIssuer.collectDelegationTokens(DelegationTokenIssuer.java:95) ~[hadoop-client-api-3.3.2.jar:?]
    at org.apache.hadoop.security.token.DelegationTokenIssuer.addDelegationTokens(DelegationTokenIssuer.java:76) ~[hadoop-client-api-3.3.2.jar:?]
    at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:143) ~[hadoop-client-api-3.3.2.jar:?]
    at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:102) ~[hadoop-client-api-3.3.2.jar:?]
    at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:81) ~[hadoop-client-api-3.3.2.jar:?]
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:221) ~[hadoop-client-api-3.3.2.jar:?]
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:332) ~[hadoop-client-api-3.3.2.jar:?]
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:208) ~[spark-core_2.12-3.3.0.jar:3.3.0]
    at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292) ~[spark-core_2.12-3.3.0.jar:3.3.0]
    at scala.Option.getOrElse(Option.scala:189) ~[scala-library-2.12.15.jar:?]
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:288) ~[spark-core_2.12-3.3.0.jar:3.3.0]
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) ~[spark-core_2.12-3.3.0.jar:3.3.0]
    at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292) ~[spark-core_2.12-3.3.0.jar:3.3.0]
    at scala.Option.getOrElse(Option.scala:189) ~[scala-library-2.12.15.jar:?]
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:288) ~[spark-core_2.12-3.3.0.jar:3.3.0]
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) ~[spark-core_2.12-3.3.0.jar:3.3.0]
    at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292) ~[spark-core_2.12-3.3.0.jar:3.3.0]
    at scala.Option.getOrElse(Option.scala:189) ~[scala-library-2.12.15.jar:?]
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:288) ~[spark-core_2.12-3.3.0.jar:3.3.0]
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) ~[spark-core_2.12-3.3.0.jar:3.3.0]
    at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292) ~[spark-core_2.12-3.3.0.jar:3.3.0]
    at scala.Option.getOrElse(Option.scala:189) ~[scala-library-2.12.15.jar:?]
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:288) ~[spark-core_2.12-3.3.0.jar:3.3.0]
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) ~[spark-core_2.12-3.3.0.jar:3.3.0]
    at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292) ~[spark-core_2.12-3.3.0.jar:3.3.0]
    at scala.Option.getOrElse(Option.scala:189) ~[scala-library-2.12.15.jar:?]
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:288) ~[spark-core_2.12-3.3.0.jar:3.3.0]
    at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:476) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
    at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:459) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
    at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:48) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
    at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:3868) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
    at org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:3120) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
    at org.apache.spark.sql.Dataset.$anonfun$withAction$2(Dataset.scala:3858) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
    at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:510) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
    at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3856) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:109) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:169) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:95) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3856) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
    at org.apache.spark.sql.Dataset.collect(Dataset.scala:3120) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
    at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.$anonfun$executeStatement$1(ExecuteStatement.scala:90) ~[kyuubi-spark-sql-engine_2.12-1.5.2-incubating.jar:1.5.2-incubating]
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) ~[scala-library-2.12.15.jar:?]
    at org.apache.kyuubi.engine.spark.operation.SparkOperation.withLocalProperties(SparkOperation.scala:88) ~[kyuubi-spark-sql-engine_2.12-1.5.2-incubating.jar:1.5.2-incubating]
    at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.org$apache$kyuubi$engine$spark$operation$ExecuteStatement$$executeStatement(ExecuteStatement.scala:74) ~[kyuubi-spark-sql-engine_2.12-1.5.2-incubating.jar:1.5.2-incubating]
    at org.apache.kyuubi.engine.spark.operation.ExecuteStatement$$anon$1.run(ExecuteStatement.scala:106) ~[kyuubi-spark-sql-engine_2.12-1.5.2-incubating.jar:1.5.2-incubating]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_232]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_232]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_232]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_232]
    at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_232]

2022-11-08 09:15:36[INFO ][169][org.apache.kyuubi.engine.spark.operation.ExecuteStatement.info:56] Processing hive's query[06bdd315-e347-427c-ac0d-7174c0a02d4c]: RUNNING_STATE -> ERROR_STATE, statement: select * from ods.rt_ppdai_train_tb_user_subject_log limit 1, time taken: 0.833 seconds

Kyuubi Server Configurations

## Kyuubi Configurations

#
# kyuubi.authentication           NONE
# kyuubi.frontend.bind.host       localhost
# kyuubi.frontend.bind.port       10009
#

#kyuubi.authentication                  LDAP
#kyuubi.authentication.ldap.base.dn     ou=People,dc=testppd,dc=com
#kyuubi.authentication.ldap.url         ldap://10.114.16.50:389 

#kyuubi.engine.share.level              CONNECTION

kyuubi.authentication                  LDAP,KERBEROS
kyuubi.authentication.ldap.base.dn     ou=People,dc=ppdhdp,dc=com
kyuubi.authentication.ldap.url         ldap://hadoopcbd011073.ppdgdsl.com:389

#kyuubi.authentication=KERBEROS
kyuubi.kinit.principal=hive/_HOST@PPDHDP.COM
kyuubi.kinit.keytab=/opt/apache-kyuubi-1.5.2-incubating-bin/conf/hive.keytab

kyuubi.engine.share.level=SERVER

kyuubi.frontend.thrift.login.timeout=300000
kyuubi.session.engine.idle.timeout=600000
kyuubi.ha.zookeeper.connection.timeout=300000
kyuubi.ha.zookeeper.session.timeout=300000
kyuubi.session.engine.request.timeout=300000
kyuubi.session.engine.initialize.timeout=300000

spark.yarn.queue=root.root
kyuubi.ha.zookeeper.quorum=hadoopcbd011043.ppdgdsl.com:2181,hadoopcbd011073.ppdgdsl.com:2181,hadoopcbd011103.ppdgdsl.com:2181,hadoopcbd011118.ppdgdsl.com:2181,hadoopcbd011148.ppdgdsl.com:2181

#kyuubi.zookeeper.embedded.client.port  2081

#___etl___.spark.yarn.queue=root.etl
___hive___.kyuubi.engine.share.level=USER
___hive___.spark.dynamicAllocation.maxExecutors=360

___etl___.kyuubi.engine.share.level=USER
___etl___.spark.dynamicAllocation.maxExecutors=360
___etl___.spark.kryoserializer.buffer.max=512m

#___hive___.spark.yarn.queue=root.etl
# Details in https://kyuubi.apache.org/docs/latest/deployment/settings.html

kyuubi.metrics.enabled=true
kyuubi.metrics.reporters=PROMETHEUS

Kyuubi Engine Configurations

spark.authenticate=false
spark.driver.log.dfsDir=/user/spark/driverLogs3.2.1
spark.driver.log.persistToDfs.enabled=false
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.executorIdleTimeout=60
spark.dynamicAllocation.minExecutors=0
spark.dynamicAllocation.schedulerBacklogTimeout=1
spark.eventLog.enabled=true
spark.io.encryption.enabled=false
spark.network.crypto.enabled=false
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.dynamicAllocation.shuffleTracking.enabled=true
# spark.shuffle.service.enabled=true
# spark.shuffle.service.port=7337
spark.ui.enabled=true
spark.ui.killEnabled=true
spark.lineage.log.dir=/var/log/spark/lineage
spark.lineage.enabled=true
spark.sql.broadcastTimeout=900
spark.network.timeout=120000
spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark -XX:InitiatingHeapOccupancyPercent=35 -verbose:gc -XX:OnOutOfMemoryError='kill -9 %p' -Dfile.encoding=utf-8
spark.driver.extraJavaOptions=-XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps  -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark -XX:InitiatingHeapOccupancyPercent=35 -verbose:gc -XX:OnOutOfMemoryError='kill -9 %p' -Dfile.encoding=utf-8
spark.sql.hive.caseSensitiveInferenceMode=NEVER_INFER
spark.dynamicAllocation.maxExecutors=200
spark.sql.decimalOperations.allowPrecisionLoss=false
spark.executor.cores=4
spark.executor.memory=16g
spark.executor.memoryOverhead=4g
spark.driver.cores=4
spark.driver.memory=16g
spark.yarn.am.cores=4
spark.yarn.am.memory=16g
spark.sql.shuffle.partitions=200
spark.sql.hive.convertMetastoreParquet=false
spark.master=yarn
spark.submit.deployMode=client
spark.eventLog.dir=hdfs://ppdcdpha/user/spark/applicationHistory3.2.1
spark.yarn.historyServer.address=http://hadoopcbd011051.ppdgdsl.com:18080
spark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH/lib/hadoop/lib/native
spark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH/lib/hadoop/lib/native
spark.yarn.am.extraLibraryPath=/opt/cloudera/parcels/CDH/lib/hadoop/lib/native
spark.yarn.config.gatewayPath=/opt/cloudera/parcels
spark.yarn.config.replacementPath={{HADOOP_COMMON_HOME}}/../../..
spark.yarn.historyServer.allowTracking=true
spark.yarn.appMasterEnv.MKL_NUM_THREADS=1
spark.executorEnv.MKL_NUM_THREADS=1
spark.yarn.appMasterEnv.OPENBLAS_NUM_THREADS=1
spark.executorEnv.OPENBLAS_NUM_THREADS=1
spark.sql.adaptive.enabled=true
# spark.sql.cbo.enabled=true
spark.sql.legacy.timeParserPolicy=LEGACY
spark.sql.storeAssignmentPolicy=LEGACY
# spark.extraListeners=com.cloudera.spark.lineage.NavigatorAppListener
# spark.sql.queryExecutionListeners=com.cloudera.spark.lineage.NavigatorQueryListener
spark.scheduler.listenerbus.eventqueue.capacity=100000
spark.eventLog.rolling.enabled=true
spark.eventLog.rolling.maxFileSize=200MB
spark.kryoserializer.buffer.max=512m
mapred.input.dir.recursive=true
spark.hive.mapred.supports.subdirectories=true
spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive=true
spark.sql.extensions=org.apache.ranger.authorization.spark.authorizer.RangerSparkSQLExtension,org.apache.kyuubi.plugin.spark.authz.ranger.RangerSparkExtension

Additional context

No response

Are you willing to submit PR?

github-actions[bot] commented 1 year ago

Hello @coofive, Thanks for finding the time to report the issue! We really appreciate the community's efforts to improve Apache Kyuubi (Incubating).

yaooqinn commented 1 year ago

Maybe you need to enable impersonation for the hive user on your Hadoop cluster.

pan3793 commented 1 year ago

Please check core-site.xml on the NameNode to make sure hive is granted impersonation.

HwiLu commented 1 year ago

Configure core-site.xml as below to enable the HDFS proxy user, then run hdfs dfsadmin -refreshSuperUserGroupsConfiguration (a refresh sketch follows the properties):

    <property>
        <name>hadoop.proxyuser.hive.hosts</name>
        <value>*</value>
    </property>

    <property>
        <name>hadoop.proxyuser.hive.groups</name>
        <value>*</value>
    </property>

    <property>
        <name>hadoop.proxyuser.hive.users</name>
        <value>*</value>
    </property>
xza-m commented 1 year ago

Can we disable the user proxy?

coofive commented 1 year ago

Yes, core-site.xml already has this config, and I tested with Hive beeline and spark-sql; both work:

<property>
    <name>hadoop.proxyuser.oozie.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.oozie.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.HTTP.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.HTTP.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hive.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hive.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hue.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hue.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.httpfs.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.httpfs.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.knox.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.knox.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.livy.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.livy.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.impala.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.impala.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hdfs.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hdfs.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.yarn.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.yarn.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.phoenix.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.phoenix.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.kudu.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.kudu.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.security.group.mapping</name>
    <value>org.apache.hadoop.security.ShellBasedUnixGroupsMapping</value>
  </property>
  <property>
    <name>hadoop.security.instrumentation.requires.admin</name>
    <value>false</value>
  </property>
  <property>
    <name>net.topology.script.file.name</name>
    <value>/etc/hadoop/conf.cloudera.yarn/topology.py</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>65536</value>
  </property>
  <property>
    <name>hadoop.ssl.enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>hadoop.ssl.require.client.cert</name>
    <value>false</value>
    <final>true</final>
  </property>
  <property>
    <name>hadoop.ssl.keystores.factory.class</name>
    <value>org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory</value>
    <final>true</final>
  </property>
  <property>
    <name>hadoop.ssl.server.conf</name>
    <value>ssl-server.xml</value>
    <final>true</final>
  </property>
  <property>
    <name>hadoop.ssl.client.conf</name>
    <value>ssl-client.xml</value>
    <final>true</final>
  </property>
  <property>
    <name>hadoop.http.cross-origin.allowed-methods</name>
    <value>GET, PUT, POST, OPTIONS, HEAD,  DELETE</value>
  </property>
  <property>
    <name>hadoop.http.cross-origin.allowed-headers</name>
    <value>X-Requested-With, Content-Type, Accept, Origin, WWW-Authenticate, Accept-Encoding, Transfer-Encoding</value>
  </property>
  <property>
    <name>hadoop.http.cross-origin.max-age</name>
    <value>180</value>
  </property>
  <property>
    <name>ipc.client.fallback-to-simple-auth-allowed</name>
    <value>true</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hive.users</name>
    <value>*</value>
  </property>
coofive commented 1 year ago

Can we disable the user proxy?

Hmm, we use Ranger, so we cannot disable the proxy user.

pan3793 commented 1 year ago

have you tested spark-sql --proxy-user hive?

coofive commented 1 year ago

have you tested spark-sql --proxy-user hive?

Oh, I tested spark-sql --proxy-user hive and it has the same problem; without the proxy user it is fine.

org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: hive@PPDHDP.COM is not allowed to impersonate hive
        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1612)
        at org.apache.hadoop.ipc.Client.call(Client.java:1558)
        at org.apache.hadoop.ipc.Client.call(Client.java:1455)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:242)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:129)
        at com.sun.proxy.$Proxy29.getDelegationToken(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getDelegationToken(ClientNamenodeProtocolTranslatorPB.java:1134)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
        at com.sun.proxy.$Proxy30.getDelegationToken(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:734)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:1996)
        at org.apache.hadoop.security.token.DelegationTokenIssuer.collectDelegationTokens(DelegationTokenIssuer.java:95)
        at org.apache.hadoop.security.token.DelegationTokenIssuer.addDelegationTokens(DelegationTokenIssuer.java:76)
        at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:143)
        at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:102)
        at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:81)
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:221)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:332)
        at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:208)
        at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:288)
        at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
        at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:288)
        at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
        at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:288)
        at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
        at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:288)
        at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
        at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:288)
        at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:476)
        at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:459)
        at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:48)
        at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:451)
        at org.apache.spark.sql.execution.HiveResult$.hiveResultString(HiveResult.scala:76)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.$anonfun$run$2(SparkSQLDriver.scala:69)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:109)
        at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:169)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:95)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:69)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:384)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1(SparkSQLCLIDriver.scala:504)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1$adapted(SparkSQLCLIDriver.scala:498)
        at scala.collection.Iterator.foreach(Iterator.scala:943)
        at scala.collection.Iterator.foreach$(Iterator.scala:943)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
        at scala.collection.IterableLike.foreach(IterableLike.scala:74)
        at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processLine(SparkSQLCLIDriver.scala:498)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:286)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
        at org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:165)
        at org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:163)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
pan3793 commented 1 year ago

Could you make sure the following configurations are set on EACH SERVICE? Your stack trace indicates that the NameNode thinks hive is not granted impersonation.

  <property>
    <name>hadoop.proxyuser.hive.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hive.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hive.users</name>
    <value>*</value>
  </property>
pan3793 commented 1 year ago

You can use the hive user to run HADOOP_PROXY_USER=hive hadoop fs -ls hdfs://<namenode>/xxxx and see what happens.

coofive commented 1 year ago

You can use the hive user to run HADOOP_PROXY_USER=hive hadoop fs -ls hdfs://<namenode>/xxxx and see what happens.

oh,

$HADOOP_PROXY_USER=hive hadoop fs -ls hdfs://rtha/user/hive/warehouse
ls: User: hive@PPDHDP.COM is not allowed to impersonate hive
coofive commented 1 year ago

have you tested spark-sql --proxy-user hive?

How can I disable Kyuubi's --proxy-user?

coofive commented 1 year ago

You can use the hive user to run HADOOP_PROXY_USER=hive hadoop fs -ls hdfs://<namenode>/xxxx and see what happens.

oh,

$HADOOP_PROXY_USER=hive hadoop fs -ls hdfs://rtha/user/hive/warehouse
ls: User: hive@PPDHDP.COM is not allowed to impersonate hive

After I run kdestroy, it works:

$kdestroy

$HADOOP_PROXY_USER=hive hadoop fs -ls hdfs://rtha/user/hive/warehouse
drwxrwxrwx   - hive   hive          0 2022-01-23 12:09 hdfs://ppdhdpha/user/hive/warehouse/mdl_data.db
drwxrwxrwx   - hive   hive          0 2022-11-09 10:45 hdfs://ppdhdpha/user/hive/warehouse/mdzz.db
pan3793 commented 1 year ago

have you tested spark-sql --proxy-user hive?

How can I disable Kyuubi's --proxy-user?

You can, if spark.kerberos.principal and spark.kerberos.keytab are set; in that case --proxy-user is not used. Note that spark.kerberos.principal must match your frontend (e.g. LDAP) user.

Kyuubi does not allow you to simply disable --proxy-user otherwise, as that may cause security issues.

coofive commented 1 year ago

must match your LDAP user

I have already integrated the Kyuubi Spark AuthZ Extension and I log in with:

/opt/kyuubi/bin/beeline -u "jdbc:hive2://hadoopcbd012103.ppdgdsl.com:10009" -n hive -p xxx

If I don't set spark.kerberos.principal and spark.kerberos.keytab, can that disable --proxy-user?

pan3793 commented 1 year ago

Yes, you can check the logic in SparkProcessBuilder#commands

coofive commented 1 year ago

Yes, you can check the logic in SparkProcessBuilder#commands

OK, I disabled --proxy-user and it works, including with the Kyuubi Spark AuthZ Extension.