apache / kyuubi

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
https://kyuubi.apache.org/
Apache License 2.0

[Bug] Kyuubi should not check the hdfs file permission of permanent view #5126

Open AuthurWang2009 opened 1 year ago

AuthurWang2009 commented 1 year ago

Code of Conduct

Search before asking

Describe the bug

1. First, we create a user bdhmgmas, grant it permissions on database bs_comdb, and then create a table bs_comdb.test1 in that database. The DDL of the table is:

use bs_comdb;
create table test1(a string) location '/user/bdhmgmas/db/bs_comdb/test1';
insert into bs_comdb.test1 values '1','2','3';

2. Second, we create another user bdhcwbkj, grant it permissions on database bs_cwbdb, and then create a view bs_cwbdb.viw1 which refers to the table bs_comdb.test1. The DDL of the view is:

use bs_cwbdb;
create view viw1 as select * from bs_comdb.test1;

3. Third, we change the owner of the HDFS path '/user/bdhmgmas/db/bs_comdb/test1' to hdfs:supergroup, and its permission to 711, with the following commands:

hadoop fs -chown -R hdfs:supergroup /user/bdhmgmas/db/bs_comdb/test1
hadoop fs -chmod -R 711 /user/bdhmgmas/db/bs_comdb/test1

4. Finally, we connect to the Kyuubi JDBC server and query the view as user bdhcwbkj with the following commands:

kinit -p bdhcwbkj -kt ~/keytab/apps.keytab -c /tmp/bdhcwbkj_ccc
export KRB5CCNAME=/tmp/bdhcwbkj_ccc
$HOME/bss_home/kyuubi/bin/beeline -u "jdbc:kyuubi://172.21.21.129:10009/default;kyuubiServerPrincipal=hive/_HOST@BG.COM" --hiveconf spark.yarn.queue=root.000kjb.bdhmgmas_bas

select count(*) from bs_cwbdb.viw1;

The query throws an exception:

Error: org.apache.kyuubi.KyuubiSQLException: org.apache.kyuubi.KyuubiSQLException: Error operating ExecuteStatement: org.apache.hadoop.security.AccessControlException: Permission denied: user=bdhcwbkj, access=READ_EXECUTE, inode="/user/bdhmgmas/db/bs_comdb/test1":hdfs:supergroup:drwx--x--x

5. My environment: Spark 3.3.1, Kyuubi 1.8.0; HDFS, Hive, and Ranger are provided by the CDP 7.1.7 SP1 platform; both HDFS and Hive have Kerberos and Ranger enabled.

Affects Version(s)

master

Kyuubi Server Log Output

2023-08-01 21:12:34.039 INFO org.apache.kyuubi.shaded.curator.framework.state.ConnectionStateManager: State change: CONNECTED
2023-08-01 21:12:34.051 INFO org.apache.kyuubi.credentials.HadoopCredentialsManager: Scheduling renewal in 0 ms.
2023-08-01 21:12:34.052 INFO org.apache.kyuubi.credentials.HadoopCredentialsManager: Created CredentialsRef for user bdhcwbkj and scheduled a renewal task
2023-08-01 21:12:34.063 INFO org.apache.kyuubi.credentials.HadoopFsDelegationTokenProvider: getting token owned by bdhcwbkj for: hdfs://patch-bak-bas
2023-08-01 21:12:34.290 INFO org.apache.hadoop.hdfs.DFSClient: Created token for bdhcwbkj: HDFS_DELEGATION_TOKEN owner=bdhcwbkj, renewer=bdhcwbkj, realUser=hive/bg21129.hadoop.com@BG.COM, issueDate=1690895552790, maxDate=1691500352790, sequenceNumber=828626, masterKeyId=701 on ha-hdfs:patch-bak-bas
2023-08-01 21:12:34.305 INFO org.apache.kyuubi.credentials.HiveDelegationTokenProvider: Getting Hive delegation token for bdhcwbkj against hive/_HOST@BG.COM
2023-08-01 21:12:34.493 INFO org.apache.kyuubi.credentials.HadoopCredentialsManager: Scheduling renewal in 3600000 ms.
2023-08-01 21:12:34.523 INFO org.apache.kyuubi.Utils: Loading Kyuubi properties from /mnt/disk03/bdhmgmas/CDP/conf/spark-conf/spark-defaults.conf
2023-08-01 21:12:34.538 INFO org.apache.kyuubi.engine.EngineRef: Launching engine:
/mnt/disk03/bdhmgmas/bss_home/spark-3.3.1-bin-xuanwu/bin/spark-submit \
        --class org.apache.kyuubi.engine.spark.SparkSQLEngine \
        --conf spark.hive.server2.thrift.resultset.default.fetch.size=1000 \
        --conf spark.kyuubi.client.ipAddress=172.21.21.129 \
        --conf spark.kyuubi.client.version=1.8.0-SNAPSHOT \
        --conf spark.kyuubi.engine.credentials=SERUUwACQXRocmlmdDovL2JnMjExNDYuaGFkb29wLmNvbTo5MDgzLHRocmlmdDovL2JnMjExNTcu
aGFkb29wLmNvbTo5MDgzQAAIYmRoY3dia2oEaGl2ZR5oaXZlL2JnMjExMjkuaGFkb29wLmNvbUBC
Ry5DT02KAYmxOpZTigGJ1UcaUwqOASUUmsdU0jNTO1ShaXho0z7Xntrxe4sVSElWRV9ERUxFR0FU
SU9OX1RPS0VOABVoYS1oZGZzOnBhdGNoLWJhay1iYXNHAAhiZGhjd2JraghiZGhjd2Jrah5oaXZl
L2JnMjExMjkuaGFkb29wLmNvbUBCRy5DT02KAYmxOpEWigGJ1UcVFo0MpNKOAr0U/o+zViBZabeB
0nzmX4ysdjMpRC8VSERGU19ERUxFR0FUSU9OX1RPS0VOFWhhLWhkZnM6cGF0Y2gtYmFrLWJhcwA= \
        --conf spark.kyuubi.engine.jdbc.memory=1g \
        --conf spark.kyuubi.engine.pool.name=kyuubi-pool \
        --conf spark.kyuubi.engine.pool.size=-1 \
        --conf spark.kyuubi.engine.share.level=CONNECTION \
        --conf spark.kyuubi.engine.single.spark.session=false \
        --conf spark.kyuubi.engine.submit.time=1690895554508 \
        --conf spark.kyuubi.engine.type=SPARK_SQL \
        --conf spark.kyuubi.engine.user.isolated.spark.session=true \
        --conf spark.kyuubi.frontend.protocols=THRIFT_BINARY,REST \
        --conf spark.kyuubi.ha.addresses=bg21129.hadoop.com:2181,bg21146.hadoop.com:2181,bg21157.hadoop.com:2181 \
        --conf spark.kyuubi.ha.engine.ref.id=f323c4c8-074c-4fbf-b087-b82c3bd924f2 \
        --conf spark.kyuubi.ha.namespace=/kyuubi_1.8.0-SNAPSHOT_CONNECTION_SPARK_SQL/bdhcwbkj/f323c4c8-074c-4fbf-b087-b82c3bd924f2 \
        --conf spark.kyuubi.ha.zookeeper.acl.enabled=true \
        --conf spark.kyuubi.ha.zookeeper.auth.type=KERBEROS \
        --conf spark.kyuubi.ha.zookeeper.engine.auth.type=KERBEROS \
        --conf spark.kyuubi.metadata.cleaner.enabled=true \
        --conf spark.kyuubi.metadata.cleaner.interval=PT1H \
        --conf spark.kyuubi.server.ipAddress=172.21.21.129 \
        --conf spark.kyuubi.session.connection.url=bg21129.hadoop.com:10009 \
        --conf spark.kyuubi.session.engine.initialize.timeout=PT3M \
        --conf spark.kyuubi.session.real.user=bdhcwbkj \
        --conf spark.app.name=kyuubi_CONNECTION_SPARK_SQL_bdhcwbkj_f323c4c8-074c-4fbf-b087-b82c3bd924f2 \
        --conf spark.yarn.queue=root.000kjb.bdhmgmas_bas \
        --conf spark.yarn.tags=KYUUBI,f323c4c8-074c-4fbf-b087-b82c3bd924f2 \
        --proxy-user bdhcwbkj /mnt/disk03/bdhmgmas/bss_home/kyuubi/externals/engines/spark/kyuubi-spark-sql-engine_2.12-1.8.0-SNAPSHOT.jar
2023-08-01 21:12:34.548 INFO org.apache.kyuubi.engine.ProcBuilder: Logging to /mnt/disk03/bdhmgmas/bss_home/kyuubi/work/bdhcwbkj/kyuubi-spark-sql-engine.log.12
2023-08-01 21:13:28.720 INFO org.apache.kyuubi.ha.client.zookeeper.ZookeeperDiscoveryClient: Get service instance:bg21129.hadoop.com:39110 engine id:application_1688367119371_0900 and version:1.8.0-SNAPSHOT under /kyuubi_1.8.0-SNAPSHOT_CONNECTION_SPARK_SQL/bdhcwbkj/f323c4c8-074c-4fbf-b087-b82c3bd924f2
2023-08-01 21:13:29.107 INFO org.apache.kyuubi.session.KyuubiSessionImpl: [bdhcwbkj:172.21.21.129] SessionHandle [f323c4c8-074c-4fbf-b087-b82c3bd924f2] - Connected to engine [bg21129.hadoop.com:39110]/[application_1688367119371_0900] with SessionHandle [f323c4c8-074c-4fbf-b087-b82c3bd924f2]]
2023-08-01 21:13:29.109 INFO org.apache.kyuubi.shaded.curator.framework.imps.CuratorFrameworkImpl: backgroundOperationsLoop exiting
2023-08-01 21:13:29.117 INFO org.apache.kyuubi.shaded.zookeeper.ZooKeeper: Session: 0x213e3e17f73ae09 closed
2023-08-01 21:13:29.117 INFO org.apache.kyuubi.shaded.zookeeper.ClientCnxn: EventThread shut down for session: 0x213e3e17f73ae09
2023-08-01 21:13:29.118 INFO org.apache.kyuubi.operation.LaunchEngine: Processing bdhcwbkj's query[f66a5fad-7a49-4a81-966f-bc2986c0e470]: RUNNING_STATE -> FINISHED_STATE, time taken: 55.101 seconds
2023-08-01 21:13:29.173 INFO org.apache.kyuubi.session.KyuubiSessionImpl: [bdhcwbkj:172.21.21.129] SessionHandle [f323c4c8-074c-4fbf-b087-b82c3bd924f2] - Starting to wait the launch engine operation finished
2023-08-01 21:13:29.174 INFO org.apache.kyuubi.session.KyuubiSessionImpl: [bdhcwbkj:172.21.21.129] SessionHandle [f323c4c8-074c-4fbf-b087-b82c3bd924f2] - Engine has been launched, elapsed time: 0 s
2023-08-01 21:16:15.467 INFO org.apache.kyuubi.operation.log.OperationLog: Creating operation log file /mnt/disk03/bdhmgmas/bss_home/kyuubi/work/server_operation_logs/f323c4c8-074c-4fbf-b087-b82c3bd924f2/5ec579b9-048f-446f-92c9-484bc7ea66fe
2023-08-01 21:16:15.470 INFO org.apache.kyuubi.credentials.HadoopCredentialsManager: Send new credentials with epoch 0 to SQL engine through session f323c4c8-074c-4fbf-b087-b82c3bd924f2
2023-08-01 21:16:15.517 INFO org.apache.kyuubi.credentials.HadoopCredentialsManager: Update session credentials epoch from -1 to 0
2023-08-01 21:16:15.578 INFO org.apache.kyuubi.operation.ExecuteStatement: Processing bdhcwbkj's query[5ec579b9-048f-446f-92c9-484bc7ea66fe]: PENDING_STATE -> RUNNING_STATE, statement:
use bs_cwbdb
2023-08-01 21:16:15.743 INFO org.apache.kyuubi.operation.ExecuteStatement: Query[5ec579b9-048f-446f-92c9-484bc7ea66fe] in FINISHED_STATE
2023-08-01 21:16:15.744 INFO org.apache.kyuubi.operation.ExecuteStatement: Processing bdhcwbkj's query[5ec579b9-048f-446f-92c9-484bc7ea66fe]: RUNNING_STATE -> FINISHED_STATE, time taken: 0.163 seconds
2023-08-01 21:16:15.973 INFO org.apache.kyuubi.client.KyuubiSyncThriftClient: TCloseOperationReq(operationHandle:TOperationHandle(operationId:THandleIdentifier(guid:5E C5 79 B9 04 8F 44 6F 92 C9 48 4B C7 EA 66 FE, secret:C2 EE 5B 97 3E A0 41 FC AC 16 9B D7 08 ED 8F 38), operationType:EXECUTE_STATEMENT, hasResultSet:true)) succeed on engine side
2023-08-01 21:16:23.020 INFO org.apache.kyuubi.operation.log.OperationLog: Creating operation log file /mnt/disk03/bdhmgmas/bss_home/kyuubi/work/server_operation_logs/f323c4c8-074c-4fbf-b087-b82c3bd924f2/78285dbe-6b84-4074-82a2-bcbb4b6db06e
2023-08-01 21:16:23.027 INFO org.apache.kyuubi.operation.ExecuteStatement: Processing bdhcwbkj's query[78285dbe-6b84-4074-82a2-bcbb4b6db06e]: PENDING_STATE -> RUNNING_STATE, statement:
select count(*) from viw1
2023-08-01 21:16:24.799 INFO org.apache.kyuubi.operation.ExecuteStatement: Query[78285dbe-6b84-4074-82a2-bcbb4b6db06e] in ERROR_STATE
2023-08-01 21:16:24.817 INFO org.apache.kyuubi.operation.ExecuteStatement: Processing bdhcwbkj's query[78285dbe-6b84-4074-82a2-bcbb4b6db06e]: RUNNING_STATE -> ERROR_STATE, time taken: 1.79 seconds
2023-08-01 21:16:24.837 INFO org.apache.kyuubi.client.KyuubiSyncThriftClient: TCloseOperationReq(operationHandle:TOperationHandle(operationId:THandleIdentifier(guid:78 28 5D BE 6B 84 40 74 82 A2 BC BB 4B 6D B0 6E, secret:C2 EE 5B 97 3E A0 41 FC AC 16 9B D7 08 ED 8F 38), operationType:EXECUTE_STATEMENT, hasResultSet:true)) succeed on engine side

Kyuubi Engine Log Output

23/08/01 21:16:23 INFO org.apache.kyuubi.engine.spark.operation.ExecuteStatement(  64): Processing bdhcwbkj's query[78285dbe-6b84-4074-82a2-bcbb4b6db06e]: PENDING_STATE -> RUNNING_STATE, statement:
select count(*) from viw1
23/08/01 21:16:23 INFO org.apache.kyuubi.engine.spark.operation.ExecuteStatement(  64): 
           Spark application name: kyuubi_CONNECTION_SPARK_SQL_bdhcwbkj_f323c4c8-074c-4fbf-b087-b82c3bd924f2
                 application ID: application_1688367119371_0900
                 application web UI: http://bg21146.hadoop.com:8088/proxy/application_1688367119371_0900,http://bg21157.hadoop.com:8088/proxy/application_1688367119371_0900
                 master: yarn
                 deploy mode: client
                 version: 3.3.1
           Start time: 2023-08-01T21:12:37.605
           User: bdhcwbkj
23/08/01 21:16:23 INFO org.apache.kyuubi.engine.spark.operation.ExecuteStatement(  64): Execute in full collect mode
23/08/01 21:16:24 INFO org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator(  61): Code generated in 17.189298 ms
23/08/01 21:16:24 INFO org.apache.spark.storage.memory.MemoryStore(  61): Block broadcast_1 stored as values in memory (estimated size 414.7 KiB, free 365.9 MiB)
23/08/01 21:16:24 INFO org.apache.spark.storage.memory.MemoryStore(  61): Block broadcast_1_piece0 stored as bytes in memory (estimated size 45.0 KiB, free 365.9 MiB)
23/08/01 21:16:24 INFO org.apache.spark.storage.BlockManagerInfo(  61): Added broadcast_1_piece0 in memory on bg21129.hadoop.com:39647 (size: 45.0 KiB, free: 366.3 MiB)
23/08/01 21:16:24 INFO org.apache.spark.SparkContext(  61): Created broadcast 1 from 
23/08/01 21:16:24 WARN org.apache.hadoop.util.concurrent.ExecutorHelper(  63): Caught exception in thread GetFileInfo #1  + : 
org.apache.hadoop.security.AccessControlException: Permission denied: user=bdhcwbkj, access=READ_EXECUTE, inode="/user/bdhmgmas/db/bs_comdb/test1":hdfs:supergroup:drwx--x--x
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:553)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:399)
        at org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer$RangerAccessControlEnforcer.checkDefaultEnforcer(RangerHdfsAuthorizer.java:807)
        at org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer$RangerAccessControlEnforcer.checkRangerPermission(RangerHdfsAuthorizer.java:552)
        at org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer$RangerAccessControlEnforcer.checkPermissionWithContext(RangerHdfsAuthorizer.java:360)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:296)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1951)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1932)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPathAccess(FSDirectory.java:1882)
        at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getListingInt(FSDirStatAndListingOp.java:78)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getListing(FSNamesystem.java:3933)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getListing(NameNodeRpcServer.java:1140)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getListing(ClientNamenodeProtocolServerSideTranslatorPB.java:747)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:989)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:917)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2894)

        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
        at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:88)
        at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1704)
        at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1273)
        at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1256)
        at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1201)
        at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1197)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:1215)
        at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:2162)
        at org.apache.hadoop.mapred.LocatedFileStatusFetcher$ProcessInputDirCallable.call(LocatedFileStatusFetcher.java:309)
        at org.apache.hadoop.mapred.LocatedFileStatusFetcher$ProcessInputDirCallable.call(LocatedFileStatusFetcher.java:286)
        at org.apache.hadoop.thirdparty.com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
        at org.apache.hadoop.thirdparty.com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
        at org.apache.hadoop.thirdparty.com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=bdhcwbkj, access=READ_EXECUTE, inode="/user/bdhmgmas/db/bs_comdb/test1":hdfs:supergroup:drwx--x--x
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:553)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:399)
        at org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer$RangerAccessControlEnforcer.checkDefaultEnforcer(RangerHdfsAuthorizer.java:807)
        at org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer$RangerAccessControlEnforcer.checkRangerPermission(RangerHdfsAuthorizer.java:552)
        at org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer$RangerAccessControlEnforcer.checkPermissionWithContext(RangerHdfsAuthorizer.java:360)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:296)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1951)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1932)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPathAccess(FSDirectory.java:1882)
        at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getListingInt(FSDirStatAndListingOp.java:78)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getListing(FSNamesystem.java:3933)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getListing(NameNodeRpcServer.java:1140)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getListing(ClientNamenodeProtocolServerSideTranslatorPB.java:747)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:989)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:917)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2894)

        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1612)
        at org.apache.hadoop.ipc.Client.call(Client.java:1558)
        at org.apache.hadoop.ipc.Client.call(Client.java:1455)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:242)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:129)
        at com.sun.proxy.$Proxy30.getListing(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:688)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
        at com.sun.proxy.$Proxy31.getListing(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1702)
        ... 15 more
23/08/01 21:16:24 INFO org.apache.spark.scheduler.DAGScheduler(  61): Asked to cancel job group 78285dbe-6b84-4074-82a2-bcbb4b6db06e

Kyuubi Server Configurations

kyuubi-defaults.conf: 
kyuubi.authentication                    kerberos
kyuubi.kinit.principal                   hive/bg21129.hadoop.com@BG.COM
kyuubi.kinit.keytab                      /mnt/disk03/bdhmgmas/keytab/hive.keytab
kyuubi.authentication                    kerberos
kyuubi.frontend.protocols                THRIFT_BINARY,REST
kyuubi.frontend.thrift.binary.bind.port  10009
kyuubi.frontend.rest.bind.port           10099
kyuubi.engine.type                       SPARK_SQL
kyuubi.engine.share.level                CONNECTION
kyuubi.engine.jdbc.memory                1g
kyuubi.engine.pool.name                  kyuubi-pool
## disable engine pool
kyuubi.engine.pool.size                  -1
kyuubi.engine.single.spark.session       false
kyuubi.engine.user.isolated.spark.session true

kyuubi.session.engine.initialize.timeout PT3M
kyuubi.ha.addresses                      bg21129.hadoop.com:2181,bg21146.hadoop.com:2181,bg21157.hadoop.com:2181
kyuubi.ha.namespace                      kyuubi
kyuubi.ha.zookeeper.acl.enabled          true
kyuubi.ha.zookeeper.auth.type            KERBEROS
kyuubi.ha.zookeeper.engine.auth.type     KERBEROS

kyuubi.metadata.cleaner.enabled          true
kyuubi.metadata.cleaner.interval         PT1H
kyuubi.metadata.store.jdbc.database.type MYSQL
kyuubi.metadata.store.jdbc.url           jdbc:mysql://172.21.101.5:60020/kyuubi
kyuubi.metadata.store.jdbc.user          kyuubi
kyuubi.metadata.store.jdbc.password      kyuubi

kyuubi-env.sh:
export JAVA_HOME=/usr/local/jdk1.8.0_141
export SPARK_HOME=/mnt/disk03/bdhmgmas/bss_home/spark-3.3.1-bin-xuanwu
export SPARK_CONF_DIR=/mnt/disk03/bdhmgmas/CDP/conf/spark-conf
# export FLINK_HOME=/opt/flink
# export HIVE_HOME=/opt/hive
# export FLINK_HADOOP_CLASSPATH=/path/to/hadoop-client-runtime-3.3.2.jar:/path/to/hadoop-client-api-3.3.2.jar
# export HIVE_HADOOP_CLASSPATH=${HADOOP_HOME}/share/hadoop/common/lib/commons-collections-3.2.2.jar:${HADOOP_HOME}/share/hadoop/client/hadoop-client-runtime-3.1.0.jar:${HADOOP_HOME}/share/hadoop/client/hadoop-client-api-3.1.0.jar:${HADOOP_HOME}/share/hadoop/common/lib/htrace-core4-4.1.0-incubating.jar
# export HADOOP_CONF_DIR=/usr/ndp/current/mapreduce_client/conf
# export YARN_CONF_DIR=/usr/ndp/current/yarn/conf

export KYUUBI_JAVA_OPTS="-Xmx10g -XX:+UnlockDiagnosticVMOptions -XX:ParGCCardsPerStrideChunk=4096 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -XX:+CMSClassUnloadingEnabled -XX:+CMSParallelRemarkEnabled -XX:+UseCondCardMark -XX:MaxDirectMemorySize=1024m  -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./logs -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution -Xloggc:./logs/kyuubi-server-gc-%t.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=5M -XX:NewRatio=3 -XX:MetaspaceSize=512m"
export KYUUBI_BEELINE_OPTS="-Xmx1g -XX:+UnlockDiagnosticVMOptions -XX:ParGCCardsPerStrideChunk=4096 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -XX:+CMSClassUnloadingEnabled -XX:+CMSParallelRemarkEnabled -XX:+UseCondCardMark"

Kyuubi Engine Configurations

[bdhmgmas@bg21129 spark-conf]$ ll /mnt/disk03/bdhmgmas/CDP/conf/spark-conf
total 60
-rw-r--r-- 1 bdhmgmas bdos   20 May 26 15:16 __cloudera_generation__
-rw-r--r-- 1 bdhmgmas bdos  155 May 26 15:16 __cloudera_metadata__
-rw-r--r-- 1 bdhmgmas bdos  218 Jul 12 15:28 jaas.conf
-rwxr-xr-x 1 bdhmgmas bdos 3352 Jul 21 10:55 log4j2.properties
-rwxr-xr-x 1 bdhmgmas bdos 3647 Jul 21 14:44 log4j2.properties.bk
-rw-r----- 1 bdhmgmas bdos 1893 Jul 14 15:49 ranger-spark-audit.xml
-rw-r----- 1 bdhmgmas bdos 1893 Jul 20 10:31 ranger-spark-cm_hive-audit.xml
-rw-r----- 1 bdhmgmas bdos  338 Jul 20 10:31 ranger-spark-cm_hive-policymgr-ssl.xml
-rw-r----- 1 bdhmgmas bdos 1435 Jul 20 10:31 ranger-spark-cm_hive-security.xml
-rw-r----- 1 bdhmgmas bdos  338 Jul 11 16:10 ranger-spark-policymgr-ssl.xml
-rw-r----- 1 bdhmgmas bdos 1429 Jul 21 11:16 ranger-spark-security.xml
-rw------- 1 bdhmgmas bdos  278 Jul 14 15:44 solr.keytab
-rw-r--r-- 1 bdhmgmas bdos 2171 Jul 11 14:50 spark-defaults.conf
-rw-r--r-- 1 bdhmgmas bdos  188 Jul 14 13:56 spark-env.sh
drwxr-xr-x 2 bdhmgmas bdos 4096 May 26 17:49 yarn-conf

[bdhmgmas@bg21129 spark-conf]$ cat spark-defaults.conf
spark.authenticate=false
spark.driver.log.dfsDir=/user/spark/driver3Logs
spark.driver.log.persistToDfs.enabled=true
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.executorIdleTimeout=60
spark.dynamicAllocation.minExecutors=0
spark.dynamicAllocation.schedulerBacklogTimeout=1
spark.eventLog.enabled=true
spark.io.encryption.enabled=false
spark.lineage.enabled=true
spark.network.crypto.enabled=false
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.shuffle.service.enabled=true
spark.shuffle.service.port=7447
spark.ui.enabled=true
spark.ui.killEnabled=true
spark.master=yarn
spark.submit.deployMode=client
spark.eventLog.dir=hdfs://patch-bak-bas/user/spark/spark3ApplicationHistory
spark.yarn.historyServer.address=http://bg21129.hadoop.com:18089
##spark.yarn.jars=local:/opt/cloudera/parcels/SPARK3-3.2.1.3.2.7171000.1-1-1.p0.25570994/lib/spark3/jars/*,local:/opt/cloudera/parcels/SPARK3-3.2.1.3.2.7171000.1-1-1.p0.25570994/lib/spark3/hive/*
spark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p1000.24102687/lib/hadoop/lib/native
spark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p1000.24102687/lib/hadoop/lib/native
spark.yarn.am.extraLibraryPath=/opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p1000.24102687/lib/hadoop/lib/native
##spark.yarn.config.gatewayPath=/opt/cloudera/parcels
##spark.yarn.config.replacementPath={{HADOOP_COMMON_HOME}}/../../..
##spark.yarn.appMasterEnv.PYSPARK_PYTHON=/opt/cloudera/parcels/Anaconda3-2021.05/bin/python
##spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON=/opt/cloudera/parcels/Anaconda3-2021.05/bin/python
spark.yarn.historyServer.allowTracking=true
spark.yarn.appMasterEnv.MKL_NUM_THREADS=1
spark.executorEnv.MKL_NUM_THREADS=1
spark.yarn.appMasterEnv.OPENBLAS_NUM_THREADS=1
spark.executorEnv.OPENBLAS_NUM_THREADS=1
##spark.hadoop.fs.s3a.committer.name=directory
##spark.sql.sources.commitProtocolClass=org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
##spark.sql.parquet.output.committer.class=org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter
spark.sql.extensions=org.apache.kyuubi.plugin.spark.authz.ranger.RangerSparkExtension

Additional context

No response

Are you willing to submit PR?

github-actions[bot] commented 1 year ago

Hello @AuthurWang2009, Thanks for finding the time to report the issue! We really appreciate the community's efforts to improve Apache Kyuubi.

bowenliang123 commented 1 year ago

The error message simply shows that the user actually running the Spark application is not allowed to read the target file. That is HDFS's behaviour, and the action was rejected by HDFS itself. Neither the Kyuubi Server nor Kyuubi's Authz plugin for Ranger ever checks file permissions.
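
A quick way to confirm this outside Kyuubi is to list the table location directly as the session user (a minimal check, assuming the same client, keytab, and Hadoop configuration as in step 4 of the report):

kinit -p bdhcwbkj -kt ~/keytab/apps.keytab -c /tmp/bdhcwbkj_ccc
export KRB5CCNAME=/tmp/bdhcwbkj_ccc
# listing requires READ_EXECUTE on the directory (here drwx--x--x, owner hdfs:supergroup),
# so this is expected to fail with the same AccessControlException seen in the engine log
hadoop fs -ls /user/bdhmgmas/db/bs_comdb/test1

If this command is denied in the same way, the rejection comes from HDFS (or the Ranger HDFS plugin) itself, not from any Kyuubi-side check.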

AuthurWang2009 commented 1 year ago

Does the Spark application have any method to bypass HDFS permissions? In my opinion, if we execute SQL with the authz plugin, the plugin will allow or disallow the SQL at the parse stage. At the running stage, it should use the Kyuubi server's super user to overcome the lack of permission. Otherwise, authz will not take effect because of other permission checks.

The error message simply shows that the user actually running the Spark application is not allowed to read the target file. That is HDFS's behaviour, and the action was rejected by HDFS itself. Neither the Kyuubi Server nor Kyuubi's Authz plugin for Ranger ever checks file permissions.

bowenliang123 commented 1 year ago

Again, it has nothing to do with the authz plugin, which is only responsible for checking the session user's privileges on the targeted privilege objects (e.g. tables, columns) with Ranger. It never concerns itself with, or intercepts, file operations. Maybe a more general share level should be considered in your case. For example, with the SERVER share level, an engine is submitted once with a specific proxy user and shared by all sessions.
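
For reference, a minimal sketch of that suggestion in kyuubi-defaults.conf (an illustration only, not a tested configuration for this cluster; whether a single shared engine user is acceptable depends on your security requirements):

## engine is submitted once and shared by all sessions, so HDFS access is
## checked against the single user the shared engine runs as, instead of
## each session's real user
kyuubi.engine.share.level                SERVER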

AuthurWang2009 commented 1 year ago

The execution workflow of Kyuubi looks like this:
1. The server pulls the policies of the Hive service in Ranger using the server principal and keytab.
2. The server parses the SQL and may access the Hive metastore, as the real user, for more information about the tables behind the view.
3. The server submits the SQL and launches the Spark application as the real user to do the job.

In this situation, steps 2 and 3 can run into exceptions:
1. Table access permission is not configured in the Hive service; the Hive service only configures access to the view, so the real user's access through the Hive metastore will be disallowed.
2. The Spark application translates the logical plan into a physical plan and reads the HDFS files accordingly, and the HDFS service in Ranger will deny the Spark application access to them, because it has no permission to access the HDFS files directly.
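
To double-check point 2 independently of Kyuubi, the same denial should be reproducible by running the query through a plain Spark client as the real user (a rough sketch, assuming spark-sql is available on the same host and the Hive metastore is reachable with the ticket from step 4):

kinit -p bdhcwbkj -kt ~/keytab/apps.keytab -c /tmp/bdhcwbkj_ccc
export KRB5CCNAME=/tmp/bdhcwbkj_ccc
# Spark expands the view to the underlying table and lists
# /user/bdhmgmas/db/bs_comdb/test1 as the submitting user, so the same
# AccessControlException is expected here
$SPARK_HOME/bin/spark-sql --master yarn \
  --conf spark.yarn.queue=root.000kjb.bdhmgmas_bas \
  -e "select count(*) from bs_cwbdb.viw1"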

So, how can we work around this without changing the security policy?

AuthurWang2009 commented 1 year ago

By the way, the HDFS permissions are also governed by Ranger. Should we configure a policy for every table? That sounds unreasonable for the table-and-view situation. We only configure policies for MapReduce or other applications that have no table or view and use HDFS directly.