apache / kyuubi

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
https://kyuubi.apache.org/
Apache License 2.0

[Bug] Spark authz extension (ranger) always deny createGlobalTempView #4441

Closed rwu-wish closed 1 year ago

rwu-wish commented 1 year ago

Describe the bug

With the Spark authz extension (Ranger plugin) enabled, the Spark API createGlobalTempView always seems to be denied, even though I have given my user full access on the Ranger admin server.

Reproduction steps (Spark version 3.3.0):

$ spark-shell

val df = Seq(1,2,3).toDF("num")
df.createGlobalTempView("tempview")
org.apache.kyuubi.plugin.spark.authz.AccessControlException: Permission denied: user [hadoop] does not have [create] privilege on [tempview]
  at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization$.verify(RuleAuthorization.scala:88)
  at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization$.$anonfun$checkPrivileges$3(RuleAuthorization.scala:80)
  at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization$.$anonfun$checkPrivileges$3$adapted(RuleAuthorization.scala:71)
  at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
  at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
  at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization$.checkPrivileges(RuleAuthorization.scala:71)
  at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization.apply(RuleAuthorization.scala:35)
  at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization.apply(RuleAuthorization.scala:32)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:215)
  at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
  at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
  at scala.collection.immutable.List.foldLeft(List.scala:91)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeBatch$1(RuleExecutor.scala:212)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$6(RuleExecutor.scala:284)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$RuleExecutionContext$.withContext(RuleExecutor.scala:327)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$5(RuleExecutor.scala:284)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$5$adapted(RuleExecutor.scala:274)
  at scala.collection.immutable.List.foreach(List.scala:431)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:274)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:188)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:179)
  at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:179)
  at org.apache.spark.sql.execution.QueryExecution.$anonfun$optimizedPlan$1(QueryExecution.scala:134)
  at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:192)
  at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:213)
  at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:552)
  at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:213)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
  at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:212)
  at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:130)
  at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:126)
  at org.apache.spark.sql.execution.QueryExecution.$anonfun$writePlans$4(QueryExecution.scala:296)
  at org.apache.spark.sql.catalyst.plans.QueryPlan$.append(QueryPlan.scala:657)
  at org.apache.spark.sql.execution.QueryExecution.writePlans(QueryExecution.scala:296)
  at org.apache.spark.sql.execution.QueryExecution.toString(QueryExecution.scala:313)
  at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$explainString(QueryExecution.scala:267)
  at org.apache.spark.sql.execution.QueryExecution.explainString(QueryExecution.scala:246)
  at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:107)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$7(SQLExecution.scala:139)
  at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
  at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:224)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:139)
  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:245)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:138)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
  at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:100)
  at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:96)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:615)
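For reference, the authz extension was enabled roughly like this (a minimal sketch: the jar path is hypothetical, and the Ranger client configs `ranger-spark-security.xml` / `ranger-spark-audit.xml` are assumed to be on the driver classpath as described in the Kyuubi authz docs):

```shell
# Launch spark-shell with the Kyuubi Spark AuthZ extension (sketch; jar path is hypothetical)
spark-shell \
  --conf spark.sql.extensions=org.apache.kyuubi.plugin.spark.authz.ranger.RangerSparkExtension \
  --jars /path/to/kyuubi-spark-authz_2.12-1.6.1.jar
```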

The same steps work on Spark 3.1.2 with the Spark Ranger plugin from the earlier Apache Submarine project: https://submarine.apache.org/docs/0.6.0/userDocs/submarine-security/spark-security/build-submarine-spark-security-plugin/

org.apache.submarine.spark.security.api.RangerSparkSQLExtension
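On the 3.1.2 cluster, the Submarine plugin was enabled the same way (again a sketch; the jar name and path are hypothetical):

```shell
# Launch spark-shell with the Submarine Spark security extension (sketch)
spark-shell \
  --conf spark.sql.extensions=org.apache.submarine.spark.security.api.RangerSparkSQLExtension \
  --jars /path/to/submarine-spark-security-shaded.jar
```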

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.2-amzn-1
      /_/

Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_362)
Type in expressions to have them evaluated.
Type :help for more information.

scala> val df = Seq(1,2,3).toDF("num")
df: org.apache.spark.sql.DataFrame = [num: int]

scala> df.show()
+---+
|num|
+---+
|  1|
|  2|
|  3|
+---+

scala> df.createOrReplaceTempView("tempview")
23/03/03 08:00:15 WARN SimpleFunctionRegistry: The function mask replaced a previously registered function.
23/03/03 08:00:15 WARN SimpleFunctionRegistry: The function mask_hash replaced a previously registered function.
23/03/03 08:00:15 WARN SimpleFunctionRegistry: The function mask_first_n replaced a previously registered function.
23/03/03 08:00:15 WARN SimpleFunctionRegistry: The function mask_last_n replaced a previously registered function.
23/03/03 08:00:15 WARN SimpleFunctionRegistry: The function mask_show_last_n replaced a previously registered function.
23/03/03 08:00:15 WARN SimpleFunctionRegistry: The function mask_show_first_n replaced a previously registered function.

scala> spark.sql("select * from tempview").show()
23/03/03 08:00:33 WARN SimpleFunctionRegistry: The function mask replaced a previously registered function.
23/03/03 08:00:33 WARN SimpleFunctionRegistry: The function mask_hash replaced a previously registered function.
23/03/03 08:00:33 WARN SimpleFunctionRegistry: The function mask_first_n replaced a previously registered function.
23/03/03 08:00:33 WARN SimpleFunctionRegistry: The function mask_last_n replaced a previously registered function.
23/03/03 08:00:33 WARN SimpleFunctionRegistry: The function mask_show_last_n replaced a previously registered function.
23/03/03 08:00:33 WARN SimpleFunctionRegistry: The function mask_show_first_n replaced a previously registered function.
+---+
|num|
+---+
|  1|
|  2|
|  3|
+---+
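For context, a global temp view is registered under Spark's reserved `global_temp` database, unlike the session-scoped view used in the 3.1.2 session above. So the Ranger policy that matters is presumably the one covering `global_temp.tempview`, which may be why the error reports a missing [create] privilege on [tempview] (my reading of the Spark API, not verified against the plugin source):

```scala
// Sketch of the two view APIs involved; runs inside an existing spark-shell session
val df = Seq(1, 2, 3).toDF("num")

// Session-scoped view (what the 3.1.2 repro above actually uses)
df.createOrReplaceTempView("tempview")
spark.sql("SELECT * FROM tempview").show()

// Global view: lives in the reserved global_temp database and must be
// qualified with it, so access checks resolve against global_temp
df.createGlobalTempView("tempview")
spark.sql("SELECT * FROM global_temp.tempview").show()
```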

Affects Version(s)

1.6.1

Kyuubi Server Log Output

No response

Kyuubi Engine Log Output

No response

Kyuubi Server Configurations

No response

Kyuubi Engine Configurations

No response

Additional context

No response

Are you willing to submit a PR?

github-actions[bot] commented 1 year ago

Hello @rwu-wish, Thanks for finding the time to report the issue! We really appreciate the community's efforts to improve Apache Kyuubi.

yaooqinn commented 1 year ago

Can you help verify v1.7.0? https://lists.apache.org/thread/n6jlg1hqhdxjj35tw644ndbt2fbd1sjr

rwu-wish commented 1 year ago

I just tried the master branch, and it seems to be working. Thanks, I will resolve this.

pan3793 commented 1 year ago

@rwu-wish thanks for the verification. Kyuubi 1.7.0 is under vote; I would appreciate it if you could reply on the mailing list (via your own email client).