apache/kyuubi

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
https://kyuubi.apache.org/
Apache License 2.0

[Bug][AuthZ] Don't have permissions to create UDF functions when not specify a database name #3925

Closed. MLikeWater closed this issue 1 year ago.

MLikeWater commented 1 year ago

Describe the bug

Env

Spark version: 3.2.2. Kyuubi version: apache-kyuubi-1.7.0-SNAPSHOT-bin (master) / apache-kyuubi-1.6.1 release

Reproduce

add jar hdfs://cluster1/warehouse/udfs/test-udf-1.0.jar;
create function test_udf as 'com.dt.hive.udfs.TestUDF' using jar "hdfs://cluster1/warehouse/udfs/test-udf-1.0.jar";

Caused by: org.apache.kyuubi.plugin.spark.authz.AccessControlException: Permission denied: user [test_user] does not have [create] privilege on [test_udf]
    at org.apache.kyuubi.plugin.spark.authz.ranger.SparkRangerAdminPlugin$.verify(SparkRangerAdminPlugin.scala:128)
    at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization$.$anonfun$checkPrivileges$5(RuleAuthorization.scala:94)
    at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization$.$anonfun$checkPrivileges$5$adapted(RuleAuthorization.scala:93)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization$.checkPrivileges(RuleAuthorization.scala:93)
    at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization.apply(RuleAuthorization.scala:36)
    at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization.apply(RuleAuthorization.scala:33)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:211)
    at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
    at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
    at scala.collection.immutable.List.foldLeft(List.scala:91)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:208)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:200)
    at scala.collection.immutable.List.foreach(List.scala:431)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:200)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:179)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:179)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$optimizedPlan$1(QueryExecution.scala:125)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:183)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
    at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:183)
    at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:121)
    at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:117)
    at org.apache.spark.sql.execution.QueryExecution.assertOptimized(QueryExecution.scala:135)
    at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:153)
    at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:150)
    at org.apache.spark.sql.execution.QueryExecution.simpleString(QueryExecution.scala:201)

If a database name is specified, creating the UDF function works as expected:

create function testdb.test_udf as 'com.dt.hive.udfs.TestUDF' using jar "hdfs://cluster1/warehouse/udfs/test-udf-1.0.jar";
show user functions;
+-----------------------+
|       function        |
+-----------------------+
| engine_id             |
| engine_name           |
| kyuubi_version        |
| session_user          |
| system_user           |
| testdb.test_udf       |
+-----------------------+ 

Affects Version(s)

1.7.0 (master branch)

Kyuubi Server Log Output

No response

Kyuubi Engine Log Output

Same stack trace as in the bug description above.

Kyuubi Server Configurations

spark.sql.extensions org.apache.kyuubi.sql.KyuubiSparkSQLExtension,org.apache.kyuubi.plugin.spark.authz.ranger.RangerSparkExtension,org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
spark.sql.catalog.spark_catalog org.apache.iceberg.spark.SparkSessionCatalog
spark.sql.catalog.spark_catalog.type hive

Kyuubi Engine Configurations

No response

Additional context

No response

yaooqinn commented 1 year ago

It seems we need to apply the current database when the database field is missing. cc @bowenliang123
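
A minimal sketch of that idea (not the actual Authz plugin code), assuming the function identifier is qualified with the session's current database, obtained via Spark's catalog API, before the privilege check is built:

    import org.apache.spark.sql.SparkSession

    // Sketch only: when no database is given, fall back to the session's
    // current database so the privilege check targets "currentDb.test_udf"
    // rather than the bare "test_udf".
    def qualifiedFunctionName(spark: SparkSession, db: Option[String], funcName: String): String = {
      val database = db.getOrElse(spark.catalog.currentDatabase)
      s"$database.$funcName"
    }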

bowenliang123 commented 1 year ago

Please try again with the latest code on the master branch, especially since the Authz command mapping was completely refactored in https://github.com/apache/incubator-kyuubi/pull/3904.

Lines 152-154 in Descriptor.scala should fill in the missing database for function commands:

    if (function.database.isEmpty) {
      function = function.copy(database = databaseDesc.map(_.extract(v)))
    }
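
For illustration, here is a simplified, self-contained version of that fill-in behaviour; the case class below is a hypothetical stand-in for the plugin's descriptor types, not their real API:

    // Hypothetical stand-in type for illustration only.
    case class FunctionIdentifier(funcName: String, database: Option[String])

    // Mirrors the copy-on-missing-database pattern quoted above: keep an
    // explicitly given database, otherwise fall back to the current one.
    def fillDatabase(function: FunctionIdentifier, currentDb: String): FunctionIdentifier =
      if (function.database.isEmpty) function.copy(database = Some(currentDb))
      else function

    // fillDatabase(FunctionIdentifier("test_udf", None), "testdb")
    //   returns FunctionIdentifier("test_udf", Some("testdb"))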

And I couldn't reproduce it in a unit test with the latest code on the master branch.

cxzl25 commented 1 year ago

I think this problem has been solved, so let me close this issue. Thanks to @MLikeWater for reporting the problem and to @bowenliang123 for checking it.