apache / kyuubi

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
https://kyuubi.apache.org/
Apache License 2.0

[Bug] Kyuubi integrated Ranger does not support the CTAS syntax #2929

Closed: MLikeWater closed this issue 2 years ago

MLikeWater commented 2 years ago

Describe the bug

Kyuubi is configured with the following parameter in kyuubi-defaults.conf:

spark.sql.extensions org.apache.kyuubi.plugin.spark.authz.ranger.RangerSparkExtension

Then beeline is used to execute a DDL (CTAS) command, but it fails:

0: jdbc:hive2://10.2.1.6:10011/default> create table promotion_like as select * from promotion; 
0: jdbc:hive2://10.2.1.6:10011/default> --create table promotion_like stored as orc as select * from promotion;
22/06/22 12:06:43 WARN ResolveSessionCatalog: A Hive serde table will be created as there is no table provider specified. You can set spark.sql.legacy.createHiveTableByDefault to false so that native data source table will be created instead.
22/06/22 12:06:43 INFO DAGScheduler: Asked to cancel job group 6c90d032-6216-40fc-93cd-86c79d2a07db
22/06/22 12:06:43 ERROR ExecuteStatement: Error operating EXECUTE_STATEMENT: java.lang.RuntimeException: table not in [tableDesc,query,outputColumnNames,mode,tableIdentifier,metrics,children,nodePatterns,bitmap$0,bitmap$trans$0]
    at org.apache.kyuubi.plugin.spark.authz.util.AuthZUtils$.getFieldVal(AuthZUtils.scala:42)
    at org.apache.kyuubi.plugin.spark.authz.PrivilegesBuilder$.getPlanField$1(PrivilegesBuilder.scala:164)
    at org.apache.kyuubi.plugin.spark.authz.PrivilegesBuilder$.buildCommand(PrivilegesBuilder.scala:329)
    at org.apache.kyuubi.plugin.spark.authz.PrivilegesBuilder$.build(PrivilegesBuilder.scala:561)
    at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization$.checkPrivileges(RuleAuthorization.scala:43)
    at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization.apply(RuleAuthorization.scala:32)
    at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization.apply(RuleAuthorization.scala:30)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:211)
    at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
    at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
    at scala.collection.immutable.List.foldLeft(List.scala:91)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:208)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:200)
    at scala.collection.immutable.List.foreach(List.scala:431)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:200)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:179)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:179)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$optimizedPlan$1(QueryExecution.scala:138)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:196)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
    at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:196)
    at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:134)
    at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:130)
    at org.apache.spark.sql.execution.QueryExecution.assertOptimized(QueryExecution.scala:148)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$executedPlan$1(QueryExecution.scala:166)
    at org.apache.spark.sql.execution.QueryExecution.withCteMap(QueryExecution.scala:73)
    at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:163)
    at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:163)
    at org.apache.spark.sql.execution.QueryExecution.simpleString(QueryExecution.scala:214)
    at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$explainString(QueryExecution.scala:259)
    at org.apache.spark.sql.execution.QueryExecution.explainString(QueryExecution.scala:228)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:98)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:457)
    at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:106)
    at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:93)
    at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:91)
    at org.apache.spark.sql.Dataset.<init>(Dataset.scala:219)
    at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
    at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613)
    at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.$anonfun$executeStatement$1(ExecuteStatement.scala:94)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at org.apache.kyuubi.engine.spark.operation.SparkOperation.withLocalProperties(SparkOperation.scala:88)
    at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.org$apache$kyuubi$engine$spark$operation$ExecuteStatement$$executeStatement(ExecuteStatement.scala:89)
    at org.apache.kyuubi.engine.spark.operation.ExecuteStatement$$anon$1.run(ExecuteStatement.scala:125)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoSuchFieldException: table
    at java.lang.Class.getDeclaredField(Class.java:2070)
    at org.apache.kyuubi.plugin.spark.authz.util.AuthZUtils$.$anonfun$getFieldVal$1(AuthZUtils.scala:35)
    at scala.util.Try$.apply(Try.scala:213)
    at org.apache.kyuubi.plugin.spark.authz.util.AuthZUtils$.getFieldVal(AuthZUtils.scala:34)
    ... 68 more

If spark.sql.extensions is not set, the query runs normally.
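
For context, the root cause in the trace appears to be the reflective field lookup in AuthZUtils.getFieldVal: the plugin asks the plan node for a field named table, while OptimizedCreateHiveTableAsSelectCommand declares tableDesc (see the field list in the error message). A minimal sketch of that lookup pattern, with an illustrative case class standing in for the Spark command (not the real class), reproduces the same NoSuchFieldException:

    // Minimal sketch of the reflective lookup performed by AuthZUtils.getFieldVal.
    // FakeOptimizedCtas is illustrative only; it stands in for Spark's
    // OptimizedCreateHiveTableAsSelectCommand, whose CatalogTable field is "tableDesc".
    object FieldLookupSketch extends App {
      case class FakeOptimizedCtas(tableDesc: String, query: String)

      def getFieldVal[T](o: Any, name: String): T = {
        // Class#getDeclaredField throws NoSuchFieldException when no field has that
        // name, which matches the "Caused by" at the bottom of the stack trace.
        val field = o.getClass.getDeclaredField(name)
        field.setAccessible(true)
        field.get(o).asInstanceOf[T]
      }

      val plan = FakeOptimizedCtas("promotion_like", "select * from promotion")
      println(getFieldVal[String](plan, "tableDesc")) // ok: declared field is "tableDesc"
      println(getFieldVal[String](plan, "table"))     // throws NoSuchFieldException: table
    }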

Affects Version(s)

1.6.0 (master branch)

Kyuubi Server Log Output

No response

Kyuubi Engine Log Output

No response

Kyuubi Server Configurations

spark.sql.extensions org.apache.kyuubi.plugin.spark.authz.ranger.RangerSparkExtension

spark.hive.default.fileformat orc
spark.default.parallelism 500
spark.hadoopRDD.targetBytesInPartition 67108864
spark.hadoop.hive.exec.orc.split.strategy ETL
spark.hadoop.mapreduce.input.input.fileinputformat.split.minsize 67108864
spark.hadoop.mapreduce.input.input.fileinputformat.split.maxsize 67108864
spark.sql.shuffle.partitions 2000
spark.sql.files.maxPartitionBytes 67108864
spark.sql.mergeSmallFileSize 67108864

Kyuubi Engine Configurations

No response

Additional context

No response

Are you willing to submit PR?

MLikeWater commented 2 years ago

@packyan Please help and support. Thanks:)

packyan commented 2 years ago

> @packyan Please help and support. Thanks:)

Can you run EXPLAIN on the DDL command to show the plan? I cannot reproduce this issue.

MLikeWater commented 2 years ago

> @packyan Please help and support. Thanks:)

> Can you run EXPLAIN on the DDL command to show the plan? I cannot reproduce this issue.

0: jdbc:hive2://10.2.1.6:10011/default> explain create table test as select * from call_center;
+----------------------------------------------------+
|                        plan                        |
+----------------------------------------------------+
| Error occurred during query planning:              |
| table not in [tableDesc,query,outputColumnNames,mode,tableIdentifier,metrics,children,nodePatterns,bitmap$0,bitmap$trans$0] |
+----------------------------------------------------+

The statement is executed with OptimizedCreateHiveTableAsSelectCommand, which PrivilegesBuilder currently handles as follows:


case "CreateDataSourceTableAsSelectCommand" |
"OptimizedCreateHiveTableAsSelectCommand" =>
val table = getPlanField[CatalogTable]("table").identifier
outputObjs += tablePrivileges(table)
buildQuery(getQuery, inputObjs)

case "CreateHiveTableAsSelectCommand" => val table = getPlanFieldCatalogTable.identifier val cols = getPlanFieldSeq[String] outputObjs += tablePrivileges(table, cols) buildQuery(getQuery, inputObjs)


Should we change `val table = getPlanField[CatalogTable]("table").identifier` to `val table = getPlanField[CatalogTable]("tableDesc").identifier`?
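
Note that CreateDataSourceTableAsSelectCommand appears to keep a field literally named table, so changing the lookup for the whole shared case could break that branch. One possible direction, sketched here purely as an assumption based on the field list in the error message (not verified against every Spark version), is to split the match so that both Hive CTAS commands read tableDesc:

    // Hedged sketch only: assumes CreateDataSourceTableAsSelectCommand exposes "table",
    // while both Hive CTAS commands expose "tableDesc" and "outputColumnNames",
    // consistent with the field list in the error message above.
    case "CreateDataSourceTableAsSelectCommand" =>
      val table = getPlanField[CatalogTable]("table").identifier
      outputObjs += tablePrivileges(table)
      buildQuery(getQuery, inputObjs)

    case "CreateHiveTableAsSelectCommand" | "OptimizedCreateHiveTableAsSelectCommand" =>
      val table = getPlanField[CatalogTable]("tableDesc").identifier
      val cols = getPlanField[Seq[String]]("outputColumnNames")
      outputObjs += tablePrivileges(table, cols)
      buildQuery(getQuery, inputObjs)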