apache / kyuubi

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
https://kyuubi.apache.org/
Apache License 2.0

[Bug] [Authz] Task not serializable for show tables with limit #4617

Closed · yaooqinn closed this issue 1 year ago

yaooqinn commented 1 year ago

Describe the bug

We can add the following test case to reproduce the issue:

doAs("i_am_invisible", assert(sql(s"show tables from $db").limit(1).isEmpty))
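
For reference, the same trigger can be reproduced outside the test suite with a plain SparkSession once the AuthZ extension is enabled. The sketch below is illustrative only: the extension class name follows the plugin documentation, while the master, database name, and object name are placeholder assumptions.

import org.apache.spark.sql.SparkSession

// Standalone reproduction sketch (hypothetical object name). Assumes the Kyuubi Spark
// AuthZ plugin jar and its Ranger client configuration are already on the classpath.
object ShowTablesLimitRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .config("spark.sql.extensions",
        "org.apache.kyuubi.plugin.spark.authz.ranger.RangerSparkExtension")
      .getOrCreate()

    // SHOW TABLES by itself runs as a driver-side command; appending .limit(1) plans a
    // CollectLimitExec and triggers the Task not serializable failure reported here.
    spark.sql("SHOW TABLES FROM default").limit(1).show()

    spark.stop()
  }
}

Both the in-suite test and a standalone run like this fail with the stack trace below.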

 Cause: org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: org.apache.spark.sql.execution.datasources.v2.V2SessionCatalog
Serialization stack:
    - object not serializable (class: org.apache.spark.sql.execution.datasources.v2.V2SessionCatalog, value: V2SessionCatalog(spark_catalog))
    - field (class: org.apache.spark.sql.execution.datasources.v2.ShowTablesExec, name: catalog, type: interface org.apache.spark.sql.connector.catalog.TableCatalog)
    - object (class org.apache.spark.sql.execution.datasources.v2.ShowTablesExec, ShowTables [namespace#11804, tableName#11805, isTemporary#11806], V2SessionCatalog(spark_catalog), [default2]
)
    - field (class: org.apache.kyuubi.plugin.spark.authz.ranger.FilteredShowTablesExec, name: delegated, type: class org.apache.spark.sql.execution.SparkPlan)
    - object (class org.apache.kyuubi.plugin.spark.authz.ranger.FilteredShowTablesExec, FilteredShowTables ShowTables [namespace#11804, tableName#11805, isTemporary#11806], V2SessionCatalog(spark_catalog), [default2]
)
    - field (class: org.apache.spark.sql.execution.InputAdapter, name: child, type: class org.apache.spark.sql.execution.SparkPlan)
    - object (class org.apache.spark.sql.execution.InputAdapter, FilteredShowTables ShowTables [namespace#11804, tableName#11805, isTemporary#11806], V2SessionCatalog(spark_catalog), [default2]
)
    - field (class: org.apache.spark.sql.execution.ProjectExec, name: child, type: class org.apache.spark.sql.execution.SparkPlan)
    - object (class org.apache.spark.sql.execution.ProjectExec, Project
+- FilteredShowTables ShowTables [namespace#11804, tableName#11805, isTemporary#11806], V2SessionCatalog(spark_catalog), [default2]
)
    - field (class: org.apache.spark.sql.execution.WholeStageCodegenExec, name: child, type: class org.apache.spark.sql.execution.SparkPlan)
    - object (class org.apache.spark.sql.execution.WholeStageCodegenExec, *(1) Project
+- FilteredShowTables ShowTables [namespace#11804, tableName#11805, isTemporary#11806], V2SessionCatalog(spark_catalog), [default2]
)
    - field (class: org.apache.spark.sql.execution.CollectLimitExec, name: child, type: class org.apache.spark.sql.execution.SparkPlan)
    - object (class org.apache.spark.sql.execution.CollectLimitExec, CollectLimit 1
+- *(1) Project
   +- FilteredShowTables ShowTables [namespace#11804, tableName#11805, isTemporary#11806], V2SessionCatalog(spark_catalog), [default2]
)
    - element of array (index: 0)
    - array (class [Ljava.lang.Object;, size 1)
    - field (class: java.lang.invoke.SerializedLambda, name: capturedArgs, type: class [Ljava.lang.Object;)
    - object (class java.lang.invoke.SerializedLambda, SerializedLambda[capturingClass=class org.apache.spark.sql.execution.CollectLimitExec, functionalInterfaceMethod=scala/Function1.apply:(Ljava/lang/Object;)Ljava/lang/Object;, implementation=invokeStatic org/apache/spark/sql/execution/CollectLimitExec.$anonfun$doExecute$2:(Lorg/apache/spark/sql/execution/CollectLimitExec;Lscala/collection/Iterator;)Lscala/collection/Iterator;, instantiatedMethodType=(Lscala/collection/Iterator;)Lscala/collection/Iterator;, numCaptured=1])
    - writeReplace data (class: java.lang.invoke.SerializedLambda)
    - object (class org.apache.spark.sql.execution.CollectLimitExec$$Lambda$5807/711438897, org.apache.spark.sql.execution.CollectLimitExec$$Lambda$5807/711438897@6a0a7017)
    - element of array (index: 0)
    - array (class [Ljava.lang.Object;, size 1)
    - field (class: java.lang.invoke.SerializedLambda, name: capturedArgs, type: class [Ljava.lang.Object;)
    - object (class java.lang.invoke.SerializedLambda, SerializedLambda[capturingClass=class org.apache.spark.rdd.RDD, functionalInterfaceMethod=scala/Function3.apply:(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;, implementation=invokeStatic org/apache/spark/rdd/RDD.$anonfun$mapPartitionsInternal$2$adapted:(Lscala/Function1;Lorg/apache/spark/TaskContext;Ljava/lang/Object;Lscala/collection/Iterator;)Lscala/collection/Iterator;, instantiatedMethodType=(Lorg/apache/spark/TaskContext;Ljava/lang/Object;Lscala/collection/Iterator;)Lscala/collection/Iterator;, numCaptured=1])
    - writeReplace data (class: java.lang.invoke.SerializedLambda)
    - object (class org.apache.spark.rdd.RDD$$Lambda$3064/139506439, org.apache.spark.rdd.RDD$$Lambda$3064/139506439@4c646d0e)
    - field (class: org.apache.spark.rdd.MapPartitionsRDD, name: f, type: interface scala.Function3)
    - object (class org.apache.spark.rdd.MapPartitionsRDD, MapPartitionsRDD[131] at isEmpty at RangerSparkExtensionSuite.scala:313)
    - field (class: org.apache.spark.NarrowDependency, name: _rdd, type: class org.apache.spark.rdd.RDD)
    - object (class org.apache.spark.OneToOneDependency, org.apache.spark.OneToOneDependency@dc15b86)
    - writeObject data (class: scala.collection.immutable.List$SerializationProxy)
    - object (class scala.collection.immutable.List$SerializationProxy, scala.collection.immutable.List$SerializationProxy@78a964f6)
    - writeReplace data (class: scala.collection.immutable.List$SerializationProxy)
    - object (class scala.collection.immutable.$colon$colon, List(org.apache.spark.OneToOneDependency@dc15b86))
    - field (class: org.apache.spark.rdd.RDD, name: dependencies_, type: interface scala.collection.Seq)
    - object (class org.apache.spark.rdd.MapPartitionsRDD, MapPartitionsRDD[132] at isEmpty at RangerSparkExtensionSuite.scala:313)
    - field (class: scala.Tuple2, name: _1, type: class java.lang.Object)
    - object (class scala.Tuple2, (MapPartitionsRDD[132] at isEmpty at RangerSparkExtensionSuite.scala:313,org.apache.spark.SparkContext$$Lambda$3117/648224575@6c014e80))
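
Reading the serialization stack: FilteredShowTablesExec keeps the delegated ShowTablesExec in its delegated field, and ShowTablesExec in turn holds a reference to the non-serializable V2SessionCatalog. Without the limit the SHOW TABLES result is returned from the driver side, but .limit(1) plans a CollectLimitExec whose doExecute lambda captures the enclosing node, and with it the whole child plan, inside the task closure, so Spark attempts to serialize the catalog and fails.

One possible direction, shown only as a sketch under assumed names and not as the patch that closed this issue, is to have the filtering node evaluate the delegated plan eagerly on the driver and mark the delegated field transient, so a closure that happens to capture the node never drags the catalog to executors.

package org.apache.kyuubi.plugin.spark.authz.ranger

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.Attribute
import org.apache.spark.sql.execution.{LeafExecNode, SparkPlan}

// Hypothetical sketch: run the delegated ShowTablesExec on the driver and keep only
// plain rows, so nothing referencing V2SessionCatalog lands in a task closure.
case class DriverSideFilteredShowTablesExec(
    output: Seq[Attribute],
    @transient delegated: SparkPlan) extends LeafExecNode {

  // Placeholder for the per-row visibility check; the real plugin would consult
  // Ranger policies for the current user here.
  private def isVisible(row: InternalRow): Boolean = true

  override protected def doExecute(): RDD[InternalRow] = {
    // executeCollect() runs the delegated command on the driver; only the filtered,
    // serializable rows are handed back to Spark as an RDD.
    val visibleRows = delegated.executeCollect().filter(isVisible).toSeq
    sparkContext.parallelize(visibleRows, 1)
  }
}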

Affects Version(s)

1.6.1

Kyuubi Server Log Output

No response

Kyuubi Engine Log Output

No response

Kyuubi Server Configurations

No response

Kyuubi Engine Configurations

No response

Additional context

No response

yaooqinn commented 1 year ago

cc @bowenliang123