Closed MLikeWater closed 2 years ago
Steps to reproduce:
0: jdbc:hive2://10.2.1.6:10011/default> create database test;
0: jdbc:hive2://10.2.1.6:10011/default> use test;
0: jdbc:hive2://10.2.1.6:10011/default> create table test(id int);
0: jdbc:hive2://10.2.1.6:10011/default> create table test2(id int,name string);
0: jdbc:hive2://10.2.1.6:10011/default> select a.id,b.name from test a join test2 b on a.id = b.id;
Error: org.apache.kyuubi.KyuubiSQLException: org.apache.kyuubi.KyuubiSQLException: Error operating EXECUTE_STATEMENT: java.lang.IllegalStateException: table stats must be specified.
at org.apache.spark.sql.errors.QueryExecutionErrors$.tableStatsNotSpecifiedError(QueryExecutionErrors.scala:372)
at org.apache.spark.sql.catalyst.catalog.HiveTableRelation.$anonfun$computeStats$3(interface.scala:836)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.catalyst.catalog.HiveTableRelation.computeStats(interface.scala:836)
Some Spark SQL errors exist in the latest master branch.
@yaooqinn please take a look. Thanks :)
If I execute ANALYZE TABLE tablename COMPUTE STATISTICS; first, then the query runs normally.
There seems to be a problem in the code below, but I don't know how to solve it. Could you help me out? @yaooqinn @ulysses-you https://github.com/apache/incubator-kyuubi/blob/7460e745c3e5c5efc377a69004572037b5e2fef4/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/ranger/RuleApplyRowFilterAndDataMasking.scala#L34
Here is my test case.
test("test join") {
  try {
    doAs("admin", sql(s"CREATE DATABASE IF NOT EXISTS test"))
    doAs("admin", sql(s"create table test.test1(id int) stored as orc"))
    doAs("admin", sql(s"insert into test.test1 select 1"))
    doAs("admin", sql(s"create table test.test2(id int, name string) stored as parquet"))
    doAs("admin", sql(s"insert into test.test2 select 1, 'a'"))
    doAs("admin", sql(s"select a.id, b.name from test.test1 a join test.test2 b on a.id = b.id")
      .show(false))
  } finally {
    doAs("admin", sql(s"DROP TABLE IF EXISTS test.test1"))
    doAs("admin", sql(s"DROP TABLE IF EXISTS test.test2"))
    doAs("admin", sql(s"DROP DATABASE IF EXISTS test"))
  }
}
@zhaomin1423 Thanks for helping test and verify.
Comment out the following code:
class RangerSparkExtension extends (SparkSessionExtensions => Unit) {
  SparkRangerAdminPlugin.init()

  override def apply(v1: SparkSessionExtensions): Unit = {
    v1.injectResolutionRule(_ => new RuleReplaceShowObjectCommands())
    // v1.injectResolutionRule(new RuleApplyRowFilterAndDataMasking(_))
    v1.injectOptimizerRule(_ => new RuleEliminateMarker())
    v1.injectOptimizerRule(new RuleAuthorization(_))
    v1.injectPlannerStrategy(new FilterDataSourceV2Strategy(_))
  }
}
This avoids the problem, and the ORC and Parquet reading problems are also resolved. #2939
I can not get the problem.. how can ranger affect the hive relation statistics ?
By debugging, I found that before RuleApplyRowFilterAndDataMasking is applied, the table is a LogicalRelation, so the query uses org.apache.spark.sql.execution.datasources.LogicalRelation#computeStats and works normally. After the rule is applied, the table becomes a HiveTableRelation, so the query uses org.apache.spark.sql.catalyst.catalog.HiveTableRelation#computeStats, which raises the error. I don't know why this happens.
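To make the failure mode concrete, here is a minimal self-contained sketch of the pattern behind the stack trace above. The class and field names (StatsSketch, HiveRelationSketch) are simplified stand-ins, not Spark's actual types: the point is that HiveTableRelation#computeStats resolves an Option of catalog statistics via getOrElse, and when no stats are present the fallback throws the IllegalStateException seen in the report.

```scala
// Simplified stand-ins for Spark's CatalogStatistics / HiveTableRelation
// (assumption: illustrative only, not the real Spark classes).
case class StatsSketch(sizeInBytes: BigInt)

case class HiveRelationSketch(stats: Option[StatsSketch]) {
  // Mirrors the shape of HiveTableRelation#computeStats: Option.getOrElse
  // with a throwing default, which is where the reported error originates.
  def computeStats(): StatsSketch =
    stats.getOrElse(throw new IllegalStateException("table stats must be specified."))
}

object StatsDemo extends App {
  // With stats present (e.g. after ANALYZE TABLE ... COMPUTE STATISTICS):
  println(HiveRelationSketch(Some(StatsSketch(BigInt(1024)))).computeStats().sizeInBytes)
  // prints 1024

  // Without stats, the same failure as in the report:
  try HiveRelationSketch(None).computeStats()
  catch { case e: IllegalStateException => println(e.getMessage) }
  // prints: table stats must be specified.
}
```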
It seems we lose the application of DetermineTableStats, which is a Spark built-in rule. We wrap the HiveTableRelation during analyzer resolution and unwrap it in the optimizer, but DetermineTableStats runs at post-hoc resolution.
We can guard our rule so it only runs when relation.tableMeta.stats.nonEmpty.
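A hedged sketch of that guard, using simplified stand-in classes rather than the plugin's or Spark's real ones: the rule leaves a relation untouched while its catalog stats are still empty, so the stats-filling rule (DetermineTableStats in Spark) can still match it later.

```scala
// Self-contained analogue (assumption: TableMeta/Relation are simplified
// stand-ins for CatalogTable/HiveTableRelation, showing only the guard).
case class TableMeta(stats: Option[BigInt])
case class Relation(tableMeta: TableMeta, rowFiltered: Boolean = false)

object RowFilterRuleSketch {
  // Only transform the relation once its stats are filled in; otherwise
  // return it unchanged so a later stats-filling rule still sees it.
  def apply(r: Relation): Relation =
    if (r.tableMeta.stats.nonEmpty) r.copy(rowFiltered = true) else r
}
```

With this shape, a relation created without stats passes through the rule untouched, while one whose stats were already computed gets wrapped.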
Describe the bug
Kyuubi sets the following parameters in kyuubi-defaults.conf:
Then, querying tables stored in ORC format via beeline fails:
However, with the Ranger integration, querying tables in TextFile format works normally. If spark.sql.extensions is not set, the query also works.
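For context, the original configuration block was not captured in this issue; a hedged reconstruction of the relevant line, based on the RangerSparkExtension class shown later in this thread, would look like:

```
# hypothetical example; the issue's actual kyuubi-defaults.conf was not captured
spark.sql.extensions=org.apache.kyuubi.plugin.spark.authz.ranger.RangerSparkExtension
```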
Affects Version(s)
1.6.0(master branch)