Closed wankunde closed 1 month ago
Hello @wankunde, Thanks for finding the time to report the issue! We really appreciate the community's efforts to improve Apache Kyuubi.
Test with 50,000 local files:
test("KYUUBI #6754: improve the performance of ranger access requests") {
  val outputPath = "/private/var/folders/tr/scn8dgl13_l6_sh17bghtln1b35kn1/T/kyuubi-test-5492934124608743789/"
  println("output path: " + outputPath)
  val plugin = mock[SparkRangerAdminPlugin.type]
  when(plugin.verify(Seq(any[RangerAccessRequest]), any[SparkRangerAuditHandler]))
    .thenAnswer(_ => ())
  val df = spark.read.parquet(outputPath + "/*/*.parquet")
  val plan = df.queryExecution.optimizedPlan
  val start = System.currentTimeMillis()
  RuleAuthorization(spark).checkPrivileges(spark, plan)
  val end = System.currentTimeMillis()
  println(s"Time elapsed : ${end - start} ms")
}
Before After
What would you like to be improved?
Currently, RuleAuthorization collects access requests in an ArrayBuffer. This is very slow for large inputs because each new PrivilegeObject has to be compared with every access request collected so far, so the total cost grows quadratically with the number of objects.
How should we improve?
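We can collect the access requests in a HashMap instead, so that checking whether a request already exists becomes a hash lookup rather than a scan over all collected requests.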
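To illustrate the idea, here is a minimal, self-contained sketch that contrasts the two approaches. It is not the actual Kyuubi code: `Request`, `DedupSketch`, `dedupLinear`, and `dedupHashed` are hypothetical names, and the key `(resource, accessType)` is an assumption about what makes two requests equal. The sketch uses a `LinkedHashMap` rather than a plain `HashMap` so that the original order of requests is preserved.

```scala
import scala.collection.mutable

// Hypothetical stand-in for an access request; not Kyuubi's actual type.
final case class Request(resource: String, accessType: String)

object DedupSketch {
  // Linear-scan approach: every insert compares against all collected
  // requests, so n inputs cost O(n^2) comparisons in the worst case.
  def dedupLinear(reqs: Seq[Request]): Seq[Request] = {
    val buf = mutable.ArrayBuffer.empty[Request]
    reqs.foreach { r =>
      if (!buf.exists(x => x.resource == r.resource && x.accessType == r.accessType)) {
        buf += r
      }
    }
    buf.toSeq
  }

  // Hash-based approach: membership is an expected O(1) lookup by key.
  // LinkedHashMap keeps first-seen insertion order (an assumption that
  // order matters; a plain HashMap would also deduplicate).
  def dedupHashed(reqs: Seq[Request]): Seq[Request] = {
    val byKey = mutable.LinkedHashMap.empty[(String, String), Request]
    reqs.foreach { r =>
      byKey.getOrElseUpdate((r.resource, r.accessType), r)
    }
    byKey.values.toSeq
  }
}
```

Both functions return the same de-duplicated sequence; only the cost of each duplicate check differs, which is what matters when a plan references tens of thousands of files.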
Are you willing to submit PR?