apache / kyuubi

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
https://kyuubi.apache.org/
Apache License 2.0

[Bug] [AuthZ] WARN ShellBasedUnixGroupsMapping: unable to return groups for user xxx #6100

Closed · wardlican closed this issue 7 months ago

wardlican commented 7 months ago

Describe the bug

In a Spark-on-Kubernetes deployment of Kyuubi 1.8.0 with the AuthZ (Ranger) plugin integrated, the following warning appears after a connection is created, saying that the user's groups cannot be resolved. It does not affect the execution of the task. Is there any way to avoid this warning?

WARN ResolveSessionCatalog: A Hive serde table will be created as there is no table provider specified. You can set spark.sql.legacy.createHiveTableByDefault to false so that native data source table will be created instead.
WARN ShellBasedUnixGroupsMapping: unable to return groups for user xxx
PartialGroupNameException The user name 'xxx' is not found. id: xxx: no such user
id: xxx: no such user

        at org.apache.hadoop.security.ShellBasedUnixGroupsMapping.resolvePartialGroupNames(ShellBasedUnixGroupsMapping.java:294)
        at org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getUnixGroups(ShellBasedUnixGroupsMapping.java:207)
        at org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getGroups(ShellBasedUnixGroupsMapping.java:97)
        at org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback.getGroups(JniBasedUnixGroupsMappingWithFallback.java:51)
        at org.apache.hadoop.security.Groups$GroupCacheLoader.fetchGroupList(Groups.java:387)
        at org.apache.hadoop.security.Groups$GroupCacheLoader.load(Groups.java:321)
        at org.apache.hadoop.security.Groups$GroupCacheLoader.load(Groups.java:270)
        at org.apache.hadoop.shaded.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3570)
        at org.apache.hadoop.shaded.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2312)
        at org.apache.hadoop.shaded.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2189)
        at org.apache.hadoop.shaded.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2079)
        at org.apache.hadoop.shaded.com.google.common.cache.LocalCache.get(LocalCache.java:4011)
        at org.apache.hadoop.shaded.com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4034)
        at org.apache.hadoop.shaded.com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:5010)
        at org.apache.hadoop.security.Groups.getGroups(Groups.java:228)
        at org.apache.hadoop.security.UserGroupInformation.getGroups(UserGroupInformation.java:1923)
        at org.apache.hadoop.security.UserGroupInformation.getGroupNames(UserGroupInformation.java:1911)
        at org.apache.kyuubi.plugin.spark.authz.ranger.AccessRequest$.getUserGroupsFromUgi(AccessRequest.scala:72)
        at org.apache.kyuubi.plugin.spark.authz.ranger.AccessRequest$.getUserGroups(AccessRequest.scala:93)
        at org.apache.kyuubi.plugin.spark.authz.ranger.AccessRequest$.apply(AccessRequest.scala:41)
        at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization$.$anonfun$checkPrivileges$1(RuleAuthorization.scala:62)
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
        at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization$.addAccessRequest$1(RuleAuthorization.scala:57)
        at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization$.org$apache$kyuubi$plugin$spark$authz$ranger$RuleAuthorization$$checkPrivileges(RuleAuthorization.scala:68)
        at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization.apply(RuleAuthorization.scala:37)
        at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization.apply(RuleAuthorization.scala:33)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:222)
        at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
        at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
        at scala.collection.immutable.List.foldLeft(List.scala:91)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:219)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:211)
        at scala.collection.immutable.List.foreach(List.scala:431)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:211)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:182)
        at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:182)
        at org.apache.spark.sql.execution.QueryExecution.$anonfun$optimizedPlan$1(QueryExecution.scala:143)
        at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
        at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:202)
        at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:526)
        at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:202)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
        at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:201)
        at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:139)
        at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:135)
        at org.apache.spark.sql.execution.QueryExecution.assertOptimized(QueryExecution.scala:153)
        at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:171)
        at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:168)
        at org.apache.spark.sql.execution.QueryExecution.simpleString(QueryExecution.scala:221)
        at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$explainString(QueryExecution.scala:266)
        at org.apache.spark.sql.execution.QueryExecution.explainString(QueryExecution.scala:235)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:112)
        at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:195)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:103)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
        at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98)
        at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:94)
        at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512)
        at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:31)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:488)
        at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:94)
        at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:81)
        at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:79)
        at org.apache.spark.sql.Dataset.<init>(Dataset.scala:218)
        at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:98)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
        at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:95)
        at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:640)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
        at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:630)
        at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:671)
        at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.$anonfun$executeStatement$1(ExecuteStatement.scala:86)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.apache.kyuubi.engine.spark.operation.SparkOperation.$anonfun$withLocalProperties$1(SparkOperation.scala:155)
        at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:195)
        at org.apache.kyuubi.engine.spark.operation.SparkOperation.withLocalProperties(SparkOperation.scala:139)
        at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.executeStatement(ExecuteStatement.scala:81)
        at org.apache.kyuubi.engine.spark.operation.ExecuteStatement$$anon$1.run(ExecuteStatement.scala:103)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
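
For context: ShellBasedUnixGroupsMapping (the class at the top of the trace) resolves a user's groups by shelling out to the OS id command, so the lookup fails whenever the session user has no entry in the Spark container's /etc/passwd. A minimal reproduction inside the driver pod, keeping the placeholder user name xxx from the log above:

        # Hadoop's shell-based group mapping effectively runs these commands;
        # in a typical Spark-on-K8s image the proxy user has no passwd entry:
        $ id -gn xxx && id -Gn xxx
        id: xxx: no such user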

Affects Version(s)

1.8.0

Kyuubi Server Log Output

No response

Kyuubi Engine Log Output

No response

Kyuubi Server Configurations

No response

Kyuubi Engine Configurations

No response

Additional context

No response

Are you willing to submit PR?

wardlican commented 7 months ago

You should use the following configuration so that the plugin obtains the user's group information from Ranger, which avoids the above warning:

ranger.plugin.$getServiceType.use.usergroups.from.userstore.enabled = true
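
Here $getServiceType is a placeholder for the Ranger service type, which is spark for the Kyuubi Spark AuthZ plugin, so the property resolves to ranger.plugin.spark.use.usergroups.from.userstore.enabled. A minimal sketch of how it could be set, assuming the plugin reads its Ranger configuration from ranger-spark-security.xml on the classpath (adjust the file name if your deployment differs):

        <property>
          <!-- take user-group membership from the userstore downloaded from
               Ranger Admin instead of resolving groups via the OS -->
          <name>ranger.plugin.spark.use.usergroups.from.userstore.enabled</name>
          <value>true</value>
        </property>

Note that this relies on Ranger Admin serving the userstore download, so it needs a Ranger Admin version recent enough to support that feature.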

pan3793 commented 7 months ago

https://github.com/apache/spark/pull/40831 may help

shawnwxc commented 4 months ago

I wonder how to set "ranger.plugin.$getServiceType.use.usergroups.from.userstore.enabled = true". When I set "ranger.plugin.spark.use.usergroups.from.userstore.enabled = true" for my Spark AuthZ plugin, an exception was thrown (see the attached screenshot).

Any suggestion on how to set "ranger.plugin.$getServiceType.use.usergroups.from.userstore.enabled = true"?