databrickslabs / ucx

Your best companion for upgrading to Unity Catalog. UCX will guide you, the Databricks customer, through the process of upgrading your account, groups, workspaces, jobs etc. to Unity Catalog.

Do not migrate READ_METADATA to BROWSE on tables and schemas #2022

Closed qziyuan closed 2 days ago

qziyuan commented 4 days ago

Changes

UC only supports the BROWSE privilege on catalog objects. Translating the legacy hive_metastore privilege READ_METADATA on tables and databases into the BROWSE privilege on UC tables and schemas fails and produces error messages in the migrate-tables workflow logs, which confuse users.

qziyuan commented 4 days ago

> @qziyuan : Which error does it cause? And why?

Here is an example: the table `hive_metastore.snapshot_amer_internal.axiometrics_property_attr` has a legacy TACL like:

| Principal | ActionType | ObjectType | ObjectKey |
| --- | --- | --- | --- |
| test_group | READ_METADATA | TABLE | snapshot_amer_internal.axiometrics_property_attr |

The table migration task will try to migrate it with `GRANT BROWSE ON TABLE uc_catalog.snapshot_amer_internal.axiometrics_property_attr TO test_group`, which causes the following error:

Failed to migrate ACL for hive_metastore.snapshot_amer_internal.axiometrics_property_attr to uc_discovery_catalog_dev.snapshot_amer_internal.axiometrics_property_attr: [RequestId=9c00a4e3-61c7-4d1b-ac12-b7c4c4dd2d37 ErrorClass=INVALID_PARAMETER_VALUE] Privilege BROWSE is not applicable to this entity [uc_discovery_catalog_dev.snapshot_amer_internal.axiometrics_property_attr:TABLE/TABLE_DELTA_EXTERNAL]. If this seems unexpected, please check the privilege version of the metastore in use [1.0].
JVM stacktrace:
com.databricks.sql.managedcatalog.UnityCatalogServiceException
    at com.databricks.managedcatalog.ErrorDetailsHandler.wrapServiceException(ErrorDetailsHandler.scala:35)
    at com.databricks.managedcatalog.ErrorDetailsHandler.wrapServiceException$(ErrorDetailsHandler.scala:24)
    at com.databricks.managedcatalog.ManagedCatalogClientImpl.wrapServiceException(ManagedCatalogClientImpl.scala:158)
    at com.databricks.managedcatalog.ManagedCatalogClientImpl.recordAndWrapException(ManagedCatalogClientImpl.scala:4551)
    at com.databricks.managedcatalog.ManagedCatalogClientImpl.updatePermissions(ManagedCatalogClientImpl.scala:3010)
    at com.databricks.sql.managedcatalog.ManagedCatalogCommon.addPermissions(ManagedCatalogCommon.scala:1729)
    at com.databricks.sql.managedcatalog.ProfiledManagedCatalog.$anonfun$addPermissions$1(ProfiledManagedCatalog.scala:398)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at org.apache.spark.sql.catalyst.MetricKeyUtils$.measure(MetricKey.scala:714)
    at com.databricks.sql.managedcatalog.ProfiledManagedCatalog.$anonfun$profile$1(ProfiledManagedCatalog.scala:62)
    at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
    at com.databricks.sql.managedcatalog.ProfiledManagedCatalog.profile(ProfiledManagedCatalog.scala:61)
    at com.databricks.sql.managedcatalog.ProfiledManagedCatalog.addPermissions(ProfiledManagedCatalog.scala:398)
    at com.databricks.sql.managedcatalog.command.GrantPermissionsCommandV2.run(PermissionCommands.scala:125)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.$anonfun$sideEffectResult$2(commands.scala:84)
    at org.apache.spark.sql.execution.SparkPlan.runCommandWithAetherOff(SparkPlan.scala:178)
    at org.apache.spark.sql.execution.SparkPlan.runCommandInAetherOrSpark(SparkPlan.scala:189)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.$anonfun$sideEffectResult$1(commands.scala:84)
    at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:81)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:80)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:94)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$4(QueryExecution.scala:358)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:166)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$3(QueryExecution.scala:358)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$9(SQLExecution.scala:387)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:691)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$1(SQLExecution.scala:276)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:1175)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId0(SQLExecution.scala:163)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:628)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$2(QueryExecution.scala:357)
    at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:1097)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$1(QueryExecution.scala:353)
    at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$withMVTagsIfNecessary(QueryExecution.scala:312)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.applyOrElse(QueryExecution.scala:350)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.applyOrElse(QueryExecution.scala:334)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:477)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:83)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:477)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:39)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:343)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:339)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:39)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:39)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:453)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$eagerlyExecuteCommands$1(QueryExecution.scala:334)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:400)
    at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:334)
    at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:271)
    at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:268)
    at org.apache.spark.sql.Dataset.<init>(Dataset.scala:289)
    at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:127)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:1175)
    at org.apache.spark.sql.SparkSession.$anonfun$withActiveAndFrameProfiler$1(SparkSession.scala:1182)
    at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
    at org.apache.spark.sql.SparkSession.withActiveAndFrameProfiler(SparkSession.scala:1182)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:116)
    at org.apache.spark.sql.SparkSession.$anonfun$sql$4(SparkSession.scala:954)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:1175)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:942)
    at org.apache.spark.sql.connect.planner.SparkConnectPlanner.handleSqlCommand(SparkConnectPlanner.scala:2742)
    at org.apache.spark.sql.connect.planner.SparkConnectPlanner.process(SparkConnectPlanner.scala:2695)
    at org.apache.spark.sql.connect.execution.ExecuteThreadRunner.handleCommand(ExecuteThreadRunner.scala:285)
    at org.apache.spark.sql.connect.execution.ExecuteThreadRunner.$anonfun$executeInternal$1(ExecuteThreadRunner.scala:229)
    at org.apache.spark.sql.connect.execution.ExecuteThreadRunner.$anonfun$executeInternal$1$adapted(ExecuteThreadRunner.scala:167)
    at org.apache.spark.sql.connect.service.SessionHolder.$anonfun$withSession$2(SessionHolder.scala:332)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:1175)
    at org.apache.spark.sql.connect.service.SessionHolder.$anonfun$withSession$1(SessionHolder.scala:332)
    at org.apache.spark.JobArtifactSet$.withActiveJobArtifactState(JobArtifactSet.scala:97)
    at org.apache.spark.sql.artifact.ArtifactManager.$anonfun$withResources$1(ArtifactManager.scala:84)
    at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:234)
    at org.apache.spark.sql.artifact.ArtifactManager.withResources(ArtifactManager.scala:83)
    at org.apache.spark.sql.connect.service.SessionHolder.withSession(SessionHolder.scala:331)
    at org.apache.spark.sql.connect.execution.ExecuteThreadRunner.executeInternal(ExecuteThreadRunner.scala:167)
    at org.apache.spark.sql.connect.execution.ExecuteThreadRunner.org$apache$spark$sql$connect$execution$ExecuteThreadRunner$$execute(ExecuteThreadRunner.scala:118)
    at org.apache.spark.sql.connect.execution.ExecuteThreadRunner$ExecutionThread.$anonfun$run$1(ExecuteThreadRunner.scala:349)
    at com.databricks.unity.UCSEphemeralState$Handle.runWith(UCSEphemeralState.scala:45)
    at com.databricks.unity.HandleImpl.runWith(UCSHandle.scala:103)
    at com.databricks.unity.HandleImpl.$anonfun$runWithAndClose$1(UCSHandle.scala:108)
    at scala.util.Using$.resource(Using.scala:269)
    at com.databricks.unity.HandleImpl.runWithAndClose(UCSHandle.scala:107)
    at org.apache.spark.sql.connect.execution.ExecuteThreadRunner$ExecutionThread.run(ExecuteThreadRunner.scala:348)
JCZuurmond commented 4 days ago

Thanks for the additional context. Confirmed in the documentation that BROWSE is not supported on tables, views, and databases.
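The documented rule above can be captured as a simple guard. This is a hypothetical sketch (the names `BROWSE_APPLICABLE` and `can_grant_browse` are invented, not UCX or Databricks API), reflecting the error's metastore privilege version 1.0, under which BROWSE applies only to catalogs:

```python
# Hypothetical guard: under UC privilege model 1.0, BROWSE is applicable
# only to catalogs, not to tables, views, or schemas/databases.
BROWSE_APPLICABLE = {"CATALOG"}


def can_grant_browse(securable_type: str) -> bool:
    """True when GRANT BROWSE is valid for this securable type."""
    return securable_type.upper() in BROWSE_APPLICABLE
```

Running such a check before issuing the grant would avoid the `INVALID_PARAMETER_VALUE` error seen in the workflow logs.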