databrickslabs / ucx

Automated migrations to Unity Catalog
Other
227 stars 80 forks source link

Revise control flow for `Migrate Groups` with more resiliency #349

Closed zpappa closed 6 months ago

zpappa commented 1 year ago

Background

This issue attempts to capture the proposed control flow for group migration. This control flow should be idempotently repeatable, transacted at various states, and provide different reporting points to the user.

Related Issues

343

342

344

Entry Points

The entry point should be easily runnable via CLI and take the workspace id as a default parameter

databricks labs ucx migrate-groups --workspace-id=12132312323

Optional Parameters Parameter Comment
--use-workspace-scoped-group-names Instruct tool to automatically create account groups with the workspace name prepended
--skip-groups Do not create groups
--skip-object-permissions Do not re-apply object permissions on workspace objects
--skip-table-acls Do not translate table ACLs to UC
--persist-manifest Keep the manifest files for inspection, will be deleted if the command is run again successfully without this
--dry-run Do not create groups, output instead the proposed changes

Control Flow Overview

This entire control flow should operate with some temporary state management between actions so that operations can be continued or the control flow can be retried in the case of an error or interruption.

Persisted Data Structures

External Workspace Groups Requiring SCIM Configuration #343 Field Comment
workspace_id the workspace id in question, int
local_group_name the name of the group in the given workspace, string
user_name the name of the user in the group, string
new_account_group_name What is the new name of the account group I need to create, string
Duplicated workspace groups with different members #344 Field Comment
workspace_id the workspace id in question, int
duplicated_group_name the name of the group in the given workspace, string
members_as_list_in_account_console comma-separated members from the account console, string
members_as_list_in_workspace comma-separated members from the account console, string

Temporarily Persisted Data Structures

object_type object_id migrated failures
TABLES hive_metastore.foo.bar 1 []
TABLES hive_metastore.foo.baz 0 []
TABLES hive_metastore.foo.boz 0 ["Unsupported SerDe formant: OpenCSV"]
GRANT_SELECT hive_metastore.foo.baz:group_1 0 []
GRANT_MODIFY hive_metastore.foo.baz:group_2 0 []
CLUSTER_PERMISSIONS 2984-dsfjlkskd-2393 0 []
CLUSTER_PERMISSIONS 2984-dsfjlkskd-2394 1 []
CLUSTER_PERMISSIONS 2984-dsfjlkskd-2394 1 ["uses storage SPN"]

Phase 1 - Workspace Group Migration

Phase 2 - Workspace Object Permission Migration

Phase 3 - Workspace Table ACL Migration

zpappa commented 1 year ago

@nfx Here is a design overview for Group/Object Permission/ACL migration

We should agree on functionality, and operation here and then create follow-up issues to get alignment on what already exists.

nfx commented 1 year ago

@zpappa This issue is too long. Create a PR and copy the contents into a markdown file in docs folder and consolidate it with the other group migration docs from there, because this issue mixes current state and desired state. Also don't be that prescriptive on dashboards/internal persistence structures.

Let's take it from there.

nfx commented 1 year ago

@zpappa is committed to split of this to multiple different issues and close this one.

zpappa commented 1 year ago

@nfx I will leave this open and link back to it so implementors have the full context when working on their issue

nfx commented 1 year ago

there has to be only one view:

object_type object_id migrated failures
TABLES hive_metastore.foo.bar 1 []
TABLES hive_metastore.foo.baz 0 []
TABLES hive_metastore.foo.boz 0 ["Unsupported SerDe formant: OpenCSV"]
GRANT_SELECT hive_metastore.foo.baz:group_1 0 []
GRANT_MODIFY hive_metastore.foo.baz:group_2 0 []
CLUSTER_PERMISSIONS 2984-dsfjlkskd-2393 0 []
CLUSTER_PERMISSIONS 2984-dsfjlkskd-2394 1 []
CLUSTER_PERMISSIONS 2984-dsfjlkskd-2394 1 ["uses storage SPN"]

and number of persistend structures has to get to the very minimum

image
FastLee commented 1 year ago

@nfx @zpappa may I suggest to combine the assessment step for group migration with the group migration and take it out of assessment. It takes a really long time to run and is not a necessary step before the actual table migration.

nfx commented 1 year ago

We might separate it, but isn't it the goal to prepare all the data before any other steps? We need to list notebooks for scanning their contents anyway.

nfx commented 1 year ago

scanned through requirements - generating a notebook must not be an option (this is solutions), we have predefined workflows (this is design).

nfx commented 6 months ago

new permission migration api makes this issue irrelevant.