Closed zpappa closed 6 months ago
@nfx Here is a design overview for Group/Object Permission/ACL migration
We should agree on functionality and operation here, and then create follow-up issues to get alignment on what already exists.
@zpappa This issue is too long. Create a PR, copy the contents into a markdown file in the docs folder, and consolidate it with the other group migration docs from there, because this issue mixes current state and desired state. Also, don't be that prescriptive on dashboards/internal persistence structures.
Let's take it from there.
@zpappa has committed to splitting this into multiple issues and closing this one.
@nfx I will leave this open and link back to it so implementors have the full context when working on their issue
there has to be only one view:
object_type | object_id | migrated | failures |
---|---|---|---|
TABLES | hive_metastore.foo.bar | 1 | [] |
TABLES | hive_metastore.foo.baz | 0 | [] |
TABLES | hive_metastore.foo.boz | 0 | ["Unsupported SerDe format: OpenCSV"] |
GRANT_SELECT | hive_metastore.foo.baz:group_1 | 0 | [] |
GRANT_MODIFY | hive_metastore.foo.baz:group_2 | 0 | [] |
CLUSTER_PERMISSIONS | 2984-dsfjlkskd-2393 | 0 | [] |
CLUSTER_PERMISSIONS | 2984-dsfjlkskd-2394 | 1 | [] |
CLUSTER_PERMISSIONS | 2984-dsfjlkskd-2394 | 1 | ["uses storage SPN"] |
and the number of persisted structures has to be kept to the very minimum
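To make the single-view idea concrete, here is a minimal Python sketch of one row type covering every object kind, plus a per-type rollup. The names `MigrationStatus` and `summarize` are hypothetical and not taken from the ucx codebase; only the four columns come from the table above. Note that a row can be both migrated and carry failures, so the two are counted independently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MigrationStatus:
    """One row of the single view: same shape for tables, grants, permissions."""
    object_type: str
    object_id: str
    migrated: bool
    failures: tuple[str, ...] = ()


def summarize(rows):
    """Roll rows up per object_type: total, migrated, and rows with failures."""
    summary = {}
    for row in rows:
        counts = summary.setdefault(
            row.object_type, {"total": 0, "migrated": 0, "with_failures": 0}
        )
        counts["total"] += 1
        if row.migrated:
            counts["migrated"] += 1
        if row.failures:
            counts["with_failures"] += 1
    return summary


# Sample rows borrowed from the table in this comment.
rows = [
    MigrationStatus("TABLES", "hive_metastore.foo.bar", True),
    MigrationStatus("TABLES", "hive_metastore.foo.baz", False),
    MigrationStatus("GRANT_SELECT", "hive_metastore.foo.baz:group_1", False),
    MigrationStatus("CLUSTER_PERMISSIONS", "2984-dsfjlkskd-2393", False),
    MigrationStatus("CLUSTER_PERMISSIONS", "2984-dsfjlkskd-2394", True, ("uses storage SPN",)),
]
summary = summarize(rows)
```

With one row shape, a dashboard only needs to filter or group on `object_type` instead of joining several persisted structures.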
@nfx @zpappa may I suggest combining the assessment step for group migration with the group migration itself and taking it out of assessment? It takes a really long time to run and is not a necessary step before the actual table migration.
We might separate it, but isn't it the goal to prepare all the data before any other steps? We need to list notebooks for scanning their contents anyway.
scanned through requirements - generating a notebook must not be an option (this is solutions), we have predefined workflows (this is design).
new permission migration api makes this issue irrelevant.
Background
This issue attempts to capture the proposed control flow for group migration. This control flow should be idempotently repeatable, transacted at various states, and provide different reporting points to the user.
Related Issues
343
342
344
Entry Points
The entry point should be easily runnable via CLI and take the workspace id as a default parameter
databricks labs ucx migrate-groups --workspace-id=12132312323
Control Flow Overview
This entire control flow should operate with some temporary state management between actions so that operations can be continued or the control flow can be retried in the case of an error or interruption.
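One way to read "temporary state management between actions" is a small checkpoint file that records which steps have finished, so a rerun skips them. The sketch below is an assumption about how that could look, not the ucx implementation: the JSON file layout and the names `load_state` and `run_steps` are made up for illustration.

```python
import json
from pathlib import Path


def load_state(state_path):
    """Read the checkpoint file, or start fresh if it does not exist yet."""
    if state_path.exists():
        return json.loads(state_path.read_text())
    return {"completed": []}


def run_steps(steps, state_path):
    """Run (name, action) pairs in order, skipping names already checkpointed.

    The checkpoint is written after every successful step, so if an action
    raises, the file still lists everything that finished before it.
    Returns the names of the steps executed in this call.
    """
    state = load_state(state_path)
    executed = []
    for name, action in steps:
        if name in state["completed"]:
            continue  # finished in an earlier run
        action()
        state["completed"].append(name)
        state_path.write_text(json.dumps(state))
        executed.append(name)
    return executed
```

Calling `run_steps` again after an interruption resumes at the first step that did not complete; calling it after a full run executes nothing, which is the idempotent-retry behavior the overview asks for. Each action still has to be safe to repeat on its own, since a step can fail partway through.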
Persisted Data Structures
Temporarily Persisted Data Structures
hive_metastore.foo.baz:group_1
hive_metastore.foo.baz:group_2
Phase 1 - Workspace Group Migration
Check to see if the skip groups flag has been set
Check to see if the Workspace Group Migration Manifest exists
For each group in Workspace $inventory, as ordered by your Group Upgrade Manifest, where not migrated:
1) See if the group exists in the Account console (use the flag for workspace-scoped group names to modify the name before evaluating)
2) If the group exists, and if the members are the same,
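The per-group check above can be sketched as a pure function over already-fetched data. Everything here is illustrative: the manifest shape (`group`, `migrated` keys), the `prefix` parameter standing in for the workspace-scoped name flag, and the names `GroupCheck`, `check_group` and `pending_checks` are assumptions. The sketch only classifies each group; what to do for each outcome is not specified in the text above, so it is left out.

```python
from enum import Enum


class GroupCheck(Enum):
    MISSING_IN_ACCOUNT = "missing_in_account"
    SAME_MEMBERS = "same_members"
    DIFFERENT_MEMBERS = "different_members"


def check_group(name, members, account_groups, prefix=""):
    """Compare one workspace group against the account-level groups.

    `prefix` is a hypothetical stand-in for the workspace-scoped name flag:
    it is prepended to the workspace group name before the lookup.
    """
    account_name = f"{prefix}{name}"
    if account_name not in account_groups:
        return GroupCheck.MISSING_IN_ACCOUNT
    if set(account_groups[account_name]) == set(members):
        return GroupCheck.SAME_MEMBERS
    return GroupCheck.DIFFERENT_MEMBERS


def pending_checks(manifest, workspace_groups, account_groups, prefix=""):
    """Walk the manifest in order and classify every group not yet migrated."""
    results = []
    for entry in manifest:
        if entry["migrated"]:
            continue  # already handled in a previous run
        name = entry["group"]
        outcome = check_group(name, workspace_groups[name], account_groups, prefix)
        results.append((name, outcome))
    return results
```

Membership is compared as a set, so ordering differences between the workspace and account listings do not count as a mismatch.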
Phase 2 - Workspace Object Permission Migration
Phase 3 - Workspace Table ACL Migration