databrickslabs / ucx

Automated migrations to Unity Catalog

[FEATURE]: Identify owners for inventory types that we track the history of #2761

Open asnare opened 1 month ago

asnare commented 1 month ago

Problem statement

We will shortly be tracking history for inventory types that are routinely refreshed during migration. The history journal that we maintain requires an owner for each record: the person (or group) responsible for the underlying resource being migrated. If no owner can be determined, a workspace administrator should be used instead.

Proposed Solution

Each crawler that is responsible for a refreshable class will need to be updated so that it can identify the owner of each record of its Result type.
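One possible shape for this, as a minimal sketch only: a small generic base class that asks a per-type hook for the direct owner and otherwise falls back to a workspace administrator. The names `Ownership`, `owner_of`, `_maybe_direct_owner`, `ExampleRecord` and `ExampleOwnership` are hypothetical and chosen for illustration; they are not taken from the existing code base.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, Generic, Optional, TypeVar

Record = TypeVar("Record")


class Ownership(ABC, Generic[Record]):
    """Hypothetical base class: resolves the owner of one record type."""

    def __init__(self, fallback_owner: Callable[[], str]) -> None:
        # Assumption: the fallback is a callable that returns the user-name
        # of a workspace administrator, or raises if none can be found.
        self._fallback_owner = fallback_owner

    def owner_of(self, record: Record) -> str:
        # Prefer the owner of the underlying resource; only consult the
        # administrator fallback when no direct owner is known.
        return self._maybe_direct_owner(record) or self._fallback_owner()

    @abstractmethod
    def _maybe_direct_owner(self, record: Record) -> Optional[str]:
        """Return the owner of the underlying resource, if it is known."""


@dataclass(frozen=True)
class ExampleRecord:
    """Hypothetical stand-in for a crawler's Result type."""

    resource_id: str
    creator: Optional[str] = None


class ExampleOwnership(Ownership[ExampleRecord]):
    def _maybe_direct_owner(self, record: ExampleRecord) -> Optional[str]:
        # Assumption: for this record type the creator is treated as owner.
        return record.creator
```

With this shape, each crawler only has to supply the per-type hook, and the administrator fallback stays in one place.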

Documentation

Additional Context

Related issues:

Blocks:

asnare commented 3 weeks ago

We need to track owners for the following inventory types:

Where a workspace admin is needed (because a more appropriate owner cannot be determined), it is chosen as follows:

  1. Query all active workspace admins, sort alphabetically by user-name, use the first one.
  2. If there are no workspace admins, query all active account admins associated with the workspace, sort alphabetically by user-name, use the first one.
  3. Raise an error if we still don't have an identity. (Note: this is possible, especially due to accounts being decommissioned and leaving a workspace without an active admin.)
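
The three steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation: `Admin`, `NoAdminFoundError` and `pick_fallback_owner` are hypothetical names, and the two lists are assumed to have already been fetched from the workspace and account APIs.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Admin:
    """Hypothetical, minimal view of an admin identity."""

    user_name: str
    active: bool


class NoAdminFoundError(RuntimeError):
    """Raised when neither a workspace nor an account admin is active."""


def pick_fallback_owner(
    workspace_admins: Iterable[Admin],
    account_admins: Iterable[Admin],
) -> str:
    # Steps 1 and 2: try workspace admins first, then account admins.
    for candidates in (workspace_admins, account_admins):
        # Keep only active identities and sort alphabetically by user-name
        # so that the choice is deterministic between runs.
        active = sorted(a.user_name for a in candidates if a.active)
        if active:
            return active[0]
    # Step 3: no active admin anywhere, for example after decommissioning.
    raise NoAdminFoundError("no active workspace or account admin found")
```

Sorting before taking the first entry is what makes the result stable across refreshes, provided the set of active admins does not change.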

[^1]: This is not yet available on the DirectFsAccess instances; a schema change will probably be required along with code to expose this information.

[^2]: The APIs for DBFS and Workspace paths don't expose the owner/creator information, so this information is unavailable. If it were available, this would first be exposed via the .owner attribute of our pathlib emulation.