databrickslabs / ucx

Automated migrations to Unity Catalog
Other
232 stars 80 forks source link

[TECH DEBT]: Merge functionality for running ucx accross workspaces #2507

Open JCZuurmond opened 2 months ago

JCZuurmond commented 2 months ago

Is there an existing issue for this?

Problem statement

Currently we are adding and have added functionality to run ucx accross workspaces, see this issue for an overview. The functionality to do so became somewhat dispersed and repeated:

Proposed Solution

TBD. First suggestion below

Context

We started with an account installer for installing ucx across workspaces and a sync workspace info command to have the workspace name available in all workspaces (as account admin is required to fetch this information). Now, we have and plan for more commands to run across a collection of workspaces.

Preferably we minimize the need for account admin, though, if you run a command over a collection of workspace you require to be admin to each of those workspaces.

Approach

  1. Move the logic for retrieving multiple workspace clients and contexts to the AccountWorkspaces class.
  2. Then we can access those via the AccountContext that has the AccountWorkspaces as (cached) attribute
  3. We access it in the account intstaller or any of the cli commands

Additional Context

No response

JCZuurmond commented 2 months ago

Note that we also use AccountWorkspaces in the AccountAggregate class

JCZuurmond commented 2 months ago

Furthermore, for the cli command, the AccountClient is initialized via a new code path when introducing the get_contexts. Before we initialized it (implicitly) by setting the is_account flag to True in the cli command decorator. Now, we also initialize it in the get_contexts.

To merge the code paths, we might require a change to blueprint, for example

  1. Make the is_account a flag optionally set by the user: databricks labs <project> <command> --on-account True
  2. Or, (try to) set the WorkspaceClient also when the is_account flag is True. If we can initiliaze the WorkspaceClient we set it to None

That would allow us to try to use the AccountClient to get a list of workpaces (contexts/clients), otherwise fallback on the WorkspaceClient provided by the blueprint command

HariGS-DB commented 2 months ago

We initially submitted a PR in blueprint to enable collection, where we set is_collection flag, so that if is_collection is set, then blueprint will set up both accountclient and workspaceclient. We agreed for now to do this implementation only in UCX https://github.com/databrickslabs/blueprint/pull/113

HariGS-DB commented 2 months ago

for cmd which are run as collection the user need to be both account admin and workspace admin of all the collection workspaces. I agree that we should move the collection logic from account installer to account workspaces and account installer should only contain options to install ucx for whole account.

JCZuurmond commented 2 months ago

I tested the follow integration test:

# tests/integration/install/test_installation.py
def test_account_installer_returns_workspace_contexts(env_or_skip, installation_ctx):
    prompts = MockPrompts(
        {
            r"Please provide the Databricks account id.*": env_or_skip("DATABRICKS_ACCOUNT_ID"),
        }
    )
    account_installer = installation_ctx.account_installer.replace(prompts=prompts)

    installation_ctx.workspace_installation.run()
    workspace_contexts = account_installer.get_workspace_contexts(
        installation_ctx.workspace_client,
        run_as_collection=True,
    )

    assert len(workspace_contexts) > 0

It fails when getting the installed workspace ids installer.config.installed_workspace_ids as the current setup tries to find an installed ucx version under the Applications folder instead of the ad-hoc installation used in integration tests

JCZuurmond commented 1 month ago

Linked a create-uber-principal related PR to log that we might want to create the uber principal at account level so that we have one principal for all workspaces