databrickslabs / ucx

Automated migrations to Unity Catalog

Add support for init scripts in crawling for Azure Service Principals #415

Closed zpappa closed 4 months ago

zpappa commented 1 year ago

326

Background

Run a dependent job after the current jobs to capture details from init scripts. If any matching Spark config for Azure is found, append it to the cluster, job, and Azure SPN tables.

Add the following

related info:

%sh
curl -X POST -H 'Content-Type: application/x-www-form-urlencoded' \
https://login.microsoftonline.com/<tenant id>/oauth2/v2.0/token \
-d 'client_id=<application id of the service principal>' \
-d 'grant_type=client_credentials' \
-d 'scope=2ff814a6-3304-4ab8-85cb-cd0e6f879c1d%2F.default' \
-d 'client_secret=<client secret of the service principal>'
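
For reference, the same client-credentials request can be sketched in Python. This is only an illustration, not UCX code: the function name `build_token_request` and its parameters are hypothetical, and the request is built but not sent. The scope value is copied from the curl command above.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Scope taken from the curl command above (URL-decoded form).
DATABRICKS_SCOPE = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default"


def build_token_request(tenant_id: str, client_id: str, client_secret: str) -> Request:
    """Build (but do not send) the OAuth2 client-credentials token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    # urlencode percent-encodes the "/" in the scope as %2F, matching the curl form.
    body = urlencode(
        {
            "client_id": client_id,
            "grant_type": "client_credentials",
            "scope": DATABRICKS_SCOPE,
            "client_secret": client_secret,
        }
    ).encode("ascii")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call; the placeholders for tenant, application id and secret still have to be supplied by the caller.
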
pohlposition commented 1 year ago

Relates to https://github.com/databrickslabs/ucx/issues/413

nfx commented 1 year ago

Seems like a duplicate of #413

nfx commented 1 year ago

It is impossible to do with the resources we have

nfx commented 1 year ago

As part of https://github.com/databrickslabs/ucx/pull/326 the following are taken care of -

Scanned Spark config of all clusters, jobs, cluster policies, and pipelines for Azure Service Principals that have access to storage, and flagged them

Scanned cluster-scoped and global init scripts for Azure Service Principals that have access to storage, and flagged them

In this issue the following pending item is meant to be taken care of -

Create an inventory of all Azure SPNs that have access to storage from all the init scripts (cluster and global) and add it to the "azure_service_principals" table in HMS.
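
A minimal sketch of what such an inventory step could look like, assuming the init script content is already available as text and that SPNs appear as literal application ids under the ABFS key `fs.azure.account.oauth2.client.id`. The names `AzureSpnRef` and `find_azure_spns` are hypothetical and not part of UCX; values supplied through secret references or shell variables would not be caught by this pattern.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AzureSpnRef:
    """Hypothetical record: one SPN application id found in an init script."""
    application_id: str
    storage_account: Optional[str]  # None when the key has no account suffix


_CLIENT_ID = re.compile(
    r"fs\.azure\.account\.oauth2\.client\.id"                  # ABFS OAuth client-id key
    r"(?:\.(?P<account>[a-z0-9]+)\.dfs\.core\.windows\.net)?"  # optional per-account suffix
    r"[\"']?\s*[=:\s]\s*[\"']?"                                # separator, optional quotes
    r"(?P<client_id>[0-9a-fA-F-]{36})"                         # literal application id (GUID)
)


def find_azure_spns(script_text: str) -> list:
    """Return unique SPN references found in the script, in order of appearance."""
    seen = set()
    found = []
    for match in _CLIENT_ID.finditer(script_text):
        ref = AzureSpnRef(match.group("client_id").lower(), match.group("account"))
        if ref not in seen:
            seen.add(ref)
            found.append(ref)
    return found
```

Each returned record could then be appended as a row to the inventory table; how rows are written is left out here because it depends on the crawler framework.
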

Related to https://github.com/databrickslabs/ucx/issues/249

nfx commented 4 months ago

We crawl principal permissions directly on storage accounts. We won't parse shell scripts, which would be prohibitively expensive.