databricks-industry-solutions / security-analysis-tool

Security Analysis Tool (SAT) analyzes a customer's Databricks account and workspace security configurations and provides recommendations that help them follow Databricks' security best practices. When a customer runs SAT, it compares their workspace configurations against a set of security best practices and delivers a report.

Setting up Security Analysis Tool (SAT) in the environment with Private VPC, no public internet, accessing Account APIs through PrivateLink #97

Closed chennsud-flutterint closed 5 months ago

chennsud-flutterint commented 5 months ago

Hi team,

We are encountering difficulties while setting up SAT in our development environment with a VPC that restricts public internet access. Here's a summary of our configuration:

• Cloud Provider: AWS
• Authentication: Service Principal
• Network: VPC with Frontend/Backend PrivateLinks
• Library Installation: Offline Wheel packages deployed directly to the cluster
• Secrets: Managed via Databricks CLI
• Database: Unity Catalog
• Cluster: Single user for running SAT notebooks

Issues:

  1. Account Connection Timeout: During the security_analysis_intializer notebook execution, the final cell (referencing security-analysis-tool/notebooks/Setup/1. list_account_workspaces_to_conf_file) encounters a connection timeout error when attempting to connect to accounts.cloud.databricks.com for account-level setup. The specific error is: HTTPSConnectionPool(host='accounts.cloud.databricks.com', port=443): Max retries exceeded with url: /oidc/accounts//v1/token (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f90fcef31c0>, 'Connection to accounts.cloud.databricks.com timed out
  2. When skipping account-level setup, subsequent steps in the initializer fail due to the absence of global temporary views, which are typically created during the first setup process.

Attempted Solution: We tried using the PrivateLink VPC endpoint and its DNS name for account-level API calls with a curl command. With SSL certificate verification disabled (-k), the call returned neither a token nor any other response. With verification enabled, it failed to retrieve the SSL certificate.

Here's the command for reference:

curl --request POST -k \
  -d '{ "vpc_endpoint_name": "Databricks backend endpoint", "region": "eu-west-1", "aws_vpc_endpoint_id": "<>" }' \
  --url https://ireland.privatelink.cloud.databricks.com/oidc/accounts//v1/token \
  --user "$CLIENT_ID:$CLIENT_SECRET" \
  --data 'grant_type=client_credentials&scope=all-apis'
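For comparison, here is a minimal Python sketch of the same client-credentials token request. It assumes only the form-encoded grant fields are needed for the token call, so it leaves out the JSON body from the command above. The function name `build_token_request` and the placeholder values are hypothetical; the URL path, the Basic authentication, and the form fields are taken from the curl command. The sketch only builds the request and does not show whether a PrivateLink hostname serves this path.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(host, account_id, client_id, client_secret):
    """Build (but do not send) an OAuth client-credentials token request."""
    # Path shape copied from the curl command; account_id fills the empty segment.
    url = f"https://{host}/oidc/accounts/{account_id}/v1/token"
    # Form-encoded body, equivalent to curl's --data argument.
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode("ascii")
    # Equivalent to curl's --user "$CLIENT_ID:$CLIENT_SECRET" (HTTP Basic auth).
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    headers = {
        "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Placeholder values only; nothing is sent until urllib.request.urlopen(req) is called.
req = build_token_request(
    "accounts.cloud.databricks.com", "ACCOUNT_ID", "CLIENT_ID", "CLIENT_SECRET"
)
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen(req, timeout=10)` would send it. From a network without a route to the chosen host, that call would be expected to fail with a timeout similar to the one reported above.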

We would appreciate instructions on how to call the account APIs through PrivateLink without public internet access, if that is possible at all. Please assess on your end whether this can be achieved.

arunpamulapati commented 5 months ago

Please make sure your workspace can reach your account and workspace APIs, as documented in item 3 here: https://github.com/databricks-industry-solutions/security-analysis-tool/blob/main/docs/setup.md#troubleshooting
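As a rough first check before working through the linked troubleshooting steps, a notebook cell could test whether a TCP connection to an API host opens at all. The sketch below is an assumption-laden illustration, not the procedure from the SAT docs. The helper name `can_connect` is hypothetical, and a successful result shows only that port 443 is reachable, not that TLS or authentication will succeed.

```python
import socket


def can_connect(host, port=443, timeout=5.0):
    """Return (True, "") if a TCP connection opens, else (False, reason)."""
    try:
        # create_connection resolves the host name and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True, ""
    except OSError as exc:  # covers timeouts, DNS failures, and refused connections
        return False, f"{type(exc).__name__}: {exc}"
```

Calling `can_connect("accounts.cloud.databricks.com")` from the cluster that runs the SAT notebooks would return a `(False, reason)` pair if the VPC has no route to that host, which would be consistent with the timeout in the original report.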