DataBiosphere / terra-notebook-utils

Utilities for the Terra notebook environment.
MIT License

Reliable identification of primary cloud #403

Open mbaumann-broad opened 1 year ago

mbaumann-broad commented 1 year ago

Objective

TNU now supports both Terra GCP and Terra Azure, yet a more reliable mechanism is needed to identify which should be used.

Background

When running in a Terra GCP or Terra Azure Interactive Analysis Cloud Environment, TNU must be able to reliably identify which cloud it is running in so it can authenticate correctly for access to Terra services. An initial implementation, based on environment variables such as WORKSPACE_BUCKET, was provided in PR #401. However, the environment variables that will be available in Terra Azure Cloud Environments have not yet been fully identified or implemented (IA-3597). When they are available, the code TNU uses to identify the cloud and auth system must be revisited.

If the value of a storage-related environment variable is checked to determine which cloud platform it pertains to, the check should not simply look for 'https://' to identify Azure. Instead, it should perform a more detailed check using a regular expression, such as:

re.search(r"^https://.*\.(dfs|blob)\.core\.windows\.net$", workspace_bucket)
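As a rough illustration of how such a check might be wrapped, here is a minimal sketch. The names `CloudPlatform` and `identify_cloud` are hypothetical and not part of TNU. The `gs://` prefix check for GCP is an assumption about the form of WORKSPACE_BUCKET on Terra GCP, and the Azure values may change once IA-3597 is resolved.

```python
import os
import re
from enum import Enum
from typing import Optional


class CloudPlatform(Enum):
    """Hypothetical enum; not an existing TNU type."""
    GCP = "gcp"
    AZURE = "azure"


# Pattern taken from the issue text: an Azure storage account endpoint.
_AZURE_STORAGE_RE = re.compile(r"^https://.*\.(dfs|blob)\.core\.windows\.net$")
# Assumption: on Terra GCP the bucket value is a gs:// URL.
_GCS_RE = re.compile(r"^gs://")


def identify_cloud(workspace_bucket: Optional[str]) -> Optional[CloudPlatform]:
    """Return the platform implied by the value, or None if it is not recognized."""
    if not workspace_bucket:
        return None
    if _AZURE_STORAGE_RE.search(workspace_bucket):
        return CloudPlatform.AZURE
    if _GCS_RE.search(workspace_bucket):
        return CloudPlatform.GCP
    return None


if __name__ == "__main__":
    # Reads the variable named in the issue; prints None when it is unset.
    print(identify_cloud(os.environ.get("WORKSPACE_BUCKET")))
```

Returning None for unrecognized values, rather than defaulting to one cloud, leaves the caller free to fall back to another mechanism or raise an error.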

Additionally, TNU needs a way to reliably identify which cloud to use when TNU is running outside of Terra, for example on a local/institutional system.
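One possible approach for the outside-of-Terra case, offered only as a sketch, is to require an explicit selection instead of guessing. The environment variable name `TNU_CLOUD_PLATFORM` and the function `resolve_cloud_platform` below are hypothetical and do not exist in TNU.

```python
import os
from typing import Mapping, Optional

VALID_PLATFORMS = ("gcp", "azure")
# Hypothetical variable name, used here for illustration only.
OVERRIDE_VAR = "TNU_CLOUD_PLATFORM"


def resolve_cloud_platform(explicit: Optional[str] = None,
                           environ: Mapping[str, str] = os.environ) -> str:
    """Pick the platform from an explicit argument, else from the override variable."""
    candidate = explicit or environ.get(OVERRIDE_VAR)
    if candidate is None:
        raise RuntimeError(
            f"Cloud platform not specified; pass it explicitly or set {OVERRIDE_VAR}")
    platform = candidate.strip().lower()
    if platform not in VALID_PLATFORMS:
        raise ValueError(
            f"Unknown cloud platform {candidate!r}; expected one of {VALID_PLATFORMS}")
    return platform
```

Failing loudly when nothing is specified avoids silently authenticating against the wrong cloud on a local or institutional system.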