databricks / terraform-provider-databricks

Databricks Terraform Provider
https://registry.terraform.io/providers/databricks/databricks/latest

[DOC] Run Databricks Azure Git Job as Service Principal not possible #2707

Open minico-dev opened 12 months ago

minico-dev commented 12 months ago

Affected Resource(s)

databricks_job databricks_git_credential

Expected Details

I have used Terraform to create a Databricks Job in my workspace. Without an explicit run_as block in the job specification, the job runs as the Service Principal that was used to create it through Terraform. It is also possible to explicitly specify a Service Principal for the run_as parameter. However, there seems to be no way for such an account to obtain an Azure DevOps PAT to use in its azureDevOpsServices git credentials. It can only create an Azure AD token (see Important Factoids below). This token usually has a short lifetime and will not work as a static token in git credentials, because a new token would be required for every interaction with the repo. It is therefore not possible for a Service Principal to run any job that includes code sourced from an Azure DevOps Git repository. The job fails with an error saying it does not have permission to check out the Git repository. This limitation is not mentioned anywhere in the documentation of either the databricks_job or the databricks_git_credential resource.
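For readers following along, the setup described above might look roughly like the sketch below. It is an illustration only, not the reporter's actual configuration: the resource name, variable names, repository URL, and notebook path are all hypothetical placeholders.

```hcl
# Hypothetical inputs, introduced only for this sketch.
variable "job_sp_application_id" {
  type        = string
  description = "Application ID of the service principal the job should run as"
}

variable "cluster_id" {
  type        = string
  description = "ID of an existing cluster to run the task on"
}

resource "databricks_job" "git_job" {
  name = "git-sourced-job"

  # Run the job as a service principal instead of the job creator.
  run_as {
    service_principal_name = var.job_sp_application_id
  }

  # Code is checked out from Azure DevOps at run time. The run_as identity
  # needs working git credentials for this provider, which is where the
  # failure described above occurs.
  git_source {
    url      = "https://dev.azure.com/example-org/example-project/_git/example-repo"
    provider = "azureDevOpsServices"
    branch   = "main"
  }

  task {
    task_key            = "main"
    existing_cluster_id = var.cluster_id

    notebook_task {
      notebook_path = "notebooks/example" # path relative to the repo root
      source        = "GIT"
    }
  }
}
```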

List of things to potentially add/remove:

Important Factoids

This Microsoft article states that "Service principals can't create tokens, like personal access tokens (PATs) or SSH Keys. They can generate their own Azure AD tokens and these tokens can be used to call Azure DevOps REST APIs." (located just above the FAQ section). The same article also includes the question "Q: Can I use a service principal to do git operations, like clone a repo?", to which the answer is to generate a (short-lived) Azure AD token for git operations.

alexott commented 12 months ago

it's possible with PAT, for example.

It's really not a Terraform issue but a product one; we can't document every specific limitation in the Terraform docs.

minico-dev commented 12 months ago

The referenced solution still relies on generating a PAT. As the Microsoft documentation mentions, this is not possible for Service Principals. The PAT would have to be generated manually on a user account. The job would run as the SP in Databricks, but it would still depend on a user's PAT to check out the repository.
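To make the dependency concrete, a PAT-based git credential might be declared roughly as in the sketch below. The variable and resource names are hypothetical, and the PAT itself would have to come from a real Azure DevOps user account, which is exactly the limitation discussed here.

```hcl
# Hypothetical inputs, introduced only for this sketch.
variable "ado_username" {
  type        = string
  description = "Azure DevOps user that owns the PAT"
}

variable "ado_pat" {
  type        = string
  sensitive   = true
  description = "PAT generated manually by that Azure DevOps user"
}

# The credential is stored for whichever identity the provider authenticates as.
resource "databricks_git_credential" "ado" {
  git_provider          = "azureDevOpsServices"
  git_username          = var.ado_username
  personal_access_token = var.ado_pat
}
```

Because the token is tied to a person, the job would break whenever that PAT expires or the user account is removed.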

I agree that this is a product issue, but I think it would be a nice addition to the documentation, as it is not mentioned anywhere.

benwhelankf commented 11 months ago

In AWS, the way we got around this is:

# Aliased provider that authenticates to the workspace as the job's service principal.
provider "databricks" {
  alias         = "job_sp"
  host          = var.databricks_workspace_host
  client_id     = databricks_service_principal.job_sp.application_id
  client_secret = databricks_service_principal_secret.sp_secret.secret
  account_id    = var.databricks_account_id
}

# Created through the aliased provider, so the git credential belongs to the SP itself.
resource "databricks_git_credential" "sp_git_credential" {
  provider              = databricks.job_sp
  ....

  depends_on = [databricks_service_principal_secret.sp_secret]
}

This effectively gives the SP its own git credentials with access to the repo. But it feels very not nice, so ideally we'd have a way of doing this without having to call the API/provider with the SP credentials.

cran1um commented 11 months ago

#2190

philippbussche commented 2 days ago

@benwhelankf this sounds like an interesting workaround. Our git provider is azureDevOpsServices, and the Service Principal we want to use for jobs is Databricks-managed. I suspect we could not have a user in Azure DevOps representing this SP, so the respective databricks_git_credential resource would not be able to authenticate successfully. What is your git provider, if I may ask, and have you created a user in it to represent the Databricks SP?