Closed. jschra closed this issue 2 years ago.
@jschra Have you set an expiry date for the obo token (module.pm_databricks_services.service_principal_token)?
@nkvuong Yes, I do. Below you can find the snippet that runs in my services module that creates the obo token:
```hcl
# Create service principal for API access from DevOps
resource "databricks_service_principal" "this" {
  display_name = "DevOps automation service principal"

  # No access by default, only through groups
  allow_cluster_create       = false
  allow_instance_pool_create = false
  databricks_sql_access      = false
}

# Add to developers group
resource "databricks_group_member" "sp" {
  group_id  = databricks_group.developers.id
  member_id = databricks_service_principal.this.id
}

# Generate PAT
resource "databricks_obo_token" "this" {
  depends_on       = [databricks_group_member.sp]
  application_id   = databricks_service_principal.this.application_id
  comment          = "PAT on behalf of ${databricks_service_principal.this.display_name}"
  lifetime_seconds = 129600
}
```
It's not related to Git, but to authentication: the obo token expires every 1.5 days (129600 seconds), so your databricks.git provider fails to initialise, leading to the error message:
```
cannot read git credential: cannot configure databricks-cli auth: /Users/jschra/.databrickscfg has no DEFAULT profile configured. Attributes used: host. Please check https://registry.terraform.io/providers/databricks/databricks/latest/docs#authentication for details
```
@nkvuong, ok lol, that was really stupid on my side. Apologies for that.
The question is, however, how to make such a configuration robust against obo token expiry. It's (accidentally) set to 1.5 days now; I'd probably set it to 90 days. But then again, I would face this same problem after 90 days.
Any ideas on how I can ensure that the token is recreated before it expires? Otherwise I will eventually always end up in this situation (given that I want to keep all these configurations in one Terraform apply run)
@jschra There is no easy way to do this in a single Terraform apply; it is fundamental to how Terraform works.
The key issue is that providers need to be instantiated for all operations, not just apply. In theory, terraform apply would succeed (it handles the dependency correctly: generate the obo token, supply it to the provider, then read the git credential), but terraform plan will fail, because there is no token yet for the databricks.git provider.
My suggestion would be to split this into two separate configurations, using the output of the first as input to the second (via a secret manager, for example). Your apply script then needs to run two terraform apply commands sequentially, but hopefully that's not too much work.
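The first configuration could hand the token off like this. A sketch only, assuming AWS Secrets Manager as the handoff store and a `workspace_url` variable; the secret name and resource labels are illustrative:

```hcl
# First configuration: mint the token and store it in a secret
# manager for the second configuration to consume
resource "aws_secretsmanager_secret" "obo" {
  name = "databricks-obo-token"
}

resource "aws_secretsmanager_secret_version" "obo" {
  secret_id     = aws_secretsmanager_secret.obo.id
  secret_string = databricks_obo_token.this.token_value
}

output "workspace_host" {
  value = var.workspace_url
}
```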
Yeah, makes sense. I reckon this would only work if databricks_obo_token had a parameter that forces it to be re-created a set amount of time before it expires. Say, if you run with a 90-day token, force it to be replaced after 85 days. With a pipeline that runs the TF configurations daily, that would solve the problem.
But that's more of a feature request anyway. Thanks a lot for thinking along and pulling out this very clumsy mistake of mine!
@jschra you could try combining time_rotating with replace_triggered_by, so that the token is replaced after 85 days
That’s a great idea @nkvuong ! Will give it a try tomorrow, will keep you posted. Cheers!
@nkvuong I tried adding logic using the resources you mentioned, and it does allow me to replace the obo_token before it expires. If I then use said token in a provider to enter the workspace, however, it still fails: plan detects the lifecycle rule that will replace the token, so the token is empty at plan time, and the plan fails because it cannot log in to the workspace with it.
This is what my config looks like right now:
```hcl
# Create service principal for API access from DevOps
resource "databricks_service_principal" "this" {
  provider     = databricks.test
  display_name = "test PAT"

  # No access by default, only through groups
  allow_cluster_create       = false
  allow_instance_pool_create = false
  databricks_sql_access      = false
}

# Create rotation time object of a minute
resource "time_rotating" "example" {
  rotation_minutes = 1
}

resource "random_id" "test" {
  keepers = {
    time_rotating = time_rotating.example.id
  }
  byte_length = 8
}

# Generate PAT
resource "databricks_obo_token" "this" {
  provider         = databricks.test
  application_id   = databricks_service_principal.this.application_id
  comment          = "PAT on behalf of ${databricks_service_principal.this.display_name}"
  lifetime_seconds = 60000

  lifecycle {
    replace_triggered_by = [
      random_id.test.hex
    ]
  }
}
```
It works perfectly fine if I do not have a provider based on the token, but if I do, I again get the following error:
I guess I'll run with your advice to split the configs in two, as that makes all of this significantly easier. Thanks again for taking the time to take a look and think along, even though the initial issue was invalid!
@jschra this would work if you split the configuration up, and run terraform apply sequentially, as the new token will be available for the provider in the second configuration to pick up
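The second configuration would then read the token back and build its provider from it. Again a sketch under the same assumption of AWS Secrets Manager as the handoff store; because the token already exists when this configuration runs, both plan and apply can instantiate the provider:

```hcl
# Second configuration: read the stored token and configure the
# workspace provider from it (illustrative names)
data "aws_secretsmanager_secret_version" "obo" {
  secret_id = "databricks-obo-token"
}

provider "databricks" {
  host  = var.workspace_host
  token = data.aws_secretsmanager_secret_version.obo.secret_string
}
```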
Hi there,
In my configurations, I sequentially build a workspace and then enter it to deploy services within it. In doing so, I also create a service principal for which I generate an obo token that I store separately for automation services.
In the last step of my configuration, I take this obo token and the host URL of my workspace to enter it a second time, now as the service principal, in order to store git credentials there. Any subsequent pipelines can then enter my workspace and start pulling Git repositories without having to worry about this.
Now when I initially do this, it works fine. I can store the git credentials, I can pull a repo and then call it a day. If I run an additional plan or apply right after, it also still works.
When I try to rerun my configurations the next day, however, Terraform no longer seems to pick up the provider I configured. I get the following error when I try to run my configs locally:
and the following error when my DevOps pipeline tries to run it:
Apparently, it no longer picks up the databricks.git provider I pass to the resource and instead starts trying to look up credentials elsewhere, where it then fails.
My question is: how? In my configurations I pass on the host url and obo_token to a separately generated databricks provider, which I then explicitly use in the resource block for the git credentials and the git repo. More on that below.
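The last step looks roughly like this. A hedged sketch, not the actual configuration: the `databricks.git` alias is taken from the error messages, while the git provider, username, PAT variable, and repo URL are placeholders:

```hcl
# Store git credentials in the workspace as the service principal,
# using the token-based provider alias (illustrative values)
resource "databricks_git_credential" "ado" {
  provider              = databricks.git
  git_provider          = "azureDevOpsServices"
  git_username          = "devops-automation"
  personal_access_token = var.ado_pat
}

resource "databricks_repo" "this" {
  provider   = databricks.git
  url        = "https://dev.azure.com/org/project/_git/repo"
  depends_on = [databricks_git_credential.ado]
}
```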
Configuration
Expected Behavior
I expect that Terraform plan/apply runs without any issues.
Actual Behavior
Terraform plan/apply stops due to an error stating that it cannot retrieve the databricks_git_credential. It does work right after a successful apply, but not once a day or so has passed.
Steps to Reproduce
Terraform and provider versions
Debug Output
https://gist.github.com/jschra/d9e958460193b1c20ea644bb036f9668
Important Factoids
None