databricks / terraform-provider-databricks

Databricks Terraform Provider
https://registry.terraform.io/providers/databricks/databricks/latest

Error: MALFORMED_REQUEST: Failed credentials validation checks: Spot Cancellation, Delete Tags, Describe Availability Zones, Describe instances, Describe Instance Status, Describe Route Tables, Describe Security Groups, Describe Spot Instances, Describe Spot Price History, Describe Subnets, Describe Volumes, Describe Vpcs, Request Spot Instances #951

Closed i-engy closed 2 years ago

i-engy commented 2 years ago

Hi there,

We are consistently getting this error with Databricks; after a retry, it works as it should.

│ Error: MALFORMED_REQUEST: Failed credentials validation checks: Spot Cancellation, Delete Tags, Describe Availability Zones, Describe instances, Describe Instance Status, Describe Route Tables, Describe Security Groups, Describe Spot Instances, Describe Spot Price History, Describe Subnets, Describe Volumes, Describe Vpcs, Request Spot Instances

Configuration

Initializing provider plugins...
- Finding latest version of hashicorp/time...
- Finding databrickslabs/databricks versions matching "0.3.11"...
- Finding hashicorp/aws versions matching "3.66.0"...
- Installing databrickslabs/databricks v0.3.11...
- Installed databrickslabs/databricks v0.3.11 (signed by a HashiCorp partner, key ID 905AA25F2E92C2D5)
- Installing hashicorp/aws v3.66.0...
- Installed hashicorp/aws v3.66.0 (signed by HashiCorp)
- Installing hashicorp/time v0.7.2...
- Installed hashicorp/time v0.7.2 (signed by HashiCorp)

Expected Behavior

What should have happened? It should work without any issue, as the credentials are correct.

Actual Behavior

What actually happened? It mostly fails on the first try and then works on the second attempt.

Steps to Reproduce

Please list the steps required to reproduce the issue, for example:

  1. terraform apply

Terraform and provider versions

Terraform version: 1.0.9
databricks: 0.3.11
AWS: 3.66

Debug Output

Sorry, I can't provide the debug output, as I don't have access to the Jenkins pipeline that is used to run the code.

Important Factoids

We were experiencing the same result with the Databricks provider 0.3.10, so we upgraded to 0.3.11, and the issue continued.

nfx commented 2 years ago

@i-engy what resources? What is the config? Please update to 0.4.0, which will give a bit more context to the error.

i-engy commented 2 years ago

@nfx thanks for the quick response. Sorry, I didn't specify the resource.

│   with module.databricks.databricks_mws_workspaces.this,
│   on ../../../../modules/databricks/main.tf line 77, in resource "databricks_mws_workspaces" "this":
│   77: resource "databricks_mws_workspaces" "this" {

Let me try updating to 0.4.0 and see if it still happens.

nfx commented 2 years ago

And does it happen for existing workspaces or just new ones?

If it does, can you pick the latest AWS cross-account policy from the docs and verify whether the error persists?

i-engy commented 2 years ago

Same result with 0.4.0:

╷
│ Error: cannot create mws workspaces: MALFORMED_REQUEST: Failed credentials validation checks: Spot Cancellation, Delete Tags, Describe Availability Zones, Describe instances, Describe Instance Status, Describe Route Tables, Describe Security Groups, Describe Spot Instances, Describe Spot Price History, Describe Subnets, Describe Volumes, Describe Vpcs, Request Spot Instances
│ 
│   with module.databricks.databricks_mws_workspaces.this,
│   on ../../../../modules/databricks/main.tf line 77, in resource "databricks_mws_workspaces" "this":
│   77: resource "databricks_mws_workspaces" "this" {
│ 
╵

We have a Jenkins pipeline that creates and deletes Databricks workspaces in different environments. The error happens during workspace creation; deletion works flawlessly.

nfx commented 2 years ago

@i-engy what cross-account policy do you use? Is the policy in AWS console the same as defined in https://docs.databricks.com/administration-guide/account-api/iam-role.html ? If it's the same, then create a support ticket with your policy attached.
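One way to keep the role's policy aligned with what Databricks expects is to generate it from the provider's data sources instead of pasting JSON by hand. The following is a minimal sketch, assuming the `databricks_aws_assume_role_policy` and `databricks_aws_crossaccount_policy` data sources are available in the installed provider version. The resource labels and name suffixes are illustrative, `var.databricks_account_id` and `local.prefix` are assumed to be defined in the module, and a setup with a customer-managed VPC may need a policy that differs from the generated default.

```hcl
# Trust policy that lets Databricks assume the cross-account role.
data "databricks_aws_assume_role_policy" "this" {
  external_id = var.databricks_account_id
}

resource "aws_iam_role" "cross_account_role" {
  name               = "${local.prefix}-crossaccount" # illustrative name
  assume_role_policy = data.databricks_aws_assume_role_policy.this.json
}

# Permissions policy generated by the provider, so it tracks the provider version.
data "databricks_aws_crossaccount_policy" "this" {}

resource "aws_iam_role_policy" "cross_account_policy" {
  name   = "${local.prefix}-policy" # illustrative name
  role   = aws_iam_role.cross_account_role.id
  policy = data.databricks_aws_crossaccount_policy.this.json
}
```

Comparing the JSON rendered by the data source with the policy shown in the AWS console is a quick way to spot missing actions.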

By the way, why do you regularly need to create a workspace from CI pipeline? There's a limit of workspaces per account.

i-engy commented 2 years ago

We create and delete the same workspace with the CI pipeline; it's part of our test automation and application. Let me share the Databricks policy we use in a support ticket.

i-engy commented 2 years ago

This "malformed request" issue happens only on the first try. When you rerun the same pipeline with the same parameters, it works.

nfx commented 2 years ago

@i-engy Then this is definitely a platform issue and not related to the provider :)

kennes913 commented 2 years ago

This issue was happening when attempting to use a customer-managed (in Databricks lingo) VPC. This did not happen when we provisioned our own VPC via Terraform code.

The cross-account role takes longer to become available than the "completed" message from Terraform suggests. I suspect that the role needs to be globally available, and that this propagation takes a bit longer than the API calls that create the role and attach the policy.

This solved the problem:

# in main.tf of workspace module
# ...

# Pause after the cross-account role is created so that IAM has time to propagate it.
resource "time_sleep" "cross_account_role" {
  depends_on      = [aws_iam_role.cross_account_role]
  create_duration = "10s"
}

resource "databricks_mws_workspaces" "this" {
  account_id     = var.databricks_account_id
  aws_region     = var.databricks_workspace_region
  workspace_name = local.prefix

  credentials_id           = databricks_mws_credentials.this.credentials_id
  storage_configuration_id = databricks_mws_storage_configurations.this.storage_configuration_id
  network_id               = databricks_mws_networks.this.network_id

  # Create the workspace only after the delay above has elapsed.
  depends_on = [time_sleep.cross_account_role]
}
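
If the policy is attached to the role as a separate resource, the delay may also need to wait for that attachment, since the failed validation checks concern permissions rather than the role itself. The following is a hedged variant of the workaround above, not a confirmed fix: `aws_iam_role_policy.cross_account_policy` is a hypothetical resource name that should be replaced with whatever attaches the policy in your module, and the `10s` duration is simply the value reported in this thread, not a guaranteed propagation time.

```hcl
terraform {
  required_providers {
    # time_sleep comes from the hashicorp/time provider.
    time = {
      source = "hashicorp/time"
    }
  }
}

resource "time_sleep" "cross_account_role" {
  # Start the timer only after both the role and its policy exist.
  depends_on = [
    aws_iam_role.cross_account_role,
    aws_iam_role_policy.cross_account_policy, # hypothetical name
  ]
  create_duration = "10s"
}
```

The `databricks_mws_workspaces` resource keeps its `depends_on = [time_sleep.cross_account_role]`, so nothing else in the workspace module has to change.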