Azure / terraform-provider-azapi

Terraform provider for Azure Resource Manager Rest API
https://registry.terraform.io/providers/Azure/azapi/latest
Mozilla Public License 2.0

Infinite loop with Get response after Put - context deadline exceeded #163

Open mlomat opened 2 years ago

mlomat commented 2 years ago

Hello guys, below is my code, with which I am trying to trigger a SyncJob with Git for an Automation Account. The first part works without any issue: it adds the repo to sync with in Azure. The second part, which triggers the sync with Git, also works, because it calls the correct API and I can see it syncing with Git in the Azure Portal. However, it goes into an infinite loop (I can see in the logs that it gets the correct JSON in the GET response) and eventually throws the error "context deadline exceeded". There are no other details that look useful :(

locals {
  repoUrl = "urlToGitRepositoryOnAzureDevOps"
  branch = "development"
  FolderPath = "PathToFolderWithPowershelLScripts"
  PatToken = "PatTokenHERE"
}

resource "azapi_resource" "automationAccount" {
  type      = "Microsoft.Automation/automationAccounts/sourceControls@2020-01-13-preview"
  name      = "Azure DevOps"
  parent_id = var.automation_account_id

  body = jsonencode({
    properties = {
      repoUrl        = local.repoUrl
      sourceType     = "VsoGit"
      branch         = local.branch
      folderPath     = local.FolderPath
      autoSync       = true
      publishRunbook = true
      securityToken = {
        accessToken = local.PatToken
        tokenType   = "PersonalAccessToken"
      }
    }
  })
}

resource "time_sleep" "wait_20_seconds" {
  depends_on = [azapi_resource.automationAccount]

  create_duration = "20s"
}

resource "azapi_resource" "automation_account_repo_sync" {
  type      = "Microsoft.Automation/automationAccounts/sourceControls/sourceControlSyncJobs@2020-01-13-preview"
  name      = uuid()
  parent_id = azapi_resource.automationAccount.id

  body = jsonencode({
    properties = {
      commitId = ""
    }
  })

  ignore_casing = true
  ignore_missing_property = true

  response_export_values = [ "properties.provisioningState", "properties.sourceControlSyncJobId" ]

  depends_on = [
    time_sleep.wait_20_seconds
  ]
}

output "status" {
  value = jsondecode(azapi_resource.automation_account_repo_sync.output).properties.provisioningState
}

Any idea how to fix it?

Regards Mateusz

mlomat commented 2 years ago

Here is the last GET response, which was correct, followed by the error. I just removed the subscriptionId.

{"id":"/subscriptions/{subscriptionWipeOut}/resourceGroups/org-canary-mgmt/providers/Microsoft.Automation/automationAccounts/org-canary-automation/sourceControls/Azure%20DevOps/sourceControlSyncJobs/7cc544ac-9cef-f4f5-ca55-4a9b498bb73b","properties":{"sourceControlSyncJobId":"7cc544ac-9cef-f4f5-ca55-4a9b498bb73b","provisioningState":"Completed","exception":null,"creationTime":"2022-08-02T09:10:37.2006798+00:00","startTime":"2022-08-02T09:10:50.5761283+00:00","endTime":"2022-08-02T09:11:15.9201743+00:00","syncType":"FullSync"}}
   --------------------------------------------------------------------------------: timestamp=2022-08-02T11:40:32.790+0200
2022-08-02T11:40:32.791+0200 [DEBUG] provider.terraform-provider-azapi_v0.4.0.exe: Aug  2 11:40:32.790484 Retry: response 200: timestamp=2022-08-02T11:40:32.790+0200
2022-08-02T11:40:32.791+0200 [DEBUG] provider.terraform-provider-azapi_v0.4.0.exe: Aug  2 11:40:32.790484 LongRunningOperation: State Completed: timestamp=2022-08-02T11:40:32.790+0200
2022-08-02T11:40:32.791+0200 [DEBUG] provider.terraform-provider-azapi_v0.4.0.exe: Aug  2 11:40:32.790484 LongRunningOperation: delay for 10s: timestamp=2022-08-02T11:40:32.790+0200
2022-08-02T11:40:33.702+0200 [DEBUG] provider.terraform-provider-azapi_v0.4.0.exe: Aug  2 11:40:33.702272 LongRunningOperation: END PollUntilDone() for *loc.Poller[interface {}]: context deadline exceeded, total time: 29m55.6327484s: timestamp=2022-08-02T11:40:33.702+0200
2022-08-02T11:40:33.706+0200 [TRACE] maybeTainted: module.Automation_Account_setup.azapi_resource.automation_account_repo_sync encountered an error during creation, so it is now marked as tainted
2022-08-02T11:40:33.706+0200 [TRACE] NodeAbstractResouceInstance.writeResourceInstanceState to workingState for module.Automation_Account_setup.azapi_resource.automation_account_repo_sync
2022-08-02T11:40:33.706+0200 [TRACE] NodeAbstractResouceInstance.writeResourceInstanceState: removing state object for module.Automation_Account_setup.azapi_resource.automation_account_repo_sync
2022-08-02T11:40:33.706+0200 [TRACE] evalApplyProvisioners: module.Automation_Account_setup.azapi_resource.automation_account_repo_sync is tainted, so skipping provisioning
2022-08-02T11:40:33.706+0200 [TRACE] maybeTainted: module.Automation_Account_setup.azapi_resource.automation_account_repo_sync was already tainted, so nothing to do
2022-08-02T11:40:33.706+0200 [TRACE] NodeAbstractResouceInstance.writeResourceInstanceState to workingState for module.Automation_Account_setup.azapi_resource.automation_account_repo_sync
2022-08-02T11:40:33.706+0200 [TRACE] NodeAbstractResouceInstance.writeResourceInstanceState: removing state object for module.Automation_Account_setup.azapi_resource.automation_account_repo_sync
2022-08-02T11:40:33.709+0200 [ERROR] vertex "module.Automation_Account_setup.azapi_resource.automation_account_repo_sync" error: creating/updating "Resource: (ResourceId \"/subscriptions/{subscriptionWipeOut}/resourceGroups/org-canary-mgmt/providers/Microsoft.Automation/automationAccounts/org-canary-automation/sourceControls/Azure DevOps/sourceControlSyncJobs/7cc544ac-9cef-f4f5-ca55-4a9b498bb73b\" / Api Version \"2020-01-13-preview\")": context deadline exceeded
2022-08-02T11:40:33.709+0200 [TRACE] vertex "module.Automation_Account_setup.azapi_resource.automation_account_repo_sync": visit complete, with errors
2022-08-02T11:40:33.709+0200 [TRACE] dag/walk: upstream of "module.Automation_Account_setup (close)" errored, so skipping
2022-08-02T11:40:33.709+0200 [TRACE] dag/walk: upstream of "provider[\"registry.terraform.io/azure/azapi\"].management (close)" errored, so skipping
2022-08-02T11:40:33.709+0200 [TRACE] dag/walk: upstream of "root" errored, so skipping
2022-08-02T11:40:33.805+0200 [DEBUG] Azure Backend Request:
ms-henglu commented 2 years ago

Hi @mlomat ,

Thank you for opening this issue! And the logs are really helpful.

I think the cause is that the response reports "provisioningState":"Completed", and Completed is not a terminal state according to https://github.com/Azure/azure-sdk-for-go/blob/24bcaf5863e68afbb4f0b17fc72dcde5c47a3e99/sdk/azcore/internal/pollers/util.go#L34 , so the provider keeps polling until it exceeds the time limit.
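To make the failure mode concrete, here is a small bash sketch of the terminal-state rule as I read it from the linked util.go: polling stops only for Succeeded, Failed or Canceled, compared case-insensitively. This is an illustration, not the SDK's code, and `is_terminal_state` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch of the terminal-state rule as read from the linked azcore util.go:
# only Succeeded, Failed and Canceled end polling, compared case-insensitively.
is_terminal_state() {
  local state
  state=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$state" in
    succeeded | failed | canceled) return 0 ;;
    *) return 1 ;;
  esac
}

# The sync job in this issue reports "Completed", which falls through to the
# non-terminal branch, so a poller applying this rule keeps asking.
for s in Succeeded Completed; do
  if is_terminal_state "$s"; then
    echo "$s: terminal"
  else
    echo "$s: not terminal, keep polling"
  fi
done
```

Under this rule a service that reports its own final word ("Completed") instead of one of the three standard values is indistinguishable from one that is still running, which matches the 30-minute loop in the logs.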

I'll confirm this API design with the service team.

mlomat commented 2 years ago

Hi @ms-henglu,

Thanks for the quick reply. Would it be possible to add a feature that also accepts other responses as final, or is there some other workaround?

Best Mateusz

ms-henglu commented 2 years ago

@mlomat - I've created an issue to track the API bug: https://github.com/Azure/azure-rest-api-specs/issues/20085

Unfortunately, it's difficult to provide a workaround/fix for this case.

mlomat commented 2 years ago

Here is a quick workaround using a local-exec provisioner and the az rest command. It's not perfect... but at least it doesn't throw errors ;) The code below is a full solution to set up sync with the repo and trigger a sync. If you only need to trigger the sync, just pass the correct ID of the repo resource in place of azapi_resource.automationAccount.id inside the custom provisioner.

resource "azapi_resource" "automationAccount" {
  type      = "Microsoft.Automation/automationAccounts/sourceControls@2020-01-13-preview"
  name      = "Azure DevOps"
  parent_id = var.automation_account_id

  body = jsonencode({
    properties = {
      repoUrl        = local.repoUrl
      sourceType     = "VsoGit"
      branch         = local.branch
      folderPath     = local.FolderPath
      autoSync       = true
      publishRunbook = true
      securityToken = {
        accessToken = local.PatToken
        tokenType   = "PersonalAccessToken"
      }
    }
  })

  response_export_values = ["*"]
}

resource "time_sleep" "wait_20_seconds" {
  triggers = {
    automationAccount = azapi_resource.automationAccount.output
  }
  depends_on = [azapi_resource.automationAccount]

  create_duration = "20s"
}

resource "null_resource" "trigger_sync_runbooks" {
  triggers = {
    automationAccount = azapi_resource.automationAccount.output
  }

  provisioner "local-exec" {
    interpreter = ["bash", "-c"]
    command     = <<EOT
# Quoted because the source control name ("Azure DevOps") contains a space.
automationAccountIdSourceControl="${azapi_resource.automationAccount.id}"
uuid=$(cat /proc/sys/kernel/random/uuid)
# The resource ID already starts with "/", so no extra slash after the host.
url="https://management.azure.com$automationAccountIdSourceControl/SourceControlSyncJobs/$uuid?api-version=2017-05-15-preview"
az rest --method PUT --url "$url" --body "{'properties':{'commitId':''}}" --headers "{'content-type': 'application/json; charset=utf-8'}"

# Poll the sync job until it reports a final state (Completed or Failed).
response=null
while [[ $response != Failed && $response != Completed ]]; do
  echo 'Waiting for response!'
  response=$(az rest --method GET --url "$url" | jq -r .properties.provisioningState)
  sleep 4s
done

# Fail the provisioner and dump the job details if the sync did not complete.
if [[ $response != Completed ]]; then
  echo "$response"
  az rest --method GET --url "$url"
  exit 1
fi
EOT
  }

  depends_on = [
    time_sleep.wait_20_seconds
  ]
}
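One caveat with the wait loop above: it has no upper bound, so if the job never reaches Completed or Failed it spins forever, which is the provider's failure mode without even the 30-minute deadline. Below is a minimal sketch of a bounded version, assuming bash and a caller-supplied command that prints the current provisioningState. `poll_sync_job` and `get_state` are hypothetical names, not part of az or azapi.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a status command until it prints a final state for
# the sync job ("Completed" or "Failed") or the attempt budget runs out.
# Usage: poll_sync_job <max_attempts> <interval_seconds> <command...>
# Returns 0 on Completed, 1 on Failed, 2 when it gives up.
poll_sync_job() {
  local max_attempts=$1 interval=$2
  shift 2
  local attempt state
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    state=$("$@")
    case "$state" in
      Completed) return 0 ;;
      Failed) echo "sync job failed" >&2; return 1 ;;
    esac
    echo "attempt $attempt: state=${state:-<empty>}" >&2
    # Skip the sleep after the last attempt; there is nothing left to wait for.
    if (( attempt < max_attempts )); then
      sleep "$interval"
    fi
  done
  echo "gave up after $max_attempts attempts (last state: ${state:-<empty>})" >&2
  return 2
}

# Example wiring inside the local-exec script (not executed here):
#   get_state() { az rest --method GET --url "$url" | jq -r .properties.provisioningState; }
#   poll_sync_job 150 4 get_state || exit 1
```

Passing the status check as a command keeps the loop testable without Azure credentials; 150 attempts at 4 seconds caps the wait at roughly ten minutes, which is an arbitrary choice to tune per job.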
Nacymus commented 1 year ago

I've got much the same problem with the Red Hat OpenShift provider. Cluster creation succeeds, but the Terraform provider keeps polling the "provisioningState". And although the state is 'succeeded' rather than something like "completed", the Terraform provider eventually stops polling and fails to register the created ARO cluster in the Terraform state.

ms-henglu commented 1 year ago

Hi @Nacymus , would you please share the configuration? Thanks!

Nacymus commented 1 year ago

Hello @ms-henglu

The configuration is in a private repository. Can I invite you as a collaborator to the repo so you can test the config yourself? There is a very helpful README.md.

Nacymus commented 1 year ago

Hi @ms-henglu ,

You are now a collaborator on the project; I have sent you the invitation. I tried to investigate the problem on my own, so if you ever need information on what I think might have caused the timeout, I will be available.

montgomery-plattner-kh commented 7 months ago

@ms-henglu - Can you please provide an update on this issue? I am trying to use the azapi_resource resource to add an App Service Certificate to a Key Vault and am getting stuck in an infinite loop of GET requests. The initial PUT request returns a 201 and I can see that the certificate is added to the Key Vault, but terraform apply never finishes successfully because the azapi_resource times out after 30 minutes.