codecutout / terraform-provider-powerbi

Terraform Power BI Provider
MIT License
14 stars 16 forks source link

Intermittent failure while calling Power BI REST API during Terraform deployment of Power BI resources #44

Open anza19 opened 3 years ago

anza19 commented 3 years ago

Hello,

For the past several weeks we have been facing an issue in our Terraform deployment pipeline that while calling the PowerBI end point for any resources, be it groups, users who have access to a group, reports etc, the deployment will fail with a generic error message displayed as below:

**2021-06-24T21:52:07.4142393Z ##[error]Error: Get "https://api.powerbi.com/v1.0/myorg/groups?%24filter=id+eq+%1111111111-1111-1111-1111-111111111111%11": status code '500 Internal Server Error' with body {"Message":"An error has occurred."}

2021-06-24T21:52:07.4340029Z ##[error]

2021-06-24T21:52:07.4448634Z ##[error]**

We have not been able to diagnose the root cause as the failure is very intermittent, sometimes it'll fail, sometimes it'll succeed and the deployment finishes successfully.

The way our deployment works is basically we have a concept of a namespace (akin to a pod almost), within which resides the powerbi resources for that specific namespace, during deployment we deploy several of them at once, we have see the more namespaces we are deploying the higher likelihood we have of getting this 500 internal server error while calling PowerBI. This error happens both for refreshing existing resources or creating new resources.

In our configuration file we create the resources that we want the powerbi provider to manage:

resource "time_sleep" "wait_90_seconds" {
  depends_on = [azuread_service_principal.pbi]
  create_duration = "90s"
}
resource "powerbi_workspace" "pbi-workspace" {
  name           = "pbi-workspace"
  capacity_id  = var.capacityId
}
resource "powerbi_workspace_access" "pbi-workspace-app-access" {
  workspace_id            = powerbi_workspace.pbi-workspace.id
  group_user_access_right = "Admin"
  principal_type          = "App"
  identifier              = azuread_service_principal.pbi.id
  depends_on              = [time_sleep.wait_90_seconds]
}
resource "powerbi_workspace_access" "group-workspace-admin-access" {
  workspace_id            = powerbi_workspace.pbi-workspace.id
  group_user_access_right = "Admin"
  principal_type          = "Group"
  identifier                     = data.azuread_group.power_bi_workspace_aad_group_access.id
  depends_on              = [time_sleep.wait_90_seconds]
}
resource "powerbi_pbix" "new-report" {
  workspace_id = powerbi_workspace.pbi-workspace.id
  name         = "New Report"
  source       = "./Reports/New Report.pbix"
}
alxy commented 3 years ago

I think that is an issue with the PBI REST api itself. We had the same experience when running a lot of Powershell scripts to automate PowerBI deployments - sometimes the commands randomly fail with server errors.

alex-davies commented 3 years ago

Its unfortunate, but the powerbi services are not very reliable. It has become noticeably worse in the last few months and is often a struggle to get all the tests to pass in a single go without an occasional intermittent 500

It would be possible to add a simple retry mechanism, and will probably be forced to do so in the next PR just so I can get a clean pass of the integration tests

Gi1gamesh commented 3 years ago

@alex-davies Thanks, this impacts me as well and adding an exponential backoff to retry on these failing requests would be a very nice feature.