SAP / terraform-provider-btp

Terraform provider for SAP BTP
https://registry.terraform.io/providers/SAP/btp/latest
Apache License 2.0

[FEATURE] Add Retry Logic to resources to avoid rate limiting issues #872

Closed lechnerc77 closed 3 months ago

lechnerc77 commented 3 months ago

What area do you want to see improved?

terraform provider

Is your feature request related to a problem? Please describe.

Terraform configurations can currently run into rate limiting. One example is the resource subaccount_entitlement, which fails with an error like this:

21: resource "btp_subaccount_entitlement" "entitlement" {

Request rate limit exceeded [Error: 11006/429]

This error is documented for the API; see https://api.sap.com/api/APIEntitlementsService/path/setServicePlans -> HTTP code 429.

Describe the solution you would like

Resources for which a rate-limit error can be clearly identified should support retry logic, analogous to the timeout block that is already available on some resources.

This retry logic was part of the provider SDK (see https://pkg.go.dev/github.com/hashicorp/terraform-plugin-sdk/v2@v2.34.0/helper/retry) but was not migrated to the Plugin Framework (see https://discuss.hashicorp.com/t/terraform-plugin-framework-what-is-the-replacement-for-waitforstate-or-retrycontext/45538).

As the provider has already copied the timeout functionality, it makes sense to also add the retry functionality to those resources where customers have requested it and where an error can be identified as a rate limit (and hence as retryable). The functionality is available in this Go package: https://pkg.go.dev/github.com/hashicorp/terraform-plugin-sdk/v2/helper/resource#RetryContext.

Describe alternatives you have considered

From a consumer perspective, there is no easy way to retry this other than re-running the CI/CD pipeline when such errors occur, which is quite cumbersome.

Additional context

Potential resources that might be subject to rate limits (HTTP code 429):

The header field X-Ratelimit-Reset indicates when a retry makes sense -> it needs to be checked whether the BTP CLI server returns it.

Rate limiting might also hit when executing a data source (READ operations); at least the API returns the same HTTP code there. It needs to be validated whether the retry logic also has to be added to the READ operation.

github-actions[bot] commented 3 months ago

Thanks for the feature request. We will evaluate it and update the issue accordingly.


lechnerc77 commented 3 months ago

How to reproduce a timeout:

variable "entitlements" {
  type        = map(list(string))
  description = "Map with all expected entitlements"
  default = {
    # Alert Notification Service
    "alert-notification" = ["standard"]

    # Application Logging Service
    "application-logs" = ["standard=1"]

    # Audit Log
    "auditlog" = ["oauth2=1"]

    # Cloud Management Service
    "cis" = ["system-basic", "xrs"]

    # Connectivity
    "connectivity" = ["connectivity_proxy"]

    # Custom Domain Service
    "INFRA"                 = ["custom_domains=2"] # Custom Domains Service
    "custom-domain-manager" = ["standard=1"]       # Application

    # Job Scheduling Service
    "jobscheduler" = ["standard=1"]

    # SAP HANA Cloud
    "hana-cloud"       = ["hana"]
    "hana-cloud-tools" = ["tools"]

    # SAP HANA Schemas & HDI Containers
    "hana" = ["schema"]

    # Service Manager
    "service-manager" = ["container", "subaccount-audit", "subaccount-admin"]

    # SAP Credential Store Service
    "credstore" = ["small=1"]
  }
}