PagerDuty / terraform-provider-pagerduty

Terraform PagerDuty provider
https://www.terraform.io/docs/providers/pagerduty/
Mozilla Public License 2.0

HTTP 429 Rate limit is no longer handled gracefully #824

Closed mflis-sumo closed 6 months ago

mflis-sumo commented 7 months ago

Terraform Version

Terraform v1.5.4

Affected Resource(s)

Expected Behavior

The PagerDuty provider should gracefully handle an HTTP 429 response from the PagerDuty API by waiting for some time and then retrying the request.
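To make the expected behavior concrete, here is a minimal sketch of wait-and-retry handling for HTTP 429. It is not the provider's actual implementation (the provider is written in Go). The function name `call_with_rate_limit_retry`, the `send` callable, and the use of a `Retry-After` header with exponential backoff as a fallback are all assumptions made for illustration.

```python
import time


def call_with_rate_limit_retry(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out.

    send is a hypothetical zero-argument callable that performs one request
    and returns (status_code, headers_dict). Returns (status_code, attempts_used).
    """
    for attempt in range(max_attempts):
        status, headers = send()
        if status != 429:
            return status, attempt + 1
        if attempt == max_attempts - 1:
            break  # no point sleeping after the final attempt
        # Assumption: honor Retry-After (in seconds) when the server sends it,
        # otherwise fall back to exponential backoff.
        try:
            delay = float(headers.get("Retry-After"))
        except (TypeError, ValueError):
            delay = base_delay * (2 ** attempt)
        sleep(delay)
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

The `sleep` parameter is injectable so that the waiting can be replaced with a recorder in tests instead of actually pausing.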

Actual Behavior

After upgrading the PagerDuty provider to version 3.8.1, I noticed that `terraform plan` and `terraform apply` started failing with errors like this one:

Error: Cannot obtain plugin client
  with provider["registry.terraform.io/pagerduty/pagerduty"],
  on provider.tf line 39, in provider "pagerduty":
  39: provider "pagerduty" {
HTTP response failed with status code 429, message: Rate Limit Exceeded
(code: 2020)

Steps to Reproduce

Either send many requests to the PagerDuty API in a short time to exhaust your rate limit, or prepare a mitmproxy script that produces an HTTP 429 response. Below I describe the mitmproxy approach, as it is deterministic.

  1. Prepare any Terraform configuration that contains PagerDuty resources
  2. Install mitmproxy and set up its certificates: https://docs.mitmproxy.org/stable/concepts-certificates/#installing-the-mitmproxy-ca-certificate-manually
  3. Create a script that will return a single HTTP 429 response
    
    from mitmproxy import http


    class RateLimitResponse:
        def __init__(self):
            self.counter = 0

        def response(self, flow: http.HTTPFlow) -> None:
            # Rewrite only the first response from the abilities endpoint to a 429.
            if flow.request.url == "https://api.pagerduty.com/abilities" and self.counter == 0:
                flow.response.status_code = 429
                self.counter += 1


    addons = [RateLimitResponse()]

  4. Run mitmproxy: `mitmweb -s script.py`
  5. Run terraform plan with debug logs and the HTTP proxy: `TF_LOG_PROVIDER=debug HTTP_PROXY=http://localhost:8080 HTTPS_PROXY=https://localhost:8080 terraform plan`
  6. When using PagerDuty provider version 3.6.0, I can see this message in the logs and the plan is successful:

2024-02-23T14:27:27.471+0100 [INFO] provider.terraform-provider-pagerduty_v3.6.0: Rate limit hit, throttling by 33.6 seconds until next retry to GET: https://api.pagerduty.com/abilities: timestamp=2024-02-23T14:27:27.471+0100


  7. When using PagerDuty provider version 3.8.1, `terraform plan` fails immediately without retrying the request to the PagerDuty API.
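The same "one 429, then success" behavior from the steps above can also be sketched without mitmproxy, as a local stand-in server for exercising a client's retry logic. This is an illustration only: the handler name, the `/abilities` path on localhost, and the JSON body shapes are assumptions, not the real PagerDuty API responses.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class OneShotRateLimitHandler(BaseHTTPRequestHandler):
    """Answers the first GET /abilities with 429 and every later request with 200."""

    remaining_429s = 1  # class-level counter shared across requests
    lock = threading.Lock()

    def do_GET(self):
        cls = type(self)
        with cls.lock:
            limited = self.path == "/abilities" and cls.remaining_429s > 0
            if limited:
                cls.remaining_429s -= 1
        # Assumed body shapes, loosely modeled on the error text in this report.
        payload = (
            {"error": {"message": "Rate Limit Exceeded", "code": 2020}}
            if limited
            else {"abilities": []}
        )
        body = json.dumps(payload).encode()
        self.send_response(429 if limited else 200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet


def start_stub_server():
    """Start the stub on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), OneShotRateLimitHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def get_status(url):
    """Return the HTTP status for a GET, bypassing any proxy environment variables."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
```

A client that handles rate limiting as in 3.6.0 would see the 429 once and then succeed on the retry, while a client without retry handling would stop at the first response.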

Important Factoids

I guess that this bug was introduced in the 3.7.0 release, when the HTTP client was changed: https://github.com/PagerDuty/terraform-provider-pagerduty/pull/787

viktor-f3 commented 7 months ago

This is a major problem as every version past 3.7.0 is not usable for any organisation that manages a few thousand resources via Terraform.

simonknittel commented 5 months ago

I'm still getting rate limited with the uptimerobot_monitor resources. I have 5 of these resources in my Terraform config. When I do two runs one after another, the second one almost always gets rate limited.

In the logs I can see that this request is getting 429 responses: POST https://api.uptimerobot.com/v2/getMonitors

My setup:
Terraform version: 1.8.0
PagerDuty Provider version: 3.11.3