PagerDuty / terraform-provider-pagerduty

Terraform PagerDuty provider
https://www.terraform.io/docs/providers/pagerduty/
Mozilla Public License 2.0

`403 Forbidden` error when updating existing `pagerduty_escalation_policy`'s target #819

Closed oponomarov-tu closed 5 months ago

oponomarov-tu commented 7 months ago

We are getting a 403 Forbidden error when attempting to modify the target of an existing pagerduty_escalation_policy.

Terraform Version

Terraform v1.6.6
on darwin_arm64
+ provider registry.terraform.io/pagerduty/pagerduty v3.7.1

Affected Resource(s)

Terraform Configuration Files

resource "pagerduty_escalation_policy" "team_oncall_escalation_policy" {
  name      = "${var.team_name} Escalation Policy"
  num_loops = 9

  dynamic "rule" {
    for_each = var.team_oncall_enabled == true ? [1] : []
    content {
      escalation_delay_in_minutes = 5
      target {
        type = "schedule_reference"
        id   = pagerduty_schedule.team_oncall_schedule_primary.id
      }
    }
  }

  dynamic "rule" {
    for_each = var.team_oncall_enabled == true ? [1] : []
    content {
      escalation_delay_in_minutes = 5
      target {
        type = "schedule_reference"
        id   = pagerduty_schedule.team_oncall_schedule_secondary.id
      }
    }
  }

  rule {
    escalation_delay_in_minutes = 60
    target {
      type = "schedule_reference"
      id   = var.oncall_schedule_id
    }
  }

  rule {
    escalation_delay_in_minutes = 15
    target {
      type = "schedule_reference"
      id   = var.manager_schedule_id
    }
  }

  lifecycle {
    create_before_destroy = true
  }
}

Terraform will perform the following actions:

  # pagerduty_escalation_policy.team_oncall_escalation_policy will be updated in-place
  ~ resource "pagerduty_escalation_policy" "team_oncall_escalation_policy" {
        id          = "PJ81HJ6"
        name        = "<omitted> Escalation Policy"
        # (3 unchanged attributes hidden)

      ~ rule {
            id                          = "<omitted>"
            # (1 unchanged attribute hidden)

          ~ target {
              ~ id   = "PAIH89Y" -> "PZ4PJ9I"
                # (1 unchanged attribute hidden)
            }

            # (1 unchanged block hidden)
        }

        # (3 unchanged blocks hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

Debug Output

2024-02-15T13:06:51.880+0100 [DEBUG] provider.terraform-provider-pagerduty_v3.7.1: 2024/02/15 13:06:51 ===== PagerDuty Cache Skipping Init =====
2024-02-15T13:06:52.252+0100 [INFO]  provider.terraform-provider-pagerduty_v3.7.1: [INFO] PagerDuty client configured
2024-02-15T13:06:52.252+0100 [DEBUG] provider.terraform-provider-pagerduty_v3.7.1: expandEscalationRuleAssignmentStrategy_v is [map[type:assign_to_everyone]]
2024-02-15T13:06:52.252+0100 [DEBUG] provider.terraform-provider-pagerduty_v3.7.1: expandEscalationRuleAssignmentStrategy_teras is "assign_to_everyone"
2024-02-15T13:06:52.252+0100 [DEBUG] provider.terraform-provider-pagerduty_v3.7.1: expandEscalationRuleAssignmentStrategy_v is [map[type:assign_to_everyone]]
2024-02-15T13:06:52.252+0100 [DEBUG] provider.terraform-provider-pagerduty_v3.7.1: expandEscalationRuleAssignmentStrategy_teras is "assign_to_everyone"
2024-02-15T13:06:52.252+0100 [DEBUG] provider.terraform-provider-pagerduty_v3.7.1: expandEscalationRuleAssignmentStrategy_v is [map[type:assign_to_everyone]]
2024-02-15T13:06:52.252+0100 [DEBUG] provider.terraform-provider-pagerduty_v3.7.1: expandEscalationRuleAssignmentStrategy_teras is "assign_to_everyone"
2024-02-15T13:06:52.252+0100 [DEBUG] provider.terraform-provider-pagerduty_v3.7.1: expandEscalationRuleAssignmentStrategy_v is [map[type:assign_to_everyone]]
2024-02-15T13:06:52.252+0100 [DEBUG] provider.terraform-provider-pagerduty_v3.7.1: expandEscalationRuleAssignmentStrategy_teras is "assign_to_everyone"
2024-02-15T13:06:52.252+0100 [INFO]  provider.terraform-provider-pagerduty_v3.7.1: [INFO] Updating PagerDuty escalation policy: <omitted>
2024-02-15T13:06:52.252+0100 [DEBUG] provider.terraform-provider-pagerduty_v3.7.1: [DEBUG] Waiting for state to become: [success]
2024-02-15T13:06:52.502+0100 [TRACE] provider.terraform-provider-pagerduty_v3.7.1: [TRACE] Waiting 500ms before next try
2024-02-15T13:06:53.277+0100 [TRACE] provider.terraform-provider-pagerduty_v3.7.1: [TRACE] Waiting 1s before next try
2024-02-15T13:06:54.539+0100 [TRACE] provider.terraform-provider-pagerduty_v3.7.1: [TRACE] Waiting 2s before next try
2024-02-15T13:06:56.780+0100 [TRACE] provider.terraform-provider-pagerduty_v3.7.1: [TRACE] Waiting 4s before next try
2024-02-15T13:07:01.037+0100 [TRACE] provider.terraform-provider-pagerduty_v3.7.1: [TRACE] Waiting 8s before next try
pagerduty_escalation_policy.team_oncall_escalation_policy: Still modifying... [id=PJ81HJ6, 10s elapsed]
2024-02-15T13:07:09.305+0100 [TRACE] provider.terraform-provider-pagerduty_v3.7.1: [TRACE] Waiting 10s before next try
pagerduty_escalation_policy.team_oncall_escalation_policy: Still modifying... [id=PJ81HJ6, 20s elapsed]
2024-02-15T13:07:19.569+0100 [TRACE] provider.terraform-provider-pagerduty_v3.7.1: [TRACE] Waiting 10s before next try
pagerduty_escalation_policy.team_oncall_escalation_policy: Still modifying... [id=PJ81HJ6, 30s elapsed]
2024-02-15T13:07:29.813+0100 [TRACE] provider.terraform-provider-pagerduty_v3.7.1: [TRACE] Waiting 10s before next try
pagerduty_escalation_policy.team_oncall_escalation_policy: Still modifying... [id=PJ81HJ6, 40s elapsed]
2024-02-15T13:07:40.065+0100 [TRACE] provider.terraform-provider-pagerduty_v3.7.1: [TRACE] Waiting 10s before next try
pagerduty_escalation_policy.team_oncall_escalation_policy: Still modifying... [id=PJ81HJ6, 50s elapsed]
<omitted for brevity>
2024-02-15T13:11:35.749+0100 [TRACE] provider.terraform-provider-pagerduty_v3.7.1: [TRACE] Waiting 10s before next try
pagerduty_escalation_policy.team_oncall_escalation_policy: Still modifying... [id=PJ81HJ6, 4m50s elapsed]
2024-02-15T13:11:45.990+0100 [TRACE] provider.terraform-provider-pagerduty_v3.7.1: [TRACE] Waiting 10s before next try
pagerduty_escalation_policy.team_oncall_escalation_policy: Still modifying... [id=PJ81HJ6, 5m0s elapsed]
2024-02-15T13:11:52.255+0100 [WARN]  provider.terraform-provider-pagerduty_v3.7.1: [WARN] WaitForState timeout after 5m0s
2024-02-15T13:11:52.255+0100 [WARN]  provider.terraform-provider-pagerduty_v3.7.1: [WARN] WaitForState starting 30s refresh grace period
2024-02-15T13:11:54.257+0100 [TRACE] provider.terraform-provider-pagerduty_v3.7.1: Called downstream: @module=sdk.helper_schema tf_mux_provider="*schema.GRPCProviderServer" tf_provider_addr=registry.terraform.io/pagerduty/pagerduty @caller=github.com/hashicorp/terraform-plugin-sdk/v2@v2.31.0/helper/schema/resource.go:920 tf_req_id=bf5f7fd9-6e12-dd97-4231-b2f04f3b62f8 tf_resource_type=pagerduty_escalation_policy tf_rpc=ApplyResourceChange timestamp="2024-02-15T13:11:54.256+0100"
2024-02-15T13:11:54.267+0100 [TRACE] provider.terraform-provider-pagerduty_v3.7.1: Received downstream response: tf_proto_version=5.4 tf_req_duration_ms=302378 tf_req_id=bf5f7fd9-6e12-dd97-4231-b2f04f3b62f8 tf_rpc=ApplyResourceChange diagnostic_error_count=1 diagnostic_warning_count=0 tf_provider_addr=registry.terraform.io/pagerduty/pagerduty @caller=github.com/hashicorp/terraform-plugin-go@v0.20.0/tfprotov5/internal/tf5serverlogging/downstream_request.go:40 @module=sdk.proto tf_resource_type=pagerduty_escalation_policy timestamp="2024-02-15T13:11:54.258+0100"
2024-02-15T13:11:54.267+0100 [ERROR] provider.terraform-provider-pagerduty_v3.7.1: Response contains error diagnostic: @module=sdk.proto diagnostic_summary="PUT API call to https://api.pagerduty.com/escalation_policies/PJ81HJ6 failed: 403 Forbidden" tf_provider_addr=registry.terraform.io/pagerduty/pagerduty tf_req_id=bf5f7fd9-6e12-dd97-4231-b2f04f3b62f8 tf_resource_type=pagerduty_escalation_policy @caller=github.com/hashicorp/terraform-plugin-go@v0.20.0/tfprotov5/internal/diag/diagnostics.go:62 diagnostic_detail="" diagnostic_severity=ERROR tf_proto_version=5.4 tf_rpc=ApplyResourceChange timestamp="2024-02-15T13:11:54.258+0100"
2024-02-15T13:11:54.269+0100 [TRACE] provider.terraform-provider-pagerduty_v3.7.1: Served request: tf_req_id=bf5f7fd9-6e12-dd97-4231-b2f04f3b62f8 tf_rpc=ApplyResourceChange tf_proto_version=5.4 tf_provider_addr=registry.terraform.io/pagerduty/pagerduty tf_resource_type=pagerduty_escalation_policy @caller=github.com/hashicorp/terraform-plugin-go@v0.20.0/tfprotov5/t
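
The waits in the trace above (500ms, 1s, 2s, 4s, 8s, then 10s repeatedly) look like a doubling backoff with a 10s cap, which would explain why the 403 only surfaces once the 5m timeout expires. The sketch below reproduces that wait pattern under this assumption. It is not the SDK's actual retry code, the function name and parameters are made up for illustration, and it ignores the time each request itself takes.

```python
def backoff_schedule(first_wait=0.5, cap=10.0, timeout=300.0):
    """Return the waits (in seconds) of a doubling backoff capped at `cap`.

    Hypothetical helper: the defaults are read off the log excerpt above,
    not taken from the provider or SDK source.
    """
    waits = []
    elapsed = 0.0
    wait = first_wait
    # Keep scheduling waits while the next one still fits inside the timeout.
    while elapsed + wait <= timeout:
        waits.append(wait)
        elapsed += wait
        wait = min(wait * 2, cap)
    return waits


schedule = backoff_schedule()
print(schedule[:7])
```

If this reading is right, every retry hits the same 403, so the apply cannot succeed no matter how long it waits.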

Expected Behavior

I expected the escalation policy to update the target schedule.

Actual Behavior

The provider times out after 5 minutes with a 403 Forbidden error:

│ Error: PUT API call to https://api.pagerduty.com/escalation_policies/<omitted> failed: 403 Forbidden
│
│   with pagerduty_escalation_policy.team_oncall_escalation_policy,
│   on ../../../modules/pagerduty/pd-on-call-team/escalation_policy.tf line 1, in resource "pagerduty_escalation_policy" "team_oncall_escalation_policy":
│    1: resource "pagerduty_escalation_policy" "team_oncall_escalation_policy" {

Steps to Reproduce

Just run terraform apply.

Important Factoids

We tried:

  1. Different PagerDuty provider versions -- :x:
  2. Different Terraform versions -- :x:
  3. Hardcoding PagerDuty API token in the provider authentication block.
  4. New PagerDuty API token with same permissions -- :x:
  5. Increasing the provider log verbosity; the apply still times out after 5 minutes and emits the same error as above, without further details -- :x:
  6. Updating escalation policy with original PagerDuty token using Python client -- :white_check_mark:

Re. (6), example snippet:

#!/usr/bin/env python

import os
from pdpyras import APISession
from pprint import pprint as pp

def get_api_token():
    """Retrieve API token from environment variables."""
    token = os.getenv("PAGERDUTY_API_TOKEN")
    if not token:
        raise ValueError("Please set the PAGERDUTY_API_TOKEN environment variable.")
    return token

def update_escalation_policy_target(session, policy_id, target_id, new_target_details):
    """Update the target in the escalation policy with new details."""
    policy = session.rget(f"/escalation_policies/{policy_id}")
    pp(policy)

    policy_updated = False
    policy.pop("teams", None)  # drop teams: we don't use them and our plan doesn't support them

    for rule in policy.get("escalation_rules", []):
        for target in rule.get("targets", []):
            if target["id"] == target_id:
                target.update(new_target_details)
                policy_updated = True
                break 

    return policy, policy_updated

def main():
    api_token = get_api_token()
    session = APISession(api_token)

    escalation_policy_id = "PJ81HJ6"
    target_id_to_update = "PZ4PJ9I"

    new_target_details = {
        "id": "PAIH89Y",
        "self": "https://api.pagerduty.com/schedules/PAIH89Y",
        "summary": "<omitted>",
        "html_url": "https://<companyname>.pagerduty.com/schedules/PAIH89Y",
    }

    try:
        updated_policy, updated = update_escalation_policy_target(
            session, escalation_policy_id, target_id_to_update, new_target_details
        )

        if updated:
            session.rput(
                f"/escalation_policies/{escalation_policy_id}", json=updated_policy
            )
            print(f"Updated escalation policy {escalation_policy_id} successfully.")
        else:
            print(
                f"No matching rule found for update in policy {escalation_policy_id}."
            )

    except Exception as e:
        error_message = str(e)
        print(f"Error updating escalation policy: {error_message}")

if __name__ == "__main__":
    main()

oponomarov-tu commented 7 months ago

I did some more digging, and reverting the PagerDuty provider to version ~> 2.16.0 (2.16.2) did the trick; it all works again.

Apparently, version 3.7.1 could create a brand new escalation policy, but then failed to update any attribute on it (even something as simple as the description). Looking at the trace logs and headers, I noticed that the PUT request body was missing some attributes. It looks like this:

{
 "escalation_policy": {
  "description": "Managed by Terraform",
  "escalation_rules": [
      ...
  ],
  "name": "xxx Escalation Policy",
  "num_loops": 9,
  "teams": null
 }
}

Comparing it to the PagerDuty API docs, it probably should include "type": "escalation_policy", which is missing here (the type is present when Terraform initially creates the escalation policy, which is most likely why creation works).

elliot-graebert-skydio commented 7 months ago

I ran into this as well, exactly as described above.

vorotech commented 6 months ago

Hey guys, I encountered the same issue and found the root cause. In my case we are on the Professional subscription (the behaviour is the same on Free).

It can be tested easily with PagerDuty API https://developer.pagerduty.com/api-reference/f9b1e15e70a0c-update-an-escalation-policy

Steps to reproduce and verify the root cause

  1. Enable the debug output with TF_LOG=debug
  2. Run terraform apply
  3. Grab the PUT request to /escalation_policies/<SOMEID> with body <JSON_PAYLOAD>
  4. Paste it into the PagerDuty API reference page and send the request
  5. It returns 403 Forbidden
  6. Remove "escalation_rule_assignment_strategy":{"type":"assign_to_everyone"} (which is added as a default value when Terraform refreshes the state)
  7. Send the request again and you get 200 OK

NOTE for devs: During initial creation of the resource, escalation_rule_assignment_strategy is not sent to the API endpoint because it is not specified in the Terraform configuration. During an update, however, Terraform syncs from the remote state and picks up the "default" value of escalation_rule_assignment_strategy, which causes the issue. The strategy accepts two values, and my guess is that server-side validation only checks for the presence of the property, not its value. Either the backend check should be updated to allow the "default" value that Terraform sets implicitly during refresh, or the provider should be fixed to avoid syncing this field when it is omitted from the resource configuration in the .tf file.
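
Steps 6 and 7 above can be sketched in Python for anyone replaying a captured PUT body by hand. This is only an illustration, assuming the field sits on each rule object (as the four expandEscalationRuleAssignmentStrategy debug lines suggest). The helper name and the example payload are made up, and the IDs are placeholders rather than captured traffic.

```python
import copy

# Field the steps above identify as triggering the 403.
STRATEGY_KEY = "escalation_rule_assignment_strategy"


def strip_assignment_strategy(payload):
    """Return a copy of a PUT body with the strategy removed from every rule.

    Hypothetical helper for manual testing; not part of the provider.
    """
    cleaned = copy.deepcopy(payload)  # leave the captured payload untouched
    policy = cleaned.get("escalation_policy", {})
    for rule in policy.get("escalation_rules", []):
        rule.pop(STRATEGY_KEY, None)
    return cleaned


# Illustrative payload only; values are placeholders.
example = {
    "escalation_policy": {
        "name": "Example Escalation Policy",
        "num_loops": 9,
        "escalation_rules": [
            {
                "escalation_delay_in_minutes": 5,
                "targets": [{"id": "PXXXXXX", "type": "schedule_reference"}],
                STRATEGY_KEY: {"type": "assign_to_everyone"},
            }
        ],
    }
}

cleaned = strip_assignment_strategy(example)
print(cleaned["escalation_policy"]["escalation_rules"][0])
```

The cleaned body could then be sent with the same session.rput call used in the Python snippet earlier in this thread.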

rawmind0 commented 6 months ago

The root cause of this issue seems to be related to the PagerDuty plan in use. Configuring escalation_rule_assignment_strategy seems to be allowed only on the Business and Digital Operations plans, https://support.pagerduty.com/docs/round-robin-scheduling , so users on other plans shouldn't be able to set it.

The TF provider computes the escalation_rule_assignment_strategy resource field without taking this limitation into account, so if the PagerDuty user doesn't have permission to set it, updating the pagerduty_escalation_policy fails with 403 Forbidden.

The error may be fixable here: https://github.com/PagerDuty/terraform-provider-pagerduty/blob/v3.9.0/pagerduty/resource_pagerduty_escalation_policy.go#L173 . The pagerduty.GetEscalationPolicyOptions should include escalation_rule_assignment_strategies only if the user is allowed to set it (because the field gets added to the PUT request whenever the read API response includes it). I guess a specific ability exists for this, so checking it with the client ( https://github.com/heimweh/go-pagerduty/blob/master/pagerduty/ability.go#L15 ) before defining the pagerduty.GetEscalationPolicyOptions should fix the issue.

I forgot to mention: the last version that should work fine here is v3.2.2.
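
A rough sketch of that idea, in Python rather than the provider's Go: request the strategy on read only when the account has the relevant ability. The ability name below is a placeholder (the real one, if it exists, is not known from this thread), the function name is invented, and the include[] query parameter is an assumption about how the option is serialized.

```python
# Placeholder only: the actual ability name is unknown.
HYPOTHETICAL_ABILITY = "example_round_robin_ability"


def policy_read_params(abilities, required_ability=HYPOTHETICAL_ABILITY):
    """Build query params for reading an escalation policy.

    The assignment strategy is requested only if the account has the ability,
    so it never ends up in a later PUT body on plans that cannot set it.
    """
    params = {}
    if required_ability in abilities:
        # Assumed serialization of the "includes" option.
        params["include[]"] = ["escalation_rule_assignment_strategies"]
    return params


print(policy_read_params(["sso", "teams"]))
print(policy_read_params(["sso", HYPOTHETICAL_ABILITY]))
```

With pdpyras, the abilities list could presumably be fetched via session.rget("/abilities") before calling such a helper.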

doctornkz-intelas commented 6 months ago

@rawmind0 Mate, thank you for the version suggestion, at least I can continue my imports! It works.

vorotech commented 5 months ago

@imjaroiswebdev unfortunately, this wasn't fixed.

imjaroiswebdev commented 5 months ago

Hey folks! I'm working on an improvement for this, which will be released shortly.

imjaroiswebdev commented 5 months ago

Please upgrade to PagerDuty Terraform provider v3.11.2 or newer to stop facing this issue. Thanks for your patience and feedback.

oponomarov-tu commented 5 months ago

I can confirm v3.11.2 resolved the issue. Thanks! ❤️

@imjaroiswebdev, actually, looks like it is not resolved. Still failing to modify the resource in-place:

module.pagerduty.pagerduty_escalation_policy.team_oncall_escalation_policy: Still modifying... [id=<redacted>, 20s elapsed]
module.pagerduty.pagerduty_escalation_policy.team_oncall_escalation_policy: Still modifying... [id=<redacted>, 30s elapsed]
module.pagerduty.pagerduty_escalation_policy.team_oncall_escalation_policy: Still modifying... [id=<redacted>, 40s elapsed]
...

.terraform.lock.hcl:

provider "registry.terraform.io/pagerduty/pagerduty" {
  version     = "3.11.2"
...

langenoja commented 5 months ago

Same here: the provider seems to run longer, but eventually it fails. SECURE logging provides no insight beyond the 403 Forbidden message.

fraenky8 commented 5 months ago

Bumping this one, same here. 🙏

imjaroiswebdev commented 5 months ago

A new patch for handling malformed 403 errors, which are the culprit in this case, will be released shortly. Please stay tuned.

imjaroiswebdev commented 5 months ago

Please upgrade once more, to PagerDuty TF provider v3.11.4 or newer; this should solve the issue.

fraenky8 commented 5 months ago

Confirmed, it worked on our side! Thanks a lot! 🙇‍♂️

langenoja commented 5 months ago

Confirmed to be working here as well, thank you!

imjaroiswebdev commented 5 months ago

Awesome! Thanks to you all for the feedback and your patience 👏🏽 🎉