PagerDuty / terraform-provider-pagerduty

Terraform PagerDuty provider
https://www.terraform.io/docs/providers/pagerduty/
Mozilla Public License 2.0

Schedule deletion does not work when incidents are still open #567

Status: Open. drastawi opened this issue 2 years ago

drastawi commented 2 years ago

Terraform Version

2.6.1

Affected Resource(s)

"pagerduty_schedule"

Terraform Configuration Files

This happens when deleting any schedule linked to an escalation policy, as in the following configuration:

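# "%s" values are placeholders; substitute real values before applying.

# User referenced by the schedule layer and the escalation policy target.
resource "pagerduty_user" "foo" {
  name  = "%s"
  email = "%s"
}

resource "pagerduty_team" "foo" {
  name        = "%s"
  description = "fighters"
}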

resource "pagerduty_schedule" "foo" {
  name = "%s"

  time_zone   = "%s"
  description = "foo"

  teams = [pagerduty_team.foo.id]

  layer {
    name                         = "foo"
    start                        = "%s"
    rotation_virtual_start       = "%s"
    rotation_turn_length_seconds = 86400
    users                        = [pagerduty_user.foo.id]

    restriction {
      type              = "daily_restriction"
      start_time_of_day = "08:00:00"
      duration_seconds  = 32101
    }
  }
}

resource "pagerduty_escalation_policy" "foo" {
  name      = "%s"
  num_loops = 2
  teams     = [pagerduty_team.foo.id]

  rule {
    escalation_delay_in_minutes = 10
    target {
      type = "user_reference"
      id   = pagerduty_user.foo.id
    }
    target {
      type = "schedule_reference"
      id   = pagerduty_schedule.foo.id
    }
  }
}

Panic Output

"Schedule can't be deleted if it's being used by an escalation policy snapshot with open incidents"

Expected Behavior

Should close incidents and remove the schedule

Actual Behavior

Schedule is not removed

Steps to Reproduce

  1. Trigger an incident on a service whose escalation policy includes a schedule (the schedule can be on any level of the escalation policy).
  2. Remove the schedule from the escalation policy
  3. Attempt to delete the schedule
  4. At this point you should get the error message "Schedule can't be deleted if it's being used by an escalation policy snapshot with open incidents"
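The steps above map roughly onto PagerDuty REST API v2 calls. This is a minimal Python sketch, assuming the standard `https://api.pagerduty.com` endpoints (`POST /incidents`, `PUT /escalation_policies/{id}`, `DELETE /schedules/{id}`); the helper names and all IDs are hypothetical, and the requests are only built here, not sent.

```python
import json
import urllib.request

API_BASE = "https://api.pagerduty.com"  # PagerDuty REST API v2


def build_request(method, path, token, body=None, from_email=None):
    """Build (but do not send) a PagerDuty REST API v2 request."""
    headers = {
        "Authorization": f"Token token={token}",
        "Accept": "application/vnd.pagerduty+json;version=2",
        "Content-Type": "application/json",
    }
    if from_email is not None:
        # Creating incidents requires a From header with a valid user's email.
        headers["From"] = from_email
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(API_BASE + path, data=data,
                                  headers=headers, method=method)


def reproduction_requests(token, from_email, service_id, policy, schedule_id):
    """Return the requests for steps 1-3, in order (hypothetical helper).

    `policy` is an escalation policy object as returned by the API:
    a dict with "id" and "escalation_rules" (each rule has "targets").
    """
    # Step 1: trigger an incident on a service that uses the policy.
    trigger = build_request("POST", "/incidents", token, from_email=from_email, body={
        "incident": {
            "type": "incident",
            "title": "schedule deletion repro",
            "service": {"id": service_id, "type": "service_reference"},
        }
    })
    # Step 2: drop the schedule from every rule, without mutating `policy`.
    rules = [
        {**rule, "targets": [t for t in rule["targets"] if t["id"] != schedule_id]}
        for rule in policy["escalation_rules"]
    ]
    update = build_request("PUT", f"/escalation_policies/{policy['id']}", token,
                           body={"escalation_policy": {**policy, "escalation_rules": rules}})
    # Step 3: delete the schedule; this is the call expected to fail while
    # the incident from step 1 is still open.
    delete = build_request("DELETE", f"/schedules/{schedule_id}", token)
    return [trigger, update, delete]
```

To actually run the reproduction, each request could be passed to `urllib.request.urlopen` in order; the final `DELETE` is the one that should fail while the incident is open.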

bilbof commented 2 years ago

This is a blocker for configuring PagerDuty in code. It's a tricky one, but I'd suggest resolving the incidents automatically (i.e. letting the deletion cascade).
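A cascade along these lines might look like the following minimal Python sketch. It assumes the caller already knows which escalation policies referenced the schedule (for example, from Terraform state), since that link is lost once the schedule is removed from the policy. `incidents_to_resolve` and `bulk_resolve_body` are hypothetical helpers; the payload shape follows the bulk `PUT /incidents` endpoint.

```python
# Incident statuses that count as "open" in PagerDuty.
OPEN_STATUSES = {"triggered", "acknowledged"}


def incidents_to_resolve(incidents, escalation_policy_ids):
    """Pick open incidents whose escalation policy is one of the given IDs.

    `incidents` are incident objects as returned by GET /incidents.
    """
    return [
        inc for inc in incidents
        if inc["status"] in OPEN_STATUSES
        and inc["escalation_policy"]["id"] in escalation_policy_ids
    ]


def bulk_resolve_body(incidents):
    """Request body for PUT /incidents that resolves every given incident."""
    return {
        "incidents": [
            {"id": inc["id"], "type": "incident_reference", "status": "resolved"}
            for inc in incidents
        ]
    }
```

Because resolving incidents is a destructive side effect of a schedule deletion, a provider would probably want this behind an explicit opt-in rather than as the default.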

imjaroiswebdev commented 1 year ago

When you try to delete a Schedule that is used by an Escalation Policy (EP) with open incidents, and that Schedule has additionally been removed from the EP (to become part of another EP, or simply to be deleted), the Schedule's data loses its link to the EP with the open incidents. That relationship is tracked only through the EP snapshot created when the incident is triggered.
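A toy Python model of this explanation may help. It is a hypothetical illustration of why the link is lost, not PagerDuty's actual data model: each incident keeps a copy of its escalation policy's schedule IDs from trigger time, so once the schedule is removed from the live policy, only the snapshot still points at it.

```python
from dataclasses import dataclass, field


@dataclass
class EscalationPolicy:
    id: str
    schedule_ids: set = field(default_factory=set)


@dataclass
class Incident:
    id: str
    status: str
    policy_id: str
    # Copy of the policy's schedule IDs taken when the incident was triggered.
    snapshot_schedule_ids: frozenset


def trigger_incident(incident_id, policy):
    """Open an incident and snapshot the policy's schedules at trigger time."""
    return Incident(incident_id, "triggered", policy.id, frozenset(policy.schedule_ids))


def blocking_via_live_policies(schedule_id, policies, incidents):
    """What is traceable from the schedule now: policies that still list it."""
    live = {p.id for p in policies if schedule_id in p.schedule_ids}
    return [i.id for i in incidents if i.status != "resolved" and i.policy_id in live]


def blocking_via_snapshots(schedule_id, incidents):
    """What actually blocks deletion: open incidents whose snapshot lists it."""
    return [i.id for i in incidents
            if i.status != "resolved" and schedule_id in i.snapshot_schedule_ids]
```

For example, after `ep = EscalationPolicy("EP1", {"SCHED1"})`, `inc = trigger_incident("INC1", ep)` and `ep.schedule_ids.discard("SCHED1")`, the live lookup returns `[]` while the snapshot lookup still returns `["INC1"]`: the incident blocks the deletion, but nothing reachable from the schedule says which incident it is.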

So the error returned by that deletion attempt is the following:

[Schedule can't be deleted if it's being used by an escalation policy snapshot with open incidents]

Therefore, the public /schedules API would need to return the ID(s) of the open incidents, or at least the ID(s) of the EP with the open incidents, so that the error message can tell Terraform users which incidents need to be resolved or reassigned, as we already do for Schedules in this scenario when the incidents are still traceable.
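The kind of message this would allow might look like the following Python sketch. `schedule_delete_error` is a hypothetical helper, assuming the API could return incident IDs or escalation policy IDs alongside the error; for untraceable snapshots only the bare API message is available, so that is the fallback.

```python
def schedule_delete_error(schedule_id, api_message, incident_ids=None, policy_ids=None):
    """Build an actionable error for a failed schedule deletion.

    Prefers incident IDs, then escalation policy IDs, then the bare message.
    """
    msg = f"failed to delete schedule {schedule_id}: {api_message}"
    if incident_ids:
        return msg + ". Resolve or reassign these incidents first: " + ", ".join(sorted(incident_ids))
    if policy_ids:
        return msg + ". Open incidents exist on escalation policies: " + ", ".join(sorted(policy_ids))
    return msg
```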

So, until the /schedules API's error messages for this case are updated, we won't be able to present a more helpful error.

This has been reported to the owners of the PagerDuty /schedules API, and it is already on their roadmap; unfortunately, we don't have an ETA yet.

bilbof commented 1 year ago

No worries. FWIW, we ended up writing a custom PagerDuty controller (like a k8s operator) that reconciles the desired config with what is in PagerDuty. It was a little tricky, since the API has some gotchas like this one; we hit another today: support hours need to be HH:MM:00, so 23:59:59 won't work 😄 No big deal though, as we worked around them.
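The support-hours gotcha can be caught client-side before the API call. A minimal Python sketch, assuming the rule is exactly as described (24-hour `HH:MM:SS`, hours 00 to 23, seconds always `00`); `floor_to_minute` is one hypothetical workaround, not necessarily the one used here.

```python
import re

# 24-hour HH:MM:SS with the seconds fixed at 00.
_SUPPORT_HOURS_TIME = re.compile(r"(?:[01][0-9]|2[0-3]):[0-5][0-9]:00")


def is_valid_support_hours_time(value):
    """True if `value` has the form HH:MM:00."""
    return _SUPPORT_HOURS_TIME.fullmatch(value) is not None


def floor_to_minute(value):
    """Drop the seconds of an HH:MM:SS string, e.g. '23:59:59' -> '23:59:00'."""
    hours, minutes, _seconds = value.split(":")
    return f"{hours}:{minutes}:00"
```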