DataDog / terraform-provider-datadog

Terraform Datadog provider
https://www.terraform.io/docs/providers/datadog/
Mozilla Public License 2.0
403 stars 378 forks source link

Default no_data_timeframe to 2x timeframe #270

Open xp-1000 opened 5 years ago

xp-1000 commented 5 years ago

Hi

Terraform Version

Terraform v0.12.3

Affected Resource(s)

Terraform Configuration Files

resource "datadog_monitor" "foo" {
  name               = "Name for monitor foo"
  type               = "metric alert"
  message            = "Monitor triggered. Notify: @hipchat-channel"
  escalation_message = "Escalation message @pagerduty"

  query = "avg(last_1h):avg:aws.ec2.cpu{environment:foo,host:foo} by {host} > 4"

  thresholds = {
    critical          = 4
  }

  notify_no_data    = true
  #no_data_timeframe = 120 # 2x alerting window (last_1h)
}

Expected Behavior

What should have happened? Ideally, the provider should respect the documentation and default no_data_timeframe to 2x the alerting window : https://docs.datadoghq.com/monitors/monitor_types/metric/#configuration Set a hardcoded value as 10min could be risky.

Actual Behavior

What actually happened? Since provider v2.0 : https://github.com/terraform-providers/terraform-provider-datadog/blob/master/CHANGELOG.md#200-june-18-2019

datadog_monitor: Add default to no data timeframe of 10 minutes. (#212) And before v2.0 it was 0 mn which was worst. Current behavior is better but will not work in all cases (when monitor alerting window >5min)

Important Factoids

Notice that API documentation suggests the datadog API server will define the default value to 2x alerting window but this is false : https://docs.datadoghq.com/api/?lang=python#create-a-monitor

Default: 2x timeframe for metric alerts, 2 minutes for service checks

References

bkabrda commented 4 years ago

So I've been looking into this and it's almost impossible achieve with current TF internal functionality. Let me explain:

Based on the above, I'm leaning towards thinking that this is just not possible ATM, but I'll try to think about this some more and come up with a different solution, so leaving this open for now.

[1] https://godoc.org/github.com/hashicorp/terraform/helper/schema#SchemaDefaultFunc [2] https://godoc.org/github.com/hashicorp/terraform/helper/schema#SchemaDiffSuppressFunc

phillip-dd commented 4 years ago

@bkabrda I believe in buildMonitorStruct when we set NoDataTimeframe: https://github.com/terraform-providers/terraform-provider-datadog/blob/36fba07f4a57b264851f7f5206bf095f22335b88/datadog/resource_datadog_monitor.go#L259-L261

we could add something like:

else if o.NotifyNoData {
  // rough check to get "2x the timeframe"
  noData := "last_10m"
  query := d.Get("query").(string)
  if strings.Count(query, "last_5m") > 0 {
    noData = "last_10m"
  } else if strings.Count(query, "last_10m") > 0 {
   noData = "last_30m"
  } else if strings.Count(query, "last_15m") > 0 {
   noData = "last_30m"
  } else if strings.Count(query, "last_30m") > 0 {
   noData = "last_1h"
  } else if strings.Count(query, "last_1h") > 0 {
   noData = "last_2h"
  } else if strings.Count(query, "last_2h") > 0 {
   noData = "last_4h"
  } else if strings.Count(query, "last_4h") > 0 {
   noData = "last_12h"
  } else if strings.Count(query, "last_1d") > 0 {
   noData = "1h_1d_ago"
  }

  o.NoDataTimeframe = noData
}

Which will set NoDataTimeframe to a value thats roughly 2x the timeframe unless specified by the user.

bkabrda commented 4 years ago

@phillip-dd the problem here is that if we do that, the user will always have a non-empty plan on running terraform apply, because the TF config will contain no value for no_data_timeframe, but the API will return a value for it (that's what I tried to explain in the third bullet point in my previous comment).

alsyia commented 3 years ago

I'm not sure why or how, but when we use no_data_timeframe = 0 in our Terraform code, our monitors apparently use the default behaviour, that is to switch to the No Data state if no data has been received for two times the evaluation window. Terraform plan and apply work as expected since the value is hardcoded.

If that helps...