dynatrace-oss / terraform-provider-dynatrace

Apache License 2.0

Scoping group of entities for workload anomaly detection in Kubernetes #415

Closed karty-s closed 7 months ago

karty-s commented 8 months ago

Hi Team, we are using the native anomaly detection feature in Dynatrace, and we are facing a constraint when applying it to a specific set of entities. The current scopes that can be applied to workloads are namespace and cluster - https://registry.terraform.io/providers/dynatrace-oss/dynatrace/latest/docs/resources/k8s_workload_anomalies#scope - but in our case we want to apply some of the native anomaly detection features (Not all pods ready, Pending pods, etc.) to a group of clusters, namespaces, or workloads. Since we manage more than 150 clusters with 50+ workloads in each of them, it is not feasible to add/enable this alert manually. Kindly help us with this requirement.

Describe the solution you'd like: we would like a larger scope of entities, such as a management zone, or clusters/workloads carrying certain tags.

Describe alternatives you've considered: the only alternative we have now is to add all 50+ workloads manually, which is not feasible.

Dynatrace-Reinhard-Pilz commented 8 months ago

Hello @karty-s,

I'd like to make sure I'm not misunderstanding your request. Which of the two situations applies?

a) You're able to configure Anomaly Detection based on Management Zones/Clusters/Workloads via the WebUI - but Terraform isn't able to replicate these settings easily?

b) Neither the WebUI nor Terraform currently allows for configuration of Anomaly Detection based on your description - and you're hoping that the Terraform Provider can allow for a bit more convenience here?

If it is a) then I'd be grateful for a quick description for the steps you're taking via WebUI. In that case we would have a gap I'd like to take care of.

If it is b) then I believe I have good news for you - everything you need for that is already available. Let's take a look at this example:

data "dynatrace_entities" "clusters" {
  entity_selector = "type(KUBERNETES_CLUSTER),tag(foo)"
}

locals {
  cluster_ids = {
    for cluster in data.dynatrace_entities.clusters.entities :
    cluster.entity_id => cluster
  }
}

resource "dynatrace_k8s_cluster_anomalies" "anomalies-by-cluster" {
  for_each = local.cluster_ids

  scope = each.key
  cpu_requests_saturation {
    enabled = true
    configuration {
      observation_period_in_minutes = 20
      sample_period_in_minutes      = 15
      threshold                     = 95
    }
  }
  memory_requests_saturation {
    enabled = true
    configuration {
      observation_period_in_minutes = 20
      sample_period_in_minutes      = 15
      threshold                     = 95
    }
  }
  monitoring_issues {
    enabled = true
    configuration {
      observation_period_in_minutes = 35
      sample_period_in_minutes      = 20
    }
  }
  pods_saturation {
    enabled = true
    configuration {
      observation_period_in_minutes = 6
      sample_period_in_minutes      = 4
      threshold                     = 95
    }
  }
  readiness_issues {
    enabled = true
    configuration {
      observation_period_in_minutes = 5
      sample_period_in_minutes      = 4
    }
  }
}
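If you want to double-check which clusters the entity selector actually matched, a small output block can help. This is just a sketch - the output name matched_cluster_ids is hypothetical:

output "matched_cluster_ids" {
  # Lists the entity IDs of every cluster the selector picked up,
  # i.e. the scopes the anomaly settings will be applied to.
  value = keys(local.cluster_ids)
}

After terraform plan or terraform apply, the output shows the IDs, so you can verify the tag filter before rolling the settings out.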

Let me know if that helps. Again, perhaps I'm misunderstanding your requirement.

Dynatrace-Reinhard-Pilz commented 8 months ago

@karty-s I'm closing this ticket, assuming the example I've provided reflected what you were looking for. If there are further questions, just drop a comment and we will re-open this ticket.

karty-s commented 8 months ago

Hi Reinhard, I agree the solution is effective for the dynatrace_k8s_cluster_anomalies resource, but our requirement is to apply/select all of the workloads in our Management Zone for the dynatrace_k8s_workload_anomalies resource - https://registry.terraform.io/providers/dynatrace-oss/dynatrace/latest/docs/resources/k8s_workload_anomalies#optional . The doc currently states:

scope (String) The scope of this setting (CLOUD_APPLICATION_NAMESPACE, KUBERNETES_CLUSTER). Omit this property if you want to cover the whole environment.

Please let me know about this use case. Kindly reopen this request, and sorry for the delay in response.

Dynatrace-Reinhard-Pilz commented 8 months ago

Dynatrace doesn't support defining Workload Anomalies for a specific Kubernetes Workload. But you can define Workload Anomalies for a specific Namespace - which essentially covers all the Workloads within that Namespace. Even if you navigate via the WebUI to a specific Workload and change the Anomaly Detection Rules, you're automatically changing them for all the Workloads within that same Namespace.

I'm afraid that's a limitation the Terraform Provider won't be able to work around either.

But if that granularity is already fine for you, then this example should cover your use case.

data "dynatrace_entities" "kube-system-namespaces" {
  entity_selector = "type(CLOUD_APPLICATION_NAMESPACE),mzName(kube)"
}

locals {
  namespace_ids = {
    for namespace in data.dynatrace_entities.kube-system-namespaces.entities :
    namespace.entity_id => namespace
  }
}

resource "dynatrace_k8s_workload_anomalies" "workload-anomalies-for-kube-namespaces" {
  for_each = local.namespace_ids

  scope = each.key

  container_restarts {
    enabled = true
    configuration {
      observation_period_in_minutes = 6
      sample_period_in_minutes      = 4
      threshold                     = 2
    }
  }
  deployment_stuck {
    enabled = true
    configuration {
      observation_period_in_minutes = 5
      sample_period_in_minutes      = 4
    }
  }
  not_all_pods_ready {
    enabled = true
    configuration {
      observation_period_in_minutes = 6
      sample_period_in_minutes      = 4
    }
  }
  pending_pods {
    enabled = true
    configuration {
      observation_period_in_minutes = 16
      sample_period_in_minutes      = 11
      threshold                     = 2
    }
  }
  pod_stuck_in_terminating {
    enabled = true
    configuration {
      observation_period_in_minutes = 6
      sample_period_in_minutes      = 4
    }
  }
  workload_without_ready_pods {
    enabled = true
    configuration {
      observation_period_in_minutes = 6
      sample_period_in_minutes      = 4
    }
  }
  high_cpu_throttling {
    enabled = true
    configuration {
      observation_period_in_minutes = 6
      sample_period_in_minutes      = 4
      threshold                     = 2
    }
  }
  high_cpu_usage {
    enabled = true
    configuration {
      observation_period_in_minutes = 6
      sample_period_in_minutes      = 4
      threshold                     = 2
    }
  }
  high_memory_usage {
    enabled = true
    configuration {
      observation_period_in_minutes = 6
      sample_period_in_minutes      = 4
      threshold                     = 2
    }
  }
  job_failure_events {
    enabled = true
  }
  oom_kills {
    enabled = true
  }
  pod_backoff_events {
    enabled = true
  }
  pod_eviction_events {
    enabled = true
  }
  pod_preemption_events {
    enabled = true
  }
}

Like in the example for K8s Clusters, we're using the dynatrace_entities data source here - but instead of clusters we're searching for Namespaces. The entity_selector additionally filters for just the Namespaces that belong to the Management Zone kube.

data "dynatrace_entities" "kube-system-namespaces" {
  entity_selector = "type(CLOUD_APPLICATION_NAMESPACE),mzName(kube)"
}

And like in the earlier example, we're required to produce some sort of map out of the results of dynatrace_entities - otherwise we wouldn't be able to use them later on within the for_each meta-argument.

locals {
  namespace_ids = {
    for namespace in data.dynatrace_entities.kube-system-namespaces.entities :
    namespace.entity_id => namespace
  }
}
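As a side note, Terraform's for_each also accepts a set of strings directly, so if you don't need the full entity objects, a toset expression would work as well (a sketch; namespace_id_set is a hypothetical name):

locals {
  # Collect just the entity IDs into a set - for_each accepts
  # a set of strings as well as a map.
  namespace_id_set = toset([
    for namespace in data.dynatrace_entities.kube-system-namespaces.entities :
    namespace.entity_id
  ])
}

With for_each = local.namespace_id_set, each.key still yields the entity ID to use as the scope argument.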

The resource block dynatrace_k8s_workload_anomalies, in combination with for_each, finally creates a resource instance with the given settings for every Namespace that matched when you queried for them using dynatrace_entities.

resource "dynatrace_k8s_workload_anomalies" "workload-anomalies-for-kube-namespaces" {
  for_each = local.namespace_ids

  scope = each.key
  ...
}
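If you'd rather group namespaces by tag than by Management Zone, the entity selector supports tag filters as well - for example (team:payments is a hypothetical tag):

data "dynatrace_entities" "tagged-namespaces" {
  # Select only Namespaces carrying the given tag.
  entity_selector = "type(CLOUD_APPLICATION_NAMESPACE),tag(team:payments)"
}

The rest of the setup (the locals map and the for_each resource block) stays the same.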

Let me know if that helps, Reinhard

Dynatrace-Reinhard-Pilz commented 7 months ago

Closing ticket. Again, just drop us a message if your use case isn't what I was talking about.