dynatrace-oss / terraform-provider-dynatrace

Apache License 2.0

Scoping group of entities for workload anomaly detection in Kubernetes #415

Closed karty-s closed 7 months ago

karty-s commented 8 months ago

Hi Team, we are using the native anomaly detection feature in Dynatrace, and we are facing a constraint when applying it to a specific set of entities. The current scopes that can be applied to workloads are namespace and cluster - https://registry.terraform.io/providers/dynatrace-oss/dynatrace/latest/docs/resources/k8s_workload_anomalies#scope - but in our case we want to apply some of the native anomaly detection features (Not all pods ready, Pending pods, etc.) to a group of clusters, namespaces, or workloads. Since we manage more than 150 clusters with 50+ workloads in each of them, it is not feasible to add/enable this alert manually. Kindly help us with this requirement.

Describe the solution you'd like: we would like a larger scope of entities, such as a management zone, or clusters/workloads carrying certain tags.

Describe alternatives you've considered: the only alternative we have now is to add all 50+ workloads manually, which is not feasible.

Dynatrace-Reinhard-Pilz commented 8 months ago

Hello @karty-s,

I'd like to make sure I'm not misunderstanding your request. Which of the two situations applies?

a) You're able to configure Anomaly Detection based on Management Zones/Clusters/Workloads via the WebUI - but Terraform isn't able to replicate these settings easily?

b) Neither the WebUI nor Terraform currently allows for configuration of Anomaly Detection based on your description - and you're hoping that the Terraform Provider can allow for a bit more convenience here?

If it is a) then I'd be grateful for a quick description for the steps you're taking via WebUI. In that case we would have a gap I'd like to take care of.

If it is b) then I believe I have good news for you - everything you need for that is already available. Let's take a look at this example:

data "dynatrace_entities" "clusters" {
  entity_selector = "type(KUBERNETES_CLUSTER),tag(foo)"
}

locals {
  cluster_ids = {
    for cluster in data.dynatrace_entities.clusters.entities :
    cluster.entity_id => cluster
  }
}

resource "dynatrace_k8s_cluster_anomalies" "anomalies-by-cluster" {
  for_each = local.cluster_ids

  scope = each.key
  cpu_requests_saturation {
    enabled = true
    configuration {
      observation_period_in_minutes = 20
      sample_period_in_minutes      = 15
      threshold                     = 95
    }
  }
  memory_requests_saturation {
    enabled = true
    configuration {
      observation_period_in_minutes = 20
      sample_period_in_minutes      = 15
      threshold                     = 95
    }
  }
  monitoring_issues {
    enabled = true
    configuration {
      observation_period_in_minutes = 35
      sample_period_in_minutes      = 20
    }
  }
  pods_saturation {
    enabled = true
    configuration {
      observation_period_in_minutes = 6
      sample_period_in_minutes      = 4
      threshold                     = 95
    }
  }
  readiness_issues {
    enabled = true
    configuration {
      observation_period_in_minutes = 5
      sample_period_in_minutes      = 4
    }
  }
}
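If you want to double-check which clusters the entity selector actually matched, a small output block can help. This is just a sketch - the output name matched_cluster_ids is hypothetical:

output "matched_cluster_ids" {
  # Lists the entity IDs of every cluster the selector picked up,
  # i.e. the scopes the anomaly settings will be applied to.
  value = keys(local.cluster_ids)
}

After terraform plan or terraform apply, the output shows the IDs, so you can verify the tag filter before rolling the settings out.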

Let me know if that helps. Again, perhaps I'm misunderstanding your requirement.

Dynatrace-Reinhard-Pilz commented 8 months ago

@karty-s I'm closing this ticket, assuming the example I've provided reflected what you were looking for. If there are further questions, just drop a comment and we will re-open this ticket.

karty-s commented 8 months ago

Hi Reinhard, I agree the solution is effective for the dynatrace_k8s_cluster_anomalies resource, but our requirement is to apply/select all of the workloads in our Management Zone for the dynatrace_k8s_workload_anomalies resource - https://registry.terraform.io/providers/dynatrace-oss/dynatrace/latest/docs/resources/k8s_workload_anomalies#optional . The doc currently states:

scope (String) The scope of this setting (CLOUD_APPLICATION_NAMESPACE, KUBERNETES_CLUSTER). Omit this property if you want to cover the whole environment.

Please let me know about this use case. Kindly reopen this request, and sorry for the delay in response.

Dynatrace-Reinhard-Pilz commented 8 months ago

Dynatrace doesn't support defining Workload Anomalies for a specific Kubernetes Workload. But you can define Workload Anomalies for a specific Namespace - which essentially covers all the Workloads within that Namespace. Even if you navigate via the WebUI to a specific Workload and change the Anomaly Detection Rules, you're automatically changing them for all the Workloads within that same Namespace.

I'm afraid that's a limitation the Terraform Provider won't be able to work around either.

But if that granularity is already fine for you, then this example should cover your use case.

data "dynatrace_entities" "kube-system-namespaces" {
  entity_selector = "type(CLOUD_APPLICATION_NAMESPACE),mzName(kube)"
}

locals {
  namespace_ids = {
    for namespace in data.dynatrace_entities.kube-system-namespaces.entities :
    namespace.entity_id => namespace
  }
}

resource "dynatrace_k8s_workload_anomalies" "workload-anomalies-for-kube-namespaces" {
  for_each = local.namespace_ids

  scope = each.key

  container_restarts {
    enabled = true
    configuration {
      observation_period_in_minutes = 6
      sample_period_in_minutes      = 4
      threshold                     = 2
    }
  }
  deployment_stuck {
    enabled = true
    configuration {
      observation_period_in_minutes = 5
      sample_period_in_minutes      = 4
    }
  }
  not_all_pods_ready {
    enabled = true
    configuration {
      observation_period_in_minutes = 6
      sample_period_in_minutes      = 4
    }
  }
  pending_pods {
    enabled = true
    configuration {
      observation_period_in_minutes = 16
      sample_period_in_minutes      = 11
      threshold                     = 2
    }
  }
  pod_stuck_in_terminating {
    enabled = true
    configuration {
      observation_period_in_minutes = 6
      sample_period_in_minutes      = 4
    }
  }
  workload_without_ready_pods {
    enabled = true
    configuration {
      observation_period_in_minutes = 6
      sample_period_in_minutes      = 4
    }
  }
  high_cpu_throttling {
    enabled = true
    configuration {
      observation_period_in_minutes = 6
      sample_period_in_minutes      = 4
      threshold                     = 2
    }
  }
  high_cpu_usage {
    enabled = true
    configuration {
      observation_period_in_minutes = 6
      sample_period_in_minutes      = 4
      threshold                     = 2
    }
  }
  high_memory_usage {
    enabled = true
    configuration {
      observation_period_in_minutes = 6
      sample_period_in_minutes      = 4
      threshold                     = 2
    }
  }
  job_failure_events {
    enabled = true
  }
  oom_kills {
    enabled = true
  }
  pod_backoff_events {
    enabled = true
  }
  pod_eviction_events {
    enabled = true
  }
  pod_preemption_events {
    enabled = true
  }
}

Like in the example for K8s Clusters, we're using the dynatrace_entities data source here - but instead of clusters we're searching for Namespaces. The entity_selector additionally filters for just the Namespaces that belong to the Management Zone kube.

data "dynatrace_entities" "kube-system-namespaces" {
  entity_selector = "type(CLOUD_APPLICATION_NAMESPACE),mzName(kube)"
}

And like in the earlier example, we're required to produce some sort of map out of the results of dynatrace_entities - otherwise we wouldn't be able to use them later on within the for_each meta-argument.

locals {
  namespace_ids = {
    for namespace in data.dynatrace_entities.kube-system-namespaces.entities :
    namespace.entity_id => namespace
  }
}
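As a side note, Terraform's for_each also accepts a set of strings directly, so if you don't need the full entity objects, a toset expression would work as well (a sketch; namespace_id_set is a hypothetical name):

locals {
  # Collect just the entity IDs into a set - for_each accepts
  # a set of strings as well as a map.
  namespace_id_set = toset([
    for namespace in data.dynatrace_entities.kube-system-namespaces.entities :
    namespace.entity_id
  ])
}

With for_each = local.namespace_id_set, each.key still yields the entity ID to use as the scope argument.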

The resource block dynatrace_k8s_workload_anomalies, in combination with for_each, finally creates a resource instance with the given settings for every Namespace that matched when you queried for them using dynatrace_entities.

resource "dynatrace_k8s_workload_anomalies" "workload-anomalies-for-kube-namespaces" {
  for_each = local.namespace_ids

  scope = each.key
  ...
}
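If you'd rather group namespaces by tag than by Management Zone, the entity selector supports tag filters as well - for example (team:payments is a hypothetical tag):

data "dynatrace_entities" "tagged-namespaces" {
  # Select only Namespaces carrying the given tag.
  entity_selector = "type(CLOUD_APPLICATION_NAMESPACE),tag(team:payments)"
}

The rest of the setup (the locals map and the for_each resource block) stays the same.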

Let me know if that helps, Reinhard

Dynatrace-Reinhard-Pilz commented 7 months ago

Closing ticket. Again, just drop us a message if your use case isn't what I was talking about.