claranet / terraform-signalfx-detectors

Collection of terraform modules for SignalFx detectors.
Mozilla Public License 2.0
23 stars 32 forks source link

[BUG] Detector Pod phase status do not correctily not filter pod linked jobs #556

Open antonin-rouault opened 6 months ago

antonin-rouault commented 6 months ago

What is the module? otel-collector_kubernetes-common

What is the detector? pod_phase_status https://github.com/claranet/terraform-signalfx-detectors/blob/c94f90c18ab1cfbeef1efb4736ca830be20399b0/modules/otel-collector_kubernetes-common/detectors-gen.tf#L99C31-L99C47

Describe the bug when one or more pods linked to a job fail, theoretically, the detector should not trigger any alert. But in the end we have many alerts triggered on pods linked to jobs.

To Reproduce Steps to reproduce the behavior:

  1. start a job doomed to fail
  2. observe the monitoring in signalfx
  3. Moment / situation when detector falsely raise (or not) alert

Expected behavior A clear and concise description of what you expected to happen and difference compared to previous section. the detector should not raise any alert when a pod linked to a job fails

resolution replace the base-filtering filter (not filter('k8s.job.name', '*')) and (not filter('cronk8s.job.name', '*')) by (not filter('k8s.job.uid', '*')) and (not filter('cronk8s.job.uid', '*'))

https://github.com/claranet/terraform-signalfx-detectors/blob/c94f90c18ab1cfbeef1efb4736ca830be20399b0/modules/otel-collector_kubernetes-common/detectors-gen.tf#L107