Open noahpb opened 3 weeks ago
Internal related issue: https://github.com/defenseunicorns/uds-infrastructure/issues/573
Thanks to @rjferguson21's suggestion, we've been able to confirm that the allow-prometheus-stack-egress-metrics-scraping NetworkPolicy generated by the operator needs to be adjusted. The remoteNamespace: "" specification is not permissive enough to allow egress traffic to the prometheus-node-exporter DaemonSet pods: since those pods run on the host network, the scrape traffic targets node IPs rather than pod-network IPs. Manually adjusting the egress specification of the NetworkPolicy to the CIDR range of the nodes worked in my local testing.
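For reference, the manual workaround was roughly equivalent to swapping the generated egress rule for an ipBlock entry covering the nodes. This is only a sketch: the CIDR below is an example for a local k3d cluster (substitute the real node network), and 9100 is the default node-exporter metrics port.

```typescript
// Sketch of the manually adjusted egress entry on
// allow-prometheus-stack-egress-metrics-scraping (example values only).
const egress = [
  {
    // 172.18.0.0/16 stands in for the cluster's node CIDR.
    to: [{ ipBlock: { cidr: "172.18.0.0/16" } }],
    // Default prometheus-node-exporter metrics port.
    ports: [{ port: 9100, protocol: "TCP" }],
  },
];
```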
To resolve this, I would suggest we build an AllNodes generated target. We should be able to build that list of IPs using a watch on Nodes with Pepr, similar to our KubeAPI target (a rough sketch is included below). This would also be helpful for metrics-server, which currently has an Anywhere rule with a TODO comment to switch it to an all-nodes target.
Code links for current kubeapi logic:
Once this is added as a generated target we can add it to Prometheus and make sure that the traffic works as expected.
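A rough sketch of what the Pepr watch for an AllNodes target could look like. The capability name and the updateAllNodesPolicies helper are hypothetical placeholders; the real implementation would mirror the existing KubeAPI target logic linked above.

```typescript
import { Capability, a } from "pepr";

// Hypothetical capability that tracks node InternalIPs for an AllNodes generated target.
export const AllNodes = new Capability({
  name: "all-nodes-target",
  description: "Tracks node IPs so generated NetworkPolicies can target every node",
});

const { When } = AllNodes;
const nodeIPs = new Set<string>();

// Pull the InternalIP out of a Node's status addresses.
function internalIP(node: a.Node): string | undefined {
  return node.status?.addresses?.find(addr => addr.type === "InternalIP")?.address;
}

// Add or refresh the IP whenever a Node is created or updated.
When(a.Node)
  .IsCreatedOrUpdated()
  .Watch(async node => {
    const ip = internalIP(node);
    if (ip && !nodeIPs.has(ip)) {
      nodeIPs.add(ip);
      // Hypothetical helper: regenerate policies that reference the AllNodes target,
      // analogous to how the KubeAPI target updates policies today.
      // await updateAllNodesPolicies([...nodeIPs]);
    }
  });

// Drop the IP when a Node is removed from the cluster.
When(a.Node)
  .IsDeleted()
  .Watch(async node => {
    const ip = internalIP(node);
    if (ip) {
      nodeIPs.delete(ip);
      // await updateAllNodesPolicies([...nodeIPs]);
    }
  });
```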
Environment
Device and OS: darwin arm64
App version: v0.29.1-unicorn
Kubernetes distro being used: k3d with two nodes
Steps to reproduce
Expected result
Container metrics such as CPU and Memory utilization should be queryable
Actual Result
Prometheus only returns metrics from pods that are scheduled on control plane nodes
Visual Proof (screenshots, videos, text, etc)
Metrics returned for container_cpu_usage_seconds:
No metrics returned when filtering out the control plane node:
Severity/Priority
Moderate
Additional Context
Removing all NetworkPolicies in the monitoring namespace allows Prometheus to pick up metrics from the missing nodes.
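For anyone reproducing that check, a scripted equivalent using kubernetes-fluent-client might look like the following. This is diagnostic only, not a fix, and it deletes every NetworkPolicy in the namespace.

```typescript
import { K8s, kind } from "kubernetes-fluent-client";

// Diagnostic only: remove every NetworkPolicy in the monitoring namespace to confirm
// that the generated policies are what is blocking the node-exporter scrapes.
async function clearMonitoringNetpols() {
  const list = await K8s(kind.NetworkPolicy).InNamespace("monitoring").Get();
  for (const netpol of list.items) {
    if (netpol.metadata?.name) {
      await K8s(kind.NetworkPolicy).InNamespace("monitoring").Delete(netpol.metadata.name);
    }
  }
}

clearMonitoringNetpols().catch(console.error);
```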