Closed by sastels 5 days ago
Last data on staging is at 15:09 EST on July 24. The manifest PR "Paring down kube state metrics" was merged in that same minute.
This PR adds an allow list for the k8s metrics:
metricAllowlist:
- pods=[*]
- nodes=[*]
- deployments=[*]
Unfortunately the pod counts per deployment are not in there, though the node count is. However, I think the node count is coming from EKS.
The metric we want is kube_deployment_status_replicas_available. According to the docs it needs a namespace? Maybe we need to add
- namespaces=[notification-canada-ca]
to this list.
It looks like maybe we're not getting any of the metrics we used to...
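One quick way to confirm which metrics are actually being exposed is to list the metric names in the /metrics output. A minimal sketch, assuming the text has already been fetched from the kube-state-metrics metrics port; `metric_names` is a hypothetical helper and the sample text is made up for illustration, not real output from staging.

```python
def metric_names(exposition_text):
    """Return the set of metric names found in Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line looks like `name{labels} value` or `name value`.
        name = line.split("{", 1)[0].split(" ", 1)[0]
        names.add(name)
    return names


# Made-up sample text, only to show the shape of the check.
SAMPLE = """\
# TYPE kube_node_info gauge
kube_node_info{node="node-a"} 1
# TYPE kube_pod_info gauge
kube_pod_info{namespace="notification-canada-ca",pod="api-1"} 1
"""

WANTED = "kube_deployment_status_replicas_available"

present = metric_names(SAMPLE)
print(sorted(present))
print(WANTED in present)
```

If the wanted name is missing from the real output as well, the problem is on the kube-state-metrics side rather than in the dashboard queries.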
This PR trying to allow all namespaces didn't do anything :disappointed:
This is how the kube-state-metrics pod starts. I see that there's an explicit "Using all namespaces" before the allowlist details, so we likely don't have to mention namespaces.
I0801 17:43:14.752112 1 wrapper.go:120] "Starting kube-state-metrics"
W0801 17:43:14.752422 1 client_config.go:618] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0801 17:43:14.752586 1 server.go:199] "Used resources" resources=["leases","poddisruptionbudgets","endpoints","ingresses","jobs","persistentvolumes","replicationcontrollers","storageclasses","persistentvolumeclaims","pods","services","configmaps","mutatingwebhookconfigurations","networkpolicies","secrets","validating
I0801 17:43:14.752625 1 types.go:227] "Using all namespaces"
I0801 17:43:14.752637 1 types.go:145] "Using node type is nil"
I0801 17:43:14.752717 1 server.go:226] "Metric allow-denylisting" allowDenyStatus="Including the following lists that were on allowlist: pods=[*], nodes=[*], deployments=[*], namespaces=[*]"
W0801 17:43:14.752747 1 client_config.go:618] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0801 17:43:14.753188 1 utils.go:70] "Tested communication with server"
I0801 17:43:14.763046 1 utils.go:75] "Run with Kubernetes cluster version" major="1" minor="30+" gitVersion="v1.30.2-eks-db838b0" gitTreeState="clean" gitCommit="04088714581f0ad0a9e2c81c6ecc36bdd30d4b53" platform="linux/amd64"
I0801 17:43:14.763065 1 utils.go:76] "Communication with server successful"
I0801 17:43:14.764078 1 server.go:350] "Started metrics server" metricsServerAddress=":8080"
I0801 17:43:14.764290 1 server.go:73] level=info msg="Listening on" address=:8080
I0801 17:43:14.764302 1 server.go:73] level=info msg="TLS is disabled." http2=false address=:8080
I0801 17:43:14.764340 1 metrics_handler.go:99] "Autosharding disabled"
I0801 17:43:14.765003 1 server.go:339] "Started kube-state-metrics self metrics server" telemetryAddress=":8081"
I0801 17:43:14.765060 1 server.go:73] level=info msg="Listening on" address=:8081
I0801 17:43:14.765072 1 server.go:73] level=info msg="TLS is disabled." http2=false address=:8081
I0801 17:43:14.766208 1 builder.go:282] "Active resources" activeStoreNames="certificatesigningrequests,configmaps,cronjobs,daemonsets,deployments,endpoints,horizontalpodautoscalers,ingresses,jobs,leases,limitranges,mutatingwebhookconfigurations,namespaces,networkpolicies,nodes,persistentvolumeclaims,persistentvolumes
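One possible explanation, not confirmed anywhere in this thread: the resource=[...] form is the syntax kube-state-metrics documents for --metric-labels-allowlist, whereas --metric-allowlist expects exact metric names or regex patterns. The sketch below assumes that metricAllowlist ends up as --metric-allowlist and that each entry is matched against the whole metric name; both are assumptions, and `is_exposed` is a hypothetical helper, not kube-state-metrics code. Under those assumptions, none of the four entries in the startup log can match the metric we want.

```python
import re

# Entries exactly as reported in the "Metric allow-denylisting" log line.
ALLOWLIST = ["pods=[*]", "nodes=[*]", "deployments=[*]", "namespaces=[*]"]

WANTED = "kube_deployment_status_replicas_available"


def is_exposed(metric, patterns):
    """Assumed behaviour: a metric passes if any entry matches its full name."""
    return any(re.fullmatch(p, metric) is not None for p in patterns)


# Read as a regex, "pods=[*]" only matches the literal string "pods=*",
# so it can never match a real metric name.
print(is_exposed(WANTED, ALLOWLIST))

# Entries written as metric names or name patterns do match.
print(is_exposed(WANTED, [WANTED]))
print(is_exposed(WANTED, ["kube_deployment_.*"]))
```

If this is what is happening, listing metric names (or patterns such as kube_deployment_.*) in the allowlist would be the thing to try, and it would also fit the observation that adding namespaces changed nothing.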
QA'ed! all good. moving to done
Describe the bug
The pod counts don't show up here anymore!
Looks like the graphs stopped having data on July 25.
Bug Severity
See examples in the documentation
SEV-3 Minor - no effect on the system, but we can't monitor how it's scaling
To Reproduce
Go to the Notify System Overview dashboard
See that there's no pod count data
Expected behavior
Data is there
Impact
Cannot easily monitor the system
Screenshots