Open dndungu opened 5 years ago
Hi @dndungu
The kubelet check already collects several network metrics for each container. This is where one would add connection state metric collection.
Looking at the openmetrics payload from the kubelet, I'm seeing two gauges matching your description:
# HELP container_network_tcp_usage_total tcp connection usage statistic for container
# TYPE container_network_tcp_usage_total gauge
# HELP container_network_udp_usage_total udp connection usage statistic for container
# TYPE container_network_udp_usage_total gauge
Are these the ones you had in mind? If so, I'll be adding these to our roadmap.
Hi @xvello,
Yes, these metrics are what I have in mind. We want to monitor open network connections per container.
# HELP container_network_tcp_usage_total tcp connection usage statistic for container
# TYPE container_network_tcp_usage_total gauge
# HELP container_network_udp_usage_total udp connection usage statistic for container
# TYPE container_network_udp_usage_total gauge
Please update us when you have an estimate on when we can get this in the DD agent.
Thanks.
Hello,
As 6.11 is already in freeze, this has been prioritized for 6.12, due the week of May 20th.
Regards
Thanks @xvello
Hey @dndungu I just wanted to follow up on this.
The metrics you mentioned are from cadvisor and they are disabled by default:
https://github.com/google/cadvisor/blob/master/docs/runtime_options.md#metrics (default: disable_metrics=tcp, udp)
The kubelet embeds cadvisor, and the embedded copy can't be configured (including updating disable_metrics).
As a result, the only solution is to run cadvisor as a daemonset and activate the options (see DS below as an example).
We thought the metrics would be easy to collect because, up until cadvisor 0.31.0 (which was embedded in the kubelet until 1.12), disabled metrics would still show up, but with a value of 0, as per https://github.com/kubernetes/kubernetes/issues/60279. So we were seeing the metric names but did not realise that the reported values were wrong.
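To tell real tcp/udp series apart from the zero-valued placeholders described above, one can scrape the metrics payload and check whether every sample of a metric is 0. The following is a minimal sketch, assuming the payload is in the standard Prometheus text format; the function name `all_zero_metrics` and the sample payload are made up for illustration and are not part of the agent or cadvisor.

```python
import re
from collections import defaultdict

# Matches "name{labels} value" or "name value"; a trailing timestamp is ignored.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)')

# Hypothetical payload, for illustration only (not real cadvisor output).
SAMPLE_PAYLOAD = """\
# HELP container_network_tcp_usage_total tcp connection usage statistic for container
# TYPE container_network_tcp_usage_total gauge
container_network_tcp_usage_total{id="/",tcp_state="established"} 0
container_network_tcp_usage_total{id="/",tcp_state="listen"} 0
# HELP container_network_udp_usage_total udp connection usage statistic for container
# TYPE container_network_udp_usage_total gauge
container_network_udp_usage_total{id="/",udp_state="listen"} 2
"""


def all_zero_metrics(payload, names):
    """Return the metrics from `names` that appear in `payload`
    but whose every sample has the value 0."""
    values = defaultdict(list)
    for line in payload.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match and match.group(1) in names:
            values[match.group(1)].append(float(match.group(3)))
    return sorted(
        name for name, samples in values.items()
        if samples and all(v == 0 for v in samples)
    )


if __name__ == "__main__":
    wanted = {
        "container_network_tcp_usage_total",
        "container_network_udp_usage_total",
    }
    print(all_zero_metrics(SAMPLE_PAYLOAD, wanted))
```

A metric reported by this check is not necessarily disabled (a container may simply have no connections), so it is only a hint to go and check the disable_metrics setting.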
Please find attached an example file that runs cadvisor as a daemonset, configured with annotations so that the agent autodiscovers the pods and runs a generic OpenMetrics check to retrieve those metrics:
apiVersion: apps/v1 # for Kubernetes versions before 1.9.0 use apps/v1beta2
kind: DaemonSet
metadata:
  name: cadvisor
spec:
  selector:
    matchLabels:
      name: cadvisor
  template:
    metadata:
      annotations:
        ad.datadoghq.com/cadvisor.check_names: '["prometheus"]'
        ad.datadoghq.com/cadvisor.init_configs: '[{}]'
        ad.datadoghq.com/cadvisor.instances: '[{"prometheus_url": "http://%%host%%:8080/metrics",
          "namespace": "cadvisor", "metrics": ["container_network_tcp_usage_total",
          "container_network_udp_usage_total"]}]'
      labels:
        name: cadvisor
    spec:
      containers:
      - name: cadvisor
        args:
        - --housekeeping_interval=10s # kubernetes default args
        - --max_housekeeping_interval=15s
        - --event_storage_event_limit=default=0
        - --event_storage_age_limit=default=0
        - --disable_metrics=percpu # enable only diskIO, cpu, memory, network, disk, tcp, udp, process
        - --docker_only
        image: k8s.gcr.io/cadvisor:v0.33.0
        resources:
          requests:
            memory: 200Mi
            cpu: 150m
          limits:
            cpu: 300m
        volumeMounts:
        - name: rootfs
          mountPath: /rootfs
          readOnly: true
        - name: var-run
          mountPath: /var/run
          readOnly: true
        - name: sys
          mountPath: /sys
          readOnly: true
        - name: docker
          mountPath: /var/lib/docker
          readOnly: true
        - name: disk
          mountPath: /dev/disk
          readOnly: true
        ports:
        - name: http
          containerPort: 8080
          hostPort: 8080
          protocol: TCP
      terminationGracePeriodSeconds: 30
      volumes:
      - name: rootfs
        hostPath:
          path: /
      - name: var-run
        hostPath:
          path: /var/run
      - name: sys
        hostPath:
          path: /sys
      - name: docker
        hostPath:
          path: /var/lib/docker
      - name: disk
        hostPath:
          path: /dev/disk
prometheus (3.2.0)
------------------
Instance ID: prometheus:cadvisor:42ceda833367bd8d [OK]
Total Runs: 31
Metric Samples: Last Run: 180, Total: 5,580
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 1, Total: 31
Average Execution Time : 1.008s
Disclaimer: these will count as custom metrics. Also, some labels appear empty and might end up generating tags that are not really usable; I did not spend much time digging into that.
Finally, this is just an example of how to configure cadvisor; more details can be found in the official cadvisor documentation.
As the Kubernetes community eventually wants to remove cadvisor from the kubelet, we are also going to suggest adding those metrics directly to the kubelet.
Best, .C
We (on the same team as @dndungu who's out on PTO this week) are running cadvisor as a daemonset already, but we don't have those annotations since we were using the kubelet check to point to it instead. If we migrate to using these annotations instead, is there a way of keeping the same metrics format that the kubelet check provides to not have to update any monitors?
I have read all the docs and searched all the open source DD code but could not find any way to enable collection of TCP/UDP connection state metrics. We are interested in getting the number of incoming connections in a pod. I can see the metrics in cadvisor.