Closed by emilioalvap 2 months ago
Pinging @elastic/uptime (Team:Uptime)
Pinging @gizas @mlunadia @rameshelastic for awareness.
@ChrsMark, I've recorded the heap profiles you requested, it's all in the zip file. CC @andrewvc
Beats initial state:
$ kubectl top pods | grep beat
filebeat-5bdc87b777-6hwcx 1m 45Mi
heartbeat-66b5567fc8-n98qs 2m 38Mi
metricbeat-85cf558f5c-5qxhq 1m 58Mi
After several deployments (~10 minutes of re-deploying dummy-deployment.yml every 20-30 secs):
$ kubectl top pods | grep beat
filebeat-5bdc87b777-6hwcx 64m 372Mi
heartbeat-66b5567fc8-n98qs 66m 402Mi
metricbeat-85cf558f5c-5qxhq 107m 411Mi
After ~60 minutes without deployment activity:
$ kubectl top pods | grep beat
filebeat-5bdc87b777-6hwcx 4m 101Mi
heartbeat-66b5567fc8-n98qs 9m 134Mi
metricbeat-85cf558f5c-5qxhq 5m 134Mi
Files:
I'll be removing the Team:Uptime assignment since we have agreed that a potential solution for this issue is not on the board for us right now. I'll leave it to Team:Cloudnative-Monitoring to prioritise it.
Hi! We just realized that we haven't looked into this issue in a while. We're sorry!
We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1:.
Thank you for your contribution!
Summary
docker.elastic.co/beats/heartbeat:8.1.0
Talos v1.1 / K8s v1.23.5 / kubectl v1.22.5
Related to #31115.
Initial investigation seems to point to a memory leak on the Heartbeat side of the K8s autodiscovery provider, even when no pods/services are matched by the provider.
Memory allocation grows with each pod deployed until the container reaches the specified limit and is forcefully restarted. This issue prevents monitoring even a small number of replicas in a cluster with constant deployment activity.
Steps to Reproduce:
Create an initial deployment with a heartbeat pod that has an autodiscovery provider configured. Here's an example:
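The original example manifest isn't reproduced in this text; below is a minimal sketch of the relevant heartbeat.yml autodiscover section along the same lines, with a condition that deliberately matches nothing (label value, monitor type, and schedule are illustrative assumptions):

```yaml
# Sketch only: a kubernetes autodiscover provider whose condition is chosen
# so that it never matches the dummy pods created later. No monitors are
# generated, but autodiscover events are still processed for every pod.
heartbeat.autodiscover:
  providers:
    - type: kubernetes
      resource: pod
      scope: cluster
      templates:
        - condition:
            equals:
              kubernetes.labels.app: "label-that-never-matches"  # illustrative
          config:
            - type: tcp
              hosts: ["${data.host}:${data.port}"]
              schedule: "@every 10s"
```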
To deploy:
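Assuming the manifest above is saved as heartbeat-deployment.yml (the file name is illustrative), deployment is a plain kubectl apply:

```sh
kubectl apply -f heartbeat-deployment.yml
kubectl get pods | grep heartbeat   # wait for the pod to be Running
```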
Initially there will be no monitors reported in Kibana and heartbeat pod memory will be minimal and stable.
This is because the condition specified in the autodiscovery provider will not match any of the containers we will be creating afterwards, so no monitors are generated.
Start generating K8s autodiscovery load by deploying dummy-deployment.yml continuously and checking reported memory usage after each deployment (see the sketch after this list). The amount of memory allocated after each deployment is directly influenced by two factors:
- The number of replicas in dummy-deployment.yml (replicas: 50 in this example).
- The number of heartbeat autodiscovery providers configured; in the provided example there is only one, but multiple providers cause even greater memory usage.
Eventually, once the container reaches the maximum memory allowed, it will be forcefully restarted.
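dummy-deployment.yml itself isn't reproduced in this issue; a minimal sketch that should reproduce the behaviour could look like the following (image and labels are illustrative, only the replica count and the constant churn matter), together with a simple redeploy loop:

```yaml
# dummy-deployment.yml (sketch): 50 pause containers that the heartbeat
# autodiscover condition above intentionally never matches.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dummy-deployment
spec:
  replicas: 50
  selector:
    matchLabels:
      app: dummy
  template:
    metadata:
      labels:
        app: dummy
    spec:
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.7
```

```sh
# Re-deploy every ~25s to generate a constant stream of autodiscover events,
# watching memory grow on the beat pods after each round.
while true; do
  kubectl delete -f dummy-deployment.yml --ignore-not-found
  kubectl apply -f dummy-deployment.yml
  kubectl top pods | grep beat
  sleep 25
done
```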
Tip: The provided example configuration has --httpprof enabled for the container, so we can check memory allocation graphs while it's running by forwarding ports and using the pprof tool:
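A sketch of that workflow, assuming --httpprof :6060 was set on the heartbeat container (port and resource names are illustrative):

```sh
# Forward the profiling port from the heartbeat deployment to localhost,
# then open a heap profile in the pprof web UI.
kubectl port-forward deploy/heartbeat 6060:6060 &
go tool pprof -http=:8080 http://localhost:6060/debug/pprof/heap
```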