swiatekm opened this issue 1 week ago
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)
Potential fixes I thought of below. Not sure how feasible they are given the provider architecture, but Vars themselves seem flexible enough to permit them:
@swiatekm You can assume that the data that is passed into the provider is not modified. The reason this code copies is to ensure that the provider doesn't update the structure after sending it to AddOrUpdate. It would be totally acceptable to skip this to lower memory usage, but providers then need to be sure that they don't mutate the structure after calling AddOrUpdate (unless they call AddOrUpdate again after the mutation).
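For illustration, here is a rough sketch of what that contract looks like from a provider's point of view if the internal copy were removed. The DynamicProviderComm interface below is a simplified stand-in (the real AddOrUpdate in the composable package takes more arguments), not the actual agent API:

```go
package main

import "fmt"

// DynamicProviderComm is a simplified stand-in for the interface dynamic
// providers use to publish mappings; the real AddOrUpdate takes more
// arguments (priority, processors, ...).
type DynamicProviderComm interface {
	AddOrUpdate(id string, mapping map[string]interface{}) error
}

type fakeComm struct{ mappings map[string]map[string]interface{} }

func (f *fakeComm) AddOrUpdate(id string, mapping map[string]interface{}) error {
	// If the coordinator stops cloning here, it stores the provider's map
	// directly, so the provider must treat it as read-only from now on.
	f.mappings[id] = mapping
	return nil
}

// publishPodUpdate shows the pattern that stays safe without the internal copy:
// build a fresh map for every update instead of mutating one already handed over.
func publishPodUpdate(comm DynamicProviderComm, podUID, podName, podIP string) error {
	mapping := map[string]interface{}{
		"kubernetes": map[string]interface{}{
			"pod": map[string]interface{}{"name": podName, "ip": podIP},
		},
	}
	// After this call the provider must not touch `mapping` again; the next
	// change builds a new map and calls AddOrUpdate again.
	return comm.AddOrUpdate(podUID, mapping)
}

func main() {
	comm := &fakeComm{mappings: map[string]map[string]interface{}{}}
	_ = publishPodUpdate(comm, "f483454b-e8f2-42b5-8c22-4da229e86b8a", "nginx-d556bf558-vcdpl", "10.244.0.102")
	fmt.Println(comm.mappings)
}
```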
One aspect of this issue is that the Agent Coordinator doesn't tell variable providers which variables it actually wants, which means providers need to collect all supported data even if it's known deterministically that it will never be needed. There's an issue already involving preprocessing variable substitutions in new policies, and a natural feature to include with it is to give providers an explicit list of variables that need to be monitored. In that case, an empty policy that doesn't actually use the kubernetes metadata could avoid querying it in the first place.
I think most of this data is actually used by filebeat. See an example filebeat input:
data_stream:
dataset: kubernetes.container_logs
id: kubernetes-container-logs-nginx-d556bf558-vcdpl-4bd5a70737faebfa2fbd4d34b9003cf8f32cd086787133590342e24d331da995
index: logs-kubernetes.container_logs-default
parsers:
- container:
format: auto
stream: all
paths:
- /var/log/containers/*4bd5a70737faebfa2fbd4d34b9003cf8f32cd086787133590342e24d331da995.log
processors:
- add_fields:
fields:
input_id: filestream-container-logs-4b47f8c5-5515-4267-a33d-4fb64806f81c-kubernetes-f483454b-e8f2-42b5-8c22-4da229e86b8a.nginx
target: '@metadata'
- add_fields:
fields:
dataset: kubernetes.container_logs
namespace: default
type: logs
target: data_stream
- add_fields:
fields:
dataset: kubernetes.container_logs
target: event
- add_fields:
fields:
stream_id: kubernetes-container-logs-nginx-d556bf558-vcdpl-4bd5a70737faebfa2fbd4d34b9003cf8f32cd086787133590342e24d331da995
target: '@metadata'
- add_fields:
fields:
id: 5206cacb-562e-46a1-b256-0b6833a0d653
snapshot: false
version: 8.15.0
target: elastic_agent
- add_fields:
fields:
id: 5206cacb-562e-46a1-b256-0b6833a0d653
target: agent
- add_fields:
fields:
id: 4bd5a70737faebfa2fbd4d34b9003cf8f32cd086787133590342e24d331da995
image:
name: nginx:1.14.2
runtime: containerd
target: container
- add_fields:
fields:
cluster:
name: kind
url: kind-control-plane:6443
target: orchestrator
- add_fields:
fields:
container:
name: nginx
labels:
app: nginx
pod-template-hash: d556bf558
namespace: default
namespace_labels:
kubernetes_io/metadata_name: default
namespace_uid: 9df9c3db-a0ca-426d-bbb5-0c63092a39ae
node:
hostname: kind-control-plane
labels:
beta_kubernetes_io/arch: amd64
beta_kubernetes_io/os: linux
kubernetes_io/arch: amd64
kubernetes_io/hostname: kind-control-plane
kubernetes_io/os: linux
node-role_kubernetes_io/control-plane: ""
name: kind-control-plane
uid: 0b6d3cbf-7a86-4775-9351-86f5448c21d8
pod:
ip: 10.244.0.102
name: nginx-d556bf558-vcdpl
uid: f483454b-e8f2-42b5-8c22-4da229e86b8a
replicaset:
name: nginx-d556bf558
target: kubernetes
- add_fields:
fields:
annotations:
elastic_co/dataset: ""
elastic_co/namespace: ""
elastic_co/preserve_original_event: ""
target: kubernetes
- drop_fields:
fields:
- kubernetes.annotations.elastic_co/dataset
ignore_missing: true
when:
equals:
kubernetes:
annotations:
elastic_co/dataset: ""
- drop_fields:
fields:
- kubernetes.annotations.elastic_co/namespace
ignore_missing: true
when:
equals:
kubernetes:
annotations:
elastic_co/namespace: ""
- drop_fields:
fields:
- kubernetes.annotations.elastic_co/preserve_original_event
ignore_missing: true
when:
equals:
kubernetes:
annotations:
elastic_co/preserve_original_event: ""
- add_tags:
tags:
- preserve_original_event
when:
and:
- has_fields:
- kubernetes.annotations.elastic_co/preserve_original_event
- regexp:
kubernetes:
annotations:
elastic_co/preserve_original_event: ^(?i)true$
prospector:
scanner:
symlinks: true
type: filestream
I think most of this data is actually used by filebeat.
This could be true, but one thing that hasn't been obvious to me looking at examples is how much unused metadata implicitly comes with the current approach. E.g. in the example you give, even though it needs different categories of Kubernetes metadata for potentially many containers/pods, all the actual substituted fields together are still quite small, and would still not be a major memory footprint even with hundreds or thousands of pods/containers. (There may be limits on how well we can act on that, though, depending on what's cached by the Kubernetes support libraries themselves.)
Anyway, if at some point there's reason to think that pruning fields would help, it would definitely be a feasible modification for the Agent Coordinator to pass the relevant/minimal field list through to variable providers.
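To make the "relevant field list" idea concrete, here is a rough sketch of pruning a provider mapping down to the variable paths a policy actually references. The pruneMapping helper and the dotted-path format are assumptions for illustration, not existing agent code:

```go
package main

import (
	"fmt"
	"strings"
)

// pruneMapping keeps only the leaves of a nested mapping whose dotted paths
// appear in `wanted` (e.g. "kubernetes.pod.name"). Everything else is dropped
// before the mapping is stored or cloned.
func pruneMapping(mapping map[string]interface{}, wanted []string) map[string]interface{} {
	out := map[string]interface{}{}
	for _, path := range wanted {
		keys := strings.Split(path, ".")
		src := mapping
		dst := out
		for i, k := range keys {
			if i == len(keys)-1 {
				if v, ok := src[k]; ok {
					dst[k] = v
				}
				break
			}
			next, ok := src[k].(map[string]interface{})
			if !ok {
				break // path not present in this mapping
			}
			child, ok := dst[k].(map[string]interface{})
			if !ok {
				child = map[string]interface{}{}
				dst[k] = child
			}
			src, dst = next, child
		}
	}
	return out
}

func main() {
	mapping := map[string]interface{}{
		"kubernetes": map[string]interface{}{
			"pod":         map[string]interface{}{"name": "nginx-d556bf558-vcdpl", "ip": "10.244.0.102"},
			"node_labels": map[string]interface{}{"kubernetes_io/os": "linux"},
		},
	}
	// Variable paths actually referenced by the rendered policy, e.g. ${kubernetes.pod.name}.
	wanted := []string{"kubernetes.pod.name"}
	fmt.Println(pruneMapping(mapping, wanted))
}
```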
@swiatekm, your comment here https://github.com/elastic/elastic-agent/issues/5835#issuecomment-2432790163 seems to be related to https://github.com/elastic/ingest-dev/issues/2454#issuecomment-1737569099, which investigated the impact of map.Clone in the context of the processors. Taking a look at that history would IMHO help give some context.
To be clear, I wasn't making a statement about the performance of filebeat, just that the generated configuration for it appears to use a lot of the Pod metadata. This issue is about the elastic-agent binary exclusively.
I think the complexity of having providers supply only the fields that are actually used might outweigh the memory savings, unless we can determine the savings to be very large. The complexity shows up when the policy changes and a new field is referenced that the provider is not currently providing, because it was omitted from the previous policy. The new set of fields then needs to be sent to the provider, and the provider needs to update all of its mappings with that new variable. I think that complexity might outweigh the benefit of such a change.
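To illustrate the extra coordination step being described, here is a rough sketch of a provider that has to re-publish every mapping whenever the set of relevant fields changes. SetRelevantFields and the provider type are hypothetical names, not agent code:

```go
package main

import "fmt"

// hypotheticalProvider caches the raw objects it watches and the set of fields
// the current policy references. Both are illustrative, not agent code.
type hypotheticalProvider struct {
	relevantFields []string
	pods           map[string]map[string]interface{} // raw per-pod data
}

// SetRelevantFields is the extra step an allowlist would force on every policy
// change: if a newly referenced field was previously omitted, every existing
// mapping has to be rebuilt and re-published, not just future ones.
func (p *hypotheticalProvider) SetRelevantFields(fields []string) {
	p.relevantFields = fields
	for id := range p.pods {
		p.emit(id) // re-publish all mappings with the new field set
	}
}

func (p *hypotheticalProvider) emit(id string) {
	// A real provider would prune p.pods[id] to p.relevantFields and call
	// AddOrUpdate; here we only show that every tracked object is touched.
	fmt.Printf("re-emitting mapping for %s with fields %v\n", id, p.relevantFields)
}

func main() {
	p := &hypotheticalProvider{pods: map[string]map[string]interface{}{
		"nginx-d556bf558-vcdpl": {"kubernetes": map[string]interface{}{"pod": map[string]interface{}{"name": "nginx-d556bf558-vcdpl"}}},
	}}
	p.SetRelevantFields([]string{"kubernetes.pod.name", "kubernetes.pod.ip"})
}
```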
I think the patch to not clone the mappings might be a large enough win that omitting fields might not be needed at all.
I do think solving https://github.com/elastic/elastic-agent/issues/3609 would be very nice to have done, and it is something that needs to happen for Hybrid OTel mode: when the agent is only running OTel configuration it should not be running any providers at all. That would be a memory saving for all agents that don't reference anything kubernetes-related in the policy.
Experimenting with removing var cloning brought me to an unusual discovery: on agent start, we spend a lot of memory on getting the host IP addresses. See these heap profiles:
I don't get why this would be the case. I verified we only call the host provider function once, and allocating 33 MB just to get IP addresses from Linux seems very excessive. Yet we only use Go stdlib functions to do this, and looking at interfaces via shell commands on the Node didn't yield anything unusual. I'll see if I can run a small program to show whether we're maybe iterating over a bunch of junk data - Kubernetes is known to mess with local networking a lot.
EDIT: Looks like this is a kind of N+1 problem with syscalls. The code lists interfaces, and then lists addresses for each interface. The second call is not cheap, so if we have a lot of interfaces on a machine - which happens on a K8s Node with a lot of Pods - it adds up to a fair amount of memory allocations. There is actually a Go stdlib function that gets all the addresses in a single syscall; I'll try it and see if that helps.
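For reference, a minimal standalone sketch of the two approaches using only the Go standard library; this is not the go-sysinfo code itself, just the shape of the N+1 pattern versus the single-call variant:

```go
package main

import (
	"fmt"
	"net"
)

// addrsPerInterface mirrors the N+1 pattern: one call to list the interfaces,
// then a separate address lookup for each interface.
func addrsPerInterface() ([]net.Addr, error) {
	ifaces, err := net.Interfaces()
	if err != nil {
		return nil, err
	}
	var addrs []net.Addr
	for _, iface := range ifaces {
		ifaceAddrs, err := iface.Addrs() // separate, non-trivial lookup per interface
		if err != nil {
			return nil, err
		}
		addrs = append(addrs, ifaceAddrs...)
	}
	return addrs, nil
}

// addrsSingleCall gets every address on the host in one pass.
func addrsSingleCall() ([]net.Addr, error) {
	return net.InterfaceAddrs()
}

func main() {
	a, _ := addrsPerInterface()
	b, _ := addrsSingleCall()
	fmt.Printf("per-interface: %d addrs, single call: %d addrs\n", len(a), len(b))
}
```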
@andrewkroh any quick ideas about how to bring down the memory usage from syscall.NetlinkRIB coming through go-sysinfo's shared.Network() implementation? See the profile above in https://github.com/elastic/elastic-agent/issues/5835#issuecomment-2441544789.
@cmacknz https://github.com/elastic/go-sysinfo/pull/246 should already help a lot here.
Nice! How much of an improvement was that out of curiosity?
Around 4x less memory in my environment.
In its default configuration, the agent has the kubernetes provider enabled. In DaemonSet mode, this provider keeps track of data about Pods scheduled on the Node the agent is running on. This issue concerns the fact that the agent process itself uses an excessive amount of memory if the number of these Pods is high (for the purposes of this issue, that means close to the default Kubernetes limit of 110 Pods per Node). This was originally discovered while troubleshooting #4729.
This effect is visible even if we disable all inputs and self-monitoring, leaving agent to run as a single process without any components. This strongly implies it has to do with configuration variable providers. I used this empty configuration in my testing to limit confounding variables from beats, but the effect is more pronounced when components using variables are present in the configuration.
Here's a graph of agent memory consumption as the number of Pods on the Node increases from 10 to 110:
A couple of observations from looking at how configuration changes affect this behaviour:
Test setup
More data
I resized my Deployment a couple times and looked at a heap profile of the agent process:
The churn appears to be coming primarily from needing to recreate all the variables whenever a Pod is updated. The call to composable.cloneMap is where we copy data from the host provider.
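For context, deep-cloning a nested mapping looks roughly like the sketch below (an illustration, not the actual composable.cloneMap code); with ~100 Pods on a Node, every Pod update re-clones mappings like this, which is where the churn in the profiles comes from:

```go
package main

import "fmt"

// deepClone is a simplified illustration of what cloning a provider mapping
// costs: every nested map is allocated again and every entry copied on each
// update. It is not the actual composable.cloneMap implementation.
func deepClone(src map[string]interface{}) map[string]interface{} {
	dst := make(map[string]interface{}, len(src))
	for k, v := range src {
		if nested, ok := v.(map[string]interface{}); ok {
			dst[k] = deepClone(nested)
			continue
		}
		dst[k] = v
	}
	return dst
}

func main() {
	mapping := map[string]interface{}{
		"host": map[string]interface{}{"ip": []string{"10.244.0.102", "fe80::1"}},
		"kubernetes": map[string]interface{}{
			"pod": map[string]interface{}{"name": "nginx-d556bf558-vcdpl"},
		},
	}
	// Every Pod add/update/delete triggers a re-clone of mappings like this one.
	fmt.Println(deepClone(mapping))
}
```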
Root cause
The root cause appears to be a combination of behaviours in the variable pipeline: