cilium / tetragon

eBPF-based Security Observability and Runtime Enforcement
https://tetragon.io
Apache License 2.0

RSS memory increase on tetragon #2892

Open Jianlin-lv opened 1 week ago

Jianlin-lv commented 1 week ago

What happened?

In my test environment, I applied two TracingPolicies and observed an increasing trend in Tetragon's RSS memory consumption.

(screenshot: Tetragon RSS memory usage trending upward over time)

I enabled pprof to try to figure out which part consumes the most memory. Comparing the two samples below (taken before and after), memory consumption increased in process.initProcessInternalExec, tracing.handleMsgGenericKprobe, namespace.GetMsgNamespaces, and caps.GetCapabilitiesTypes.

I'm not sure if this is the desired behavior or if there is a memory leak.
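For context, heap profiles like the ones below can be collected from any Go binary that exposes the standard net/http/pprof handlers. A minimal generic sketch (the listen address is illustrative, not Tetragon's actual flag wiring):

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
)

func main() {
	// Illustrative address only; a real agent wires its pprof endpoint
	// through its own configuration.
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}

Two heap dumps taken this way can then be diffed with go tool pprof's -base option to show exactly which allocations grew between samples.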

(pprof) top 20
Showing nodes accounting for 400.50MB, 89.10% of 449.51MB total
Dropped 151 nodes (cum <= 2.25MB)
Showing top 20 nodes out of 122
      flat  flat%   sum%        cum   cum%
   66.01MB 14.68% 14.68%    66.01MB 14.68%  reflect.New
   65.51MB 14.57% 29.26%   164.52MB 36.60%  github.com/cilium/tetragon/pkg/process.initProcessInternalExec
   46.10MB 10.26% 39.51%    48.72MB 10.84%  github.com/cilium/tetragon/pkg/sensors/tracing.handleMsgGenericKprobe
   39.50MB  8.79% 48.30%    39.50MB  8.79%  github.com/cilium/tetragon/pkg/reader/namespace.GetMsgNamespaces
      26MB  5.78% 54.09%       26MB  5.78%  encoding/base64.(*Encoding).EncodeToString
   23.51MB  5.23% 59.31%    23.51MB  5.23%  github.com/cilium/tetragon/pkg/reader/caps.GetCapabilitiesTypes (inline)
   16.74MB  3.72% 63.04%    44.75MB  9.95%  github.com/cilium/tetragon/pkg/eventcache.(*Cache).loop
   14.50MB  3.23% 66.27%    14.50MB  3.23%  google.golang.org/protobuf/internal/impl.mergeInt32Slice
   12.97MB  2.89% 69.15%    15.98MB  3.55%  github.com/cilium/ebpf/btf.readAndInflateTypes
   12.50MB  2.78% 71.93%    12.50MB  2.78%  bufio.(*Scanner).Text (inline)
   11.50MB  2.56% 74.49%    11.50MB  2.56%  google.golang.org/protobuf/types/known/timestamppb.New (inline)
   11.42MB  2.54% 77.03%    11.42MB  2.54%  github.com/cilium/ebpf/btf.indexTypes
      11MB  2.45% 79.48%       11MB  2.45%  github.com/cilium/tetragon/pkg/process.ArgsDecoder
    8.64MB  1.92% 81.40%    20.14MB  4.48%  github.com/cilium/tetragon/pkg/ksyms.NewKsyms
       7MB  1.56% 82.96%        7MB  1.56%  github.com/cilium/tetragon/pkg/grpc/tracing.getKprobeArgument
    6.50MB  1.45% 84.40%    55.72MB 12.40%  github.com/cilium/tetragon/pkg/sensors/tracing.handleGenericKprobe
       6MB  1.33% 85.74%    81.02MB 18.02%  github.com/cilium/tetragon/pkg/process.initProcessInternalClone
    5.57MB  1.24% 86.98%     8.07MB  1.80%  github.com/hashicorp/golang-lru/v2/simplelru..Add
    5.52MB  1.23% 88.21%     5.52MB  1.23%  reflect.mapassign_faststr0
       4MB  0.89% 89.10%    11.50MB  2.56%  github.com/cilium/tetragon/pkg/grpc/tracing.GetProcessKprobe
(pprof) top 20
Showing nodes accounting for 470.40MB, 90.04% of 522.42MB total
Dropped 169 nodes (cum <= 2.61MB)
Showing top 20 nodes out of 103
      flat  flat%   sum%        cum   cum%
   83.51MB 15.98% 15.98%   208.53MB 39.92%  github.com/cilium/tetragon/pkg/process.initProcessInternalExec
   77.51MB 14.84% 30.82%    77.51MB 14.84%  reflect.New
   49.11MB  9.40% 40.22%    52.11MB  9.97%  github.com/cilium/tetragon/pkg/sensors/tracing.handleMsgGenericKprobe
   46.50MB  8.90% 49.12%    46.50MB  8.90%  github.com/cilium/tetragon/pkg/reader/namespace.GetMsgNamespaces
   32.01MB  6.13% 55.25%    32.01MB  6.13%  github.com/cilium/tetragon/pkg/reader/caps.GetCapabilitiesTypes (inline)
   30.50MB  5.84% 61.09%    30.50MB  5.84%  encoding/base64.(*Encoding).EncodeToString
   20.01MB  3.83% 64.92%    20.01MB  3.83%  github.com/cilium/tetragon/pkg/process.ArgsDecoder
      17MB  3.25% 68.17%       17MB  3.25%  google.golang.org/protobuf/internal/impl.mergeInt32Slice
   16.74MB  3.20% 71.38%    48.25MB  9.24%  github.com/cilium/tetragon/pkg/eventcache.(*Cache).loop
      13MB  2.49% 73.87%       13MB  2.49%  google.golang.org/protobuf/types/known/timestamppb.New (inline)
   12.97MB  2.48% 76.35%    15.98MB  3.06%  github.com/cilium/ebpf/btf.readAndInflateTypes
   12.50MB  2.39% 78.74%    12.50MB  2.39%  bufio.(*Scanner).Text (inline)
   11.42MB  2.19% 80.93%    11.42MB  2.19%  github.com/cilium/ebpf/btf.indexTypes
    8.64MB  1.65% 82.58%    20.14MB  3.86%  github.com/cilium/tetragon/pkg/ksyms.NewKsyms
    8.50MB  1.63% 84.21%    99.52MB 19.05%  github.com/cilium/tetragon/pkg/process.initProcessInternalClone
       7MB  1.34% 85.55%        7MB  1.34%  github.com/cilium/tetragon/pkg/grpc/tracing.getKprobeArgument
    6.02MB  1.15% 86.70%     6.02MB  1.15%  reflect.mapassign_faststr0
       6MB  1.15% 87.85%    58.61MB 11.22%  github.com/cilium/tetragon/pkg/sensors/tracing.handleGenericKprobe
       6MB  1.15% 89.00%     7.50MB  1.44%  github.com/cilium/tetragon/pkg/process.getPodInfo
     5.46MB  1.04% 90.04%     9.46MB  1.81%  github.com/hashicorp/golang-lru/v2/simplelru..Add

TracingPolicy

apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: process-cap-capable
spec:
  kprobes:
  - args:
    - index: 2
      label: capability-to-check
      maxData: false
      returnCopy: false
      type: int
    call: cap_capable
    return: true
    returnArg:
      index: 0
      maxData: false
      returnCopy: false
      type: int
    selectors:
    - matchActions:
      - action: Post
        rateLimit: 5s
      matchReturnArgs:
      - index: 0
        operator: Equal
        values:
        - "0"
    syscall: false
---
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: file-monitoring-filtered-write
spec:
  kprobes:
  - args:
    - index: 0
      maxData: false
      returnCopy: false
      type: file
    - index: 1
      label: mask-lvlin-WRITE
      maxData: false
      returnCopy: false
      type: int
    call: security_file_permission
    return: true
    returnArg:
      index: 0
      maxData: false
      returnCopy: false
      type: int
    returnArgAction: Post
    selectors:
    - matchActions:
      - action: Post
        rateLimit: 5s
      matchArgs:
      - index: 0
        operator: NotPrefix
        values:
        - /tmp/dummy
      - index: 1
        operator: Equal
        values:
        - "2"
        - "8"
        - "4"
    syscall: false

Tetragon Version

v1.1.2

Kernel Version

Ubuntu 22.04, kernel 5.15.0-26

Kubernetes Version

No response

Bugtool

No response

Relevant log output

No response

Anything else?

No response

mtardy commented 1 week ago

Hello, thanks for the detailed report. The fact that you see a memory increase when loading tracing policies is normal behavior. If you are looking at container_memory_working_set_bytes, then depending on which control group version you use (v1 or v2), it may also account for the memory of the BPF maps we load for the TracingPolicy; I imagine in your case it might be v2. On the agent side, loading policies also exercises code paths that allocate memory and that stay unused when no policy is loaded.
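If you want a rough idea of how much of that is BPF map memory, one way is to walk the loaded maps from userspace with cilium/ebpf. A sketch, assuming (key size + value size) × max entries as a lower-bound approximation (the kernel adds per-map and per-entry overhead, and this needs root/CAP_BPF):

package main

import (
	"errors"
	"fmt"
	"os"

	"github.com/cilium/ebpf"
)

func main() {
	var id ebpf.MapID
	var total uint64
	for {
		next, err := ebpf.MapGetNextID(id)
		if errors.Is(err, os.ErrNotExist) {
			break // no more maps on the system
		}
		if err != nil {
			panic(err)
		}
		id = next

		m, err := ebpf.NewMapFromID(id)
		if err != nil {
			continue // the map may have been unloaded in the meantime
		}
		info, err := m.Info()
		if err == nil {
			// Lower-bound estimate; real kernel accounting is higher and
			// depends on the map type.
			size := uint64(info.KeySize+info.ValueSize) * uint64(info.MaxEntries)
			total += size
			fmt.Printf("%-32s %-20s ~%d bytes\n", info.Name, info.Type, size)
		}
		m.Close()
	}
	fmt.Printf("approximate total: ~%d MB\n", total>>20)
}

bpftool map show gives similar information, including the kernel's own memlock accounting, if it is available on the node.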

That said, I've been working on tracking memory consumption of Tetragon and trying to avoid unnecessary memory waste.

In your heap dumps, you can see that the biggest consumer is:

github.com/cilium/tetragon/pkg/process.initProcessInternalExec

This is the process cache. I'm currently working on fixing a potential issue we have there: a cache that grows too large compared to the processes actually running on the host. Fixing that might allow Tetragon to consume less memory overall.
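This is not Tetragon's actual process-cache implementation, but since golang-lru shows up in the profiles above, here is a purely conceptual sketch of the idea: a bounded, LRU-style cache keyed by exec ID keeps memory roughly proportional to the processes being tracked instead of growing without limit (the processInfo type and the capacity are hypothetical):

package main

import (
	"fmt"

	lru "github.com/hashicorp/golang-lru/v2"
)

// processInfo is a hypothetical, simplified stand-in for a cache entry.
type processInfo struct {
	Pid    uint32
	Binary string
	Args   string
}

func main() {
	// Bounded cache: once capacity is reached, the least recently used
	// entries are evicted, so the cache cannot grow unbounded even under
	// heavy pod churn.
	cache, err := lru.New[string, *processInfo](65536)
	if err != nil {
		panic(err)
	}

	cache.Add("exec-id-1234", &processInfo{Pid: 1234, Binary: "/usr/bin/cat"})

	if p, ok := cache.Get("exec-id-1234"); ok {
		fmt.Println(p.Pid, p.Binary)
	}
	fmt.Println("entries:", cache.Len())
}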

e-ngo commented 1 week ago

Hi @mtardy, that's exciting! We're seeing something similar to @Jianlin-lv, where our memory seems to grow unbounded. Our heap dump also shows the process cache as the largest consumer, and our workloads are ephemeral by nature (lots of pod churn), so I'm quite curious about your point on the process cache growing too much. I have some questions regarding your work:

  • Would / should we reduce the process cache size as a way to limit it from growing too much? My guess is probably not, because IF you do have that many processes, you would want to have them in the cache...
  • What are some issues you see with the current process cache?

mtardy commented 1 week ago

Let me answer both questions here: theoretically, the process cache size should be in line with the number of processes currently running on the host. A lot of different situations can occur depending on what you do with your host, but in the general case the cache should eventually be stable and pretty small.

The issue we see is that in some situations the process cache fills up pretty quickly while the number of processes on the host stays under a few hundred. We are currently merging work that will allow us to diagnose what's happening in the cache on a running agent: https://github.com/cilium/tetragon/pull/2246.

Eventually, if people have very different needs (running Tetragon at scale on hosts with hundreds of thousands of processes), we are open to making the sizing of the cache and the BPF maps tunable.

What would lead to high cache memory? Our memory metrics for Tetragon show high cache memory (2GB+) but relatively low RSS (~500MB). We tried forcing a Go GC as well as writing to /proc/sys/vm/drop_caches to try to reclaim memory, but the cache size remains high.

Here it depends on what exactly we are talking about. Generally, Tetragon has two main sources of memory consumption if you look at memory.current in the memory cgroup (v2):

  1. Golang memory heap: a pprof memory dump can help you understand what the biggest consumer is (at the moment, in the situation I'm investigating, it's the process cache), but you'll see that if GOGC is set to 100 (as it is by default), you need roughly twice the live heap anyway. So if the agent needs 200MB, the Golang runtime with the GC needs at least 400MB (see https://tip.golang.org/doc/gc-guide; the newer GOMEMLIMIT can also help reduce that amount while keeping performance, see the sketch after this list). What happens in reality is that if your system is not under pressure, some memory might not yet have been reclaimed by the OS, so your total consumption in my example could be around 500MB. Now, if you are seeing that the heap impact on the system is 4x the actual heap allocation, I think it would be worth investigating what's happening: look at kernel vs anonymous memory, then the memory segments, then the Go memstats, then a memory profile (I have some notes on understanding memory here).
  2. BPF maps: this is fairly static, increasing with the policies loaded, but it should be way more reasonable thanks to many patches merged in 1.12 (search for "memory" on the release page). With many policies loaded, this typically accounts for about 20% of the total memory used (this estimate really depends on what you do with Tetragon).
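As a concrete illustration of point 1, the GC target and the soft memory limit can also be set from inside a Go program via runtime/debug; the values below are purely illustrative and not recommended Tetragon settings (the GOGC and GOMEMLIMIT environment variables achieve the same thing without code changes):

package main

import (
	"fmt"
	"runtime/debug"
)

func main() {
	// GOGC=100 (the default): the heap may grow to roughly twice the live
	// heap before the next collection, so ~200MB of live data can mean a
	// ~400MB heap target.
	debug.SetGCPercent(100)

	// GOMEMLIMIT equivalent: a soft cap that makes the GC run more
	// aggressively as total runtime memory approaches the limit.
	prev := debug.SetMemoryLimit(512 << 20) // 512 MiB, illustrative only
	fmt.Println("previous memory limit:", prev)
}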