cilium / tetragon

eBPF-based Security Observability and Runtime Enforcement
https://tetragon.io
Apache License 2.0

[metrics] Fix overhead_program metrics for return probes #3074

Closed tpapagian closed 2 weeks ago

tpapagian commented 2 weeks ago

Let's assume the following example:

$ cat pol.yaml
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: "file-monitoring-mmap"
spec:
  kprobes:
  - call: "security_mmap_file"
    syscall: false
    return: true
    args:
    - index: 0
      type: "file" # (struct file *) used for getting the path
    - index: 1
      type: "uint32" # the prot flags PROT_READ(0x01), PROT_WRITE(0x02), PROT_EXEC(0x04)
    - index: 2
      type: "nop" # the mmap flags (i.e. MAP_SHARED, ...)
    returnArg:
      index: 0
      type: "int"
    returnArgAction: "Post"
    selectors:
    - matchArgs:
      - index: 0
        operator: "Prefix"
        values:
        - "/etc/" # filenames to filter for
$ sudo ./tetragon --btf /sys/kernel/btf/vmlinux  --bpf-lib ./bpf/objs/ --metrics-server ':2112' --tracing-policy ./pol.yaml  --disable-kprobe-multi

After that, if we try to get the metrics from another terminal we get the following errors:

$ curl http://localhost:2112/metrics
An error has occurred while serving metrics:

2 error(s) occurred:
* collected metric "tetragon_overhead_program_seconds_total" { label:{name:"attach"  value:"security_mmap_file"}  label:{name:"policy"  value:"file-monitoring-mmap"}  label:{name:"policy_namespace"  value:""}  label:{name:"sensor"  value:"generic_kprobe"}  counter:{value:0}} was collected before with the same name and label values
* collected metric "tetragon_overhead_program_runs_total" { label:{name:"attach"  value:"security_mmap_file"}  label:{name:"policy"  value:"file-monitoring-mmap"}  label:{name:"policy_namespace"  value:""}  label:{name:"sensor"  value:"generic_kprobe"}  counter:{value:0}} was collected before with the same name and label values

The issue here is that we get two metrics with the same labels. This happens because the policy also needs a retprobe (i.e. returnArg), and the retprobe has the same name as the kprobe.
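The collision can be reproduced in miniature. The sketch below is not the Prometheus client code. It is a simplified stand-in that assumes a series is identified by its metric name plus its label pairs, and `series_key` and `find_duplicates` are hypothetical helper names used only for illustration.

```python
# Illustrative only: a simplified stand-in for the uniqueness check that a
# Prometheus-style registry applies when metrics are collected.

def series_key(name, labels):
    """Identity of a series: the metric name plus its sorted label pairs."""
    return (name, tuple(sorted(labels.items())))


def find_duplicates(samples):
    """Return the keys that were already seen earlier in `samples`."""
    seen, dups = set(), []
    for name, labels in samples:
        key = series_key(name, labels)
        if key in seen:
            dups.append(key)
        seen.add(key)
    return dups


# Label values copied from the error message above.
labels = {
    "attach": "security_mmap_file",
    "policy": "file-monitoring-mmap",
    "policy_namespace": "",
    "sensor": "generic_kprobe",
}

# One sample for the kprobe program and one for the retprobe program.
# Nothing in the label set tells them apart, so the second one collides.
collected = [
    ("tetragon_overhead_program_runs_total", dict(labels)),
    ("tetragon_overhead_program_runs_total", dict(labels)),
]

duplicates = find_duplicates(collected)
print(len(duplicates))
```

Any label whose value differs between the two programs would be enough to make the keys distinct.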

To fix this, we need to add another label for the section that we use to attach. This patch adds that label; with it, the metrics for the previous example are:

tetragon_overhead_program_seconds_total{attach="security_mmap_file",policy="file-monitoring-mmap",policy_namespace="",section="kprobe/generic_kprobe",sensor="generic_kprobe"} 0
tetragon_overhead_program_seconds_total{attach="security_mmap_file",policy="file-monitoring-mmap",policy_namespace="",section="kprobe/generic_retkprobe",sensor="generic_kprobe"} 0

This reports both the attach function (i.e. security_mmap_file) and the program that we use to attach (i.e. kprobe/generic_kprobe and kprobe/generic_retkprobe).
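As a quick sanity check, the two sample lines can be parsed to confirm that `section` is the only label that distinguishes them. This is a minimal sketch, assuming label values contain no escaped quotes (true for the lines above). `parse_sample` is a hypothetical helper, not part of Tetragon or any Prometheus library.

```python
import re

# Simplified patterns for `name{label="value",...} value` lines.
# Assumption: label values contain no escaped double quotes.
LINE_RE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_sample(line):
    """Split one exposition-format line into (name, labels, value)."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized sample line: {line!r}")
    name, raw_labels, value = match.groups()
    return name, dict(LABEL_RE.findall(raw_labels)), float(value)


# The two lines from the example output above.
lines = [
    'tetragon_overhead_program_seconds_total{attach="security_mmap_file",'
    'policy="file-monitoring-mmap",policy_namespace="",'
    'section="kprobe/generic_kprobe",sensor="generic_kprobe"} 0',
    'tetragon_overhead_program_seconds_total{attach="security_mmap_file",'
    'policy="file-monitoring-mmap",policy_namespace="",'
    'section="kprobe/generic_retkprobe",sensor="generic_kprobe"} 0',
]

(_, kprobe_labels, _), (_, retprobe_labels, _) = [parse_sample(l) for l in lines]

# Labels whose values differ between the kprobe and the retprobe sample.
differing = sorted(
    key for key in kprobe_labels if kprobe_labels[key] != retprobe_labels.get(key)
)
print(differing)  # prints ['section']
```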

netlify[bot] commented 2 weeks ago

Deploy Preview for tetragon ready!

Latest commit: fd2fba3044d0c43a3bc6b16e185afd9977e8c4b8
Latest deploy log: https://app.netlify.com/sites/tetragon/deploys/6729275461cfec000957f70a
Deploy Preview: https://deploy-preview-3074--tetragon.netlify.app