cilium / tetragon

eBPF-based Security Observability and Runtime Enforcement
https://tetragon.io
Apache License 2.0

tetragon_data_events_total fails to show total number of data events in prometheus #2891

Open AshishNaware opened 1 month ago

AshishNaware commented 1 month ago

What happened?

Steps to reproduce:

  1. Check out the latest version of Tetragon (I tested with 5f6ca6e6c62d39b1a4e5798be065de60e322c5c1)
  2. make kind-setup
  3. Install the Prometheus operator
    helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
    helm repo update
    helm install prometheus prometheus-community/kube-prometheus-stack
  4. Apply a ServiceMonitor
    kubectl apply -f - <<EOF
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: my-service-monitor
      namespace: tetragon
      labels:
        release: prometheus
    spec:
      selector:
        matchLabels:
          app.kubernetes.io/instance: tetragon
      endpoints:
      - port: metrics
        interval: 30s
        path: /metrics
    EOF
  5. Install the Tetragon demo app
    kubectl create -f https://raw.githubusercontent.com/cilium/cilium/v1.15.3/examples/minikube/http-sw-app.yaml

  6. Port-forward and wait until the ServiceMonitor target my-service-monitor is up (2/2)
    kubectl port-forward svc/prometheus-kube-prometheus-prometheus 9090:9090
  7. Trigger an execution event in a separate terminal
    kubectl exec -ti xwing -- bash -c 'curl https://ebpf.io/applications/#tetragon'
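
With the `kubectl port-forward` above still running, the metric can also be checked without the graph UI. The following is a minimal Python sketch, not part of the original report, that queries the standard Prometheus HTTP API (`/api/v1/query`). It assumes Prometheus is reachable at http://localhost:9090; the helper names `build_query_url`, `extract_samples`, and `query_prometheus` are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + query


def extract_samples(response_body):
    """Return (labels, value) pairs from an instant-query JSON response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError("Prometheus query failed: %r" % payload.get("error"))
    # Each vector sample looks like {"metric": {...}, "value": [ts, "<number>"]}.
    return [
        (sample["metric"], float(sample["value"][1]))
        for sample in payload["data"]["result"]
    ]


def query_prometheus(base_url, promql):
    """Run an instant query against a reachable Prometheus server."""
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return extract_samples(resp.read().decode("utf-8"))
```

For example, `query_prometheus("http://localhost:9090", "tetragon_data_events_total")` returns an empty list when Prometheus has no samples for the metric, which would match the empty graph described in this report.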

Expected behaviour: tetragon_data_events_total should show some data. (Screenshot from 2024-09-08 13-49-34)

Actual behaviour: In most iterations, no data shows up in the graph. (Screenshot from 2024-09-08 09-59-07)
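
To narrow down whether the counter is missing at the Tetragon agent or only in Prometheus, the agent's own `/metrics` output can be inspected. The sketch below is a hedged illustration, not taken from the report: it sums all samples of a metric in the Prometheus text exposition format. The function names are hypothetical, and the URL passed to `fetch_metrics` is an assumption that depends on how the agent's metrics port is exposed locally.

```python
import urllib.request
from typing import Optional


def sum_metric(exposition_text: str, metric_name: str) -> Optional[float]:
    """Sum every sample of metric_name; return None if no sample exists."""
    total = None
    for raw in exposition_text.splitlines():
        line = raw.strip()
        # Skip blanks, HELP/TYPE comments, and unrelated metrics.
        if not line or line.startswith("#") or not line.startswith(metric_name):
            continue
        rest = line[len(metric_name):]
        if rest.startswith("{"):
            # Drop the label set; the value follows the closing brace.
            end = rest.rfind("}")
            if end == -1:
                continue
            rest = rest[end + 1:]
        elif not rest[:1].isspace():
            # A longer metric name that merely shares the prefix.
            continue
        fields = rest.split()
        if not fields:
            continue
        total = (total or 0.0) + float(fields[0])
    return total


def fetch_metrics(url: str) -> str:
    """Download the raw metrics text from an assumed, locally reachable URL."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")
```

A result of `None` from `sum_metric(fetch_metrics(url), "tetragon_data_events_total")` would suggest the agent has not exported any sample for the counter, whereas a number would point at the scrape or query side instead.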

Tetragon Version

CLI version: v1.2.0-pre.0-555-g198fd5f5d

Kernel Version

Linux ashish-ubuntu 6.8.0-41-generic #41-Ubuntu SMP PREEMPT_DYNAMIC Fri Aug 2 20:41:06 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Kubernetes Version

Client Version: v1.30.1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.27.3
WARNING: version difference between client (1.30) and server (1.27) exceeds the supported minor version skew of +/-1

Bugtool

time="2024-09-08T13:07:20-07:00" level=warning msg="failed to open file" infoFile=/var/run/tetragon/tetragon-info.json

Relevant log output

No response

Anything else?

Kind version: kind v0.20.0 go1.20.4 linux/amd64

Docker version:

Client: Docker Engine - Community
 Version:           26.1.4
 API version:       1.45
 Go version:        go1.21.11
 Git commit:        5650f9b
 Built:             Wed Jun  5 11:28:57 2024
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          26.1.4
  API version:      1.45 (minimum version 1.24)
  Go version:       go1.21.11
  Git commit:       de5c9cf
  Built:            Wed Jun  5 11:28:57 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.33
  GitCommit:        d2d58213f83a351ca8f528a95fbd145f5654e957
 runc:
  Version:          1.1.12
  GitCommit:        v1.1.12-0-g51d5e94
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

AshishNaware commented 1 month ago

P.S. I am not entirely clear on what exactly triggers data events in Tetragon. This bug report is based on the assumption that every process exec triggers a data event. I can see the process details in the logs:

{
    "process_exec": {
        "process": {
            "exec_id": "dGV0cmFnb24tZGV2LWNvbnRyb2wtcGxhbmU6Mzk3MTk4OTg3ODg0NjoxMDY4MTE=",
            "pid": 106811,
            "uid": 0,
            "cwd": "/",
            "binary": "/usr/bin/curl",
            "arguments": "https://ebpf.io/applications/#tetragon",
            "flags": "execve rootcwd",
            "start_time": "2024-09-08T22:26:35.663607408Z",
            "auid": 4294967295,
            "pod": {
                "namespace": "default",
                "name": "xwing",
                "container": {
                    "id": "containerd://1cd6bab335d68d3c7573ad035b69a5a386141ef8e15d7844590e6bb4ef9727e4",
                    "name": "spaceship",
                    "image": {
                        "id": "quay.io/cilium/json-mock@sha256:5aad04835eda9025fe4561ad31be77fd55309af8158ca8663a72f6abb78c2603",
                        "name": "sha256:adcc2d0552708b61775c71416f20abddad5fd39b52eb4ac10d692bd19a577edb"
                    },
                    "start_time": "2024-09-08T22:25:12Z",
                    "pid": 24
                },
                "pod_labels": {
                    "app.kubernetes.io/name": "xwing",
                    "class": "xwing",
                    "org": "alliance"
                },
                "workload": "xwing",
                "workload_kind": "Pod"
            },
            "docker": "1cd6bab335d68d3c7573ad035b69a5a",
            "parent_exec_id": "dGV0cmFnb24tZGV2LWNvbnRyb2wtcGxhbmU6Mzk3MTk4Nzc4MzA3NDoxMDY4MTE=",
            "tid": 106811
        },
        "parent": {
            "exec_id": "dGV0cmFnb24tZGV2LWNvbnRyb2wtcGxhbmU6Mzk3MTk4Nzc4MzA3NDoxMDY4MTE=",
            "pid": 106811,
            "uid": 0,
            "cwd": "/",
            "binary": "/usr/bin/bash",
            "arguments": "-c \"curl https://ebpf.io/applications/#tetragon\"",
            "flags": "execve rootcwd clone",
            "start_time": "2024-09-08T22:26:35.661512099Z",
            "auid": 4294967295,
            "pod": {
                "namespace": "default",
                "name": "xwing",
                "container": {
                    "id": "containerd://1cd6bab335d68d3c7573ad035b69a5a386141ef8e15d7844590e6bb4ef9727e4",
                    "name": "spaceship",
                    "image": {
                        "id": "quay.io/cilium/json-mock@sha256:5aad04835eda9025fe4561ad31be77fd55309af8158ca8663a72f6abb78c2603",
                        "name": "sha256:adcc2d0552708b61775c71416f20abddad5fd39b52eb4ac10d692bd19a577edb"
                    },
                    "start_time": "2024-09-08T22:25:12Z",
                    "pid": 24
                },
                "pod_labels": {
                    "app.kubernetes.io/name": "xwing",
                    "class": "xwing",
                    "org": "alliance"
                },
                "workload": "xwing",
                "workload_kind": "Pod"
            },
            "docker": "1cd6bab335d68d3c7573ad035b69a5a",
            "parent_exec_id": "dGV0cmFnb24tZGV2LWNvbnRyb2wtcGxhbmU6Mzk3MTk1ODAxOTUzNzoxMDY4MDI=",
            "tid": 106811
        }
    },
    "node_name": "tetragon-dev-control-plane",
    "time": "2024-09-08T22:26:35.663606594Z"
}
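
To confirm from the logs that the exec event itself was emitted, events like the one above can be filtered programmatically. This is a minimal sketch under stated assumptions, not Tetragon tooling: it assumes the log is available with one compact JSON object per line (the event above is pretty-printed for readability), and `summarize_exec_events` is a hypothetical helper name. It also base64-decodes `exec_id`, which for the event above yields a colon-separated string containing the node name and the pid.

```python
import base64
import json


def summarize_exec_events(lines):
    """Yield a short summary for each process_exec event in JSON-lines input."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(event, dict):
            continue
        exec_event = event.get("process_exec")
        if not exec_event:
            continue  # some other event type
        process = exec_event.get("process", {})
        exec_id = process.get("exec_id", "")
        yield {
            "time": event.get("time"),
            "pod": process.get("pod", {}).get("name"),
            "binary": process.get("binary"),
            "arguments": process.get("arguments"),
            "exec_id_decoded": base64.b64decode(exec_id).decode("utf-8", "replace"),
        }
```

Feeding it an open log file, for example `for s in summarize_exec_events(open(path)): print(s)`, lists each exec with its pod, binary, and arguments, so the curl invocation from the reproduction steps can be matched against the time window in which the metric was expected to change.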