aquasecurity / tracee

Linux Runtime Security and Forensics using eBPF
https://aquasecurity.github.io/tracee/latest
Apache License 2.0
3.64k stars 419 forks source link

tracee event context has type overflows #3690

Open rafaeldtinoco opened 1 year ago

rafaeldtinoco commented 1 year ago

Description

While aware of https://github.com/aquasecurity/tracee/issues/2870, we still need to support/fix current tracee event context, as it has data types overflows that are likely causing context to be lost (all uint32 values, from the eBPF context, might be overflowed when decoded into Event struct, such as the PIDNS value, declared as int, which defaults to very high numbers in my env).

The new event.proto takes care of correct types (using google.protobuf.UInt32Value in the right places, kudos to @josedonizetti).

More details

While playing with some marshalling and unmarshalling ideas (for other output printers, for example), I got the following error after a event -> json -> parquet conversion:

{
  "level": "error",
  "ts": 1700017058.446953,
  "msg": "Error encoding event to parquet",
  "error": "integer overflow on token 4026531841",
  "event": {
    "timestamp": 1700017058317424171,
    "threadStartTime": 1699918738371868249,
    "processorId": 7,
    "processId": 463085,
    "cgroupId": 4794,
    "threadId": 463085,
    "parentProcessId": 462937,
    "hostProcessId": 463085,
    "hostThreadId": 463085,
    "hostParentProcessId": 462937,
    "userId": 1000,
    "mountNamespace": 4026531841,
    "pidNamespace": 4026531836,
    "processName": "code",
    "executable": {
      "path": ""
    },
    "hostName": "rugged",
    "containerId": "",
    "container": {},
    "kubernetes": {},
    "eventId": "257",
    "eventName": "openat",
    "matchedPolicies": [
      ""
    ],
    "argsNum": 4,
    "returnValue": 45,
    "syscall": "openat",
    "stackAddresses": null,
    "contextFlags": {
      "containerStarted": false,
      "isCompat": false
    },
    "threadEntityId": 1656711223,
    "processEntityId": 1656711223,
    "parentEntityId": 723639701,
    "args": [
      {
        "name": "dirfd",
        "type": "int",
        "value": -100
      },
      {
        "name": "pathname",
        "type": "const char*",
        "value": "/proc/463548/cmdline"
      },
      {
        "name": "flags",
        "type": "string",
        "value": ""
      },
      {
        "name": "mode",
        "type": "mode_t",
        "value": 0
      }
    ]
  }
}

and:

$ sudo lsns | grep 4026531836
4026531836 pid       316      1 root            /sbin/init

and

11101111111111111111111111111100

isn't quite an unsigned 32 bit overflow but it is a signed 32 bit overflow.

Meaning that a bunch of Event fields that are int should actually be uint32:

type Context struct {
    Ts              uint64
    StartTime       uint64
    CgroupID        uint64
    Pid             uint32
    Tid             uint32
    Ppid            uint32
    HostPid         uint32
    HostTid         uint32
    HostPpid        uint32
    Uid             uint32
    MntID           uint32
    PidID           uint32
    Comm            [16]byte
    UtsName         [16]byte
    Flags           uint32
    LeaderStartTime uint64
    ParentStartTime uint64
    EventID         events.ID // int32
    Syscall         int32
    MatchedPolicies uint64
    Retval          int64
    StackID         uint32
    ProcessorId     uint16
    _               [2]byte // padding
}
type Event struct {
    Timestamp             int          `json:"timestamp"`
    ThreadStartTime       int          `json:"threadStartTime"`
    ProcessorID           int          `json:"processorId"`
    ProcessID             int          `json:"processId"`
    CgroupID              uint         `json:"cgroupId"`
    ThreadID              int          `json:"threadId"`
    ParentProcessID       int          `json:"parentProcessId"`
    HostProcessID         int          `json:"hostProcessId"`
    HostThreadID          int          `json:"hostThreadId"`
    HostParentProcessID   int          `json:"hostParentProcessId"`
    UserID                int          `json:"userId"`
    MountNS               int          `json:"mountNamespace"`
    PIDNS                 int          `json:"pidNamespace"`
    ProcessName           string       `json:"processName"`
    Executable            File         `json:"executable"`
    HostName              string       `json:"hostName"`
    ContainerID           string       `json:"containerId"`
    Container             Container    `json:"container,omitempty"`
    Kubernetes            Kubernetes   `json:"kubernetes,omitempty"`
    EventID               int          `json:"eventId,string"`
    EventName             string       `json:"eventName"`
    PoliciesVersion       uint16       `json:"-"`
    MatchedPoliciesKernel uint64       `json:"-"`
    MatchedPoliciesUser   uint64       `json:"-"`
    MatchedPolicies       []string     `json:"matchedPolicies,omitempty"`
    ArgsNum               int          `json:"argsNum"`
    ReturnValue           int          `json:"returnValue"`
    Syscall               string       `json:"syscall"`
    StackAddresses        []uint64     `json:"stackAddresses"`
    ContextFlags          ContextFlags `json:"contextFlags"`
    ThreadEntityId        uint32       `json:"threadEntityId"`  // thread task unique identifier (*)
    ProcessEntityId       uint32       `json:"processEntityId"` // process unique identifier (*)
    ParentEntityId        uint32       `json:"parentEntityId"`  // parent process unique identifier (*)
    Args                  []Argument   `json:"args"`            // args are ordered according their appearance in the original event
    Metadata              *Metadata    `json:"metadata,omitempty"`
}

The event struct dates from 2020 and since the beginning has uin32 types declared as ints. We probably never payed hard attention because most of the values don't overflow often (and if they do we wouldn't catch in any test I believe). But the default PID namespaces are overflowing in my environments. It happened by accident because Parquet does not support unsigned integers on its data format.

geyslan commented 12 months ago

Is this awaiting any related changes or can I tackle the Event int change to uint32?

rafaeldtinoco commented 12 months ago

Is this awaiting any related changes or can I tackle the Event int change to uint32?

Please do tackle this. Make sure to test it extensively (including internal E2E) because Im a bit afraid of this change. Let me know if you need anything. Thanks a lot!

geyslan commented 12 months ago

After offline discussion we decided to wait the Event be replaced by https://github.com/aquasecurity/tracee/blob/main/api/v1beta1/event.proto#L11 which already tackled the type normalisation. @josedonizetti please refer (close) this in your upcoming work. Thanks.

josedonizetti commented 11 months ago

@geyslan @rafaeldtinoco yeah, I think waiting is for the best now since it seems we will migrate the event structure for the 0.20.0 release, but if we see it is not the case in the next few weeks, this can should indeed be fixed.

geyslan commented 2 months ago

@rscampos let's discuss this later?