cilium / tetragon

eBPF-based Security Observability and Runtime Enforcement
https://tetragon.io
Apache License 2.0

Add unix username in event along the UID #2015

Open ycaoT opened 5 months ago

ycaoT commented 5 months ago

Is your feature request related to a problem?

As a security team, we want to know what users are doing on a host. However, the current event logs only contain the user ID, which makes it hard to trace activity back to the real user.

Describe the feature you would like

A straightforward way would be to map the user ID against /etc/passwd and output the username. I've found that Falco has something similar and would like to request the same here: https://falco.org/docs/reference/rules/supported-fields/#field-class-user
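A minimal sketch of that mapping, assuming a local passwd(5)-formatted file is the only user database. The function name `parsePasswd` is made up for illustration and is not part of Tetragon.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parsePasswd turns passwd(5)-formatted text into a UID -> username map.
// Entry format: name:password:UID:GID:GECOS:directory:shell
func parsePasswd(data string) map[uint32]string {
	users := make(map[uint32]string)
	for _, line := range strings.Split(data, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		fields := strings.Split(line, ":")
		if len(fields) < 3 {
			continue // malformed entry, skip it
		}
		uid, err := strconv.ParseUint(fields[2], 10, 32)
		if err != nil {
			continue // non-numeric UID field, skip it
		}
		// Keep the first entry when several names share one UID.
		if _, seen := users[uint32(uid)]; !seen {
			users[uint32(uid)] = fields[0]
		}
	}
	return users
}

func main() {
	data, err := os.ReadFile("/etc/passwd")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read passwd file:", err)
		return
	}
	users := parsePasswd(string(data))
	fmt.Println(len(users), "entries loaded; uid 0 is", users[0])
}
```

This only sees users listed in the local file, so anything served by a directory service would still show up as a bare UID.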

Describe your proposed solution

Something like this from Falco: https://github.com/falcosecurity/libs/blob/master/userspace/libsinsp/user.cpp#L60

mtardy commented 5 months ago

Hey, thanks for taking the time to open an issue for this. There are some technical limitations here: reading /etc/passwd is often not enough, for example when NIS or LDAP is in use.

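The os/user Golang standard library package provides two implementations:

- a pure Go one, which parses /etc/passwd directly;
- a cgo one, which calls into libc and can therefore resolve users from sources such as NIS or LDAP.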

As of now, Tetragon is compiled statically without CGO, which makes it possible to run in a distroless environment. We could not use the libc implementation without changing that for this specific feature.
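To illustrate the trade-off, here is a small sketch of a lookup through os/user. The helper name `usernameOrUID` is hypothetical. The call is the same in both cases: built with `CGO_ENABLED=0` (or with the `osusergo` build tag) it uses the pure Go implementation and only sees /etc/passwd, while a cgo build goes through libc.

```go
package main

import (
	"fmt"
	"os"
	"os/user"
	"strconv"
)

// usernameOrUID resolves a UID through os/user and falls back to the decimal
// UID when no entry is found or the user database cannot be read.
func usernameOrUID(uid uint32) string {
	id := strconv.FormatUint(uint64(uid), 10)
	u, err := user.LookupId(id)
	if err != nil {
		return id
	}
	return u.Username
}

func main() {
	// Print the name of the user running this program.
	fmt.Println(usernameOrUID(uint32(os.Getuid())))
}
```

In a distroless container the fallback branch is the likely outcome, since the image's own /etc/passwd says little about the users of the host.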

On top of that, the Tetragon pod would need to have access to the host /etc/passwd file, so that's another host mount (a touchy one) to add for that feature. When running directly on the host, this problem does not exist.

So, unfortunately, this feature, which might look simple at first sight, could require some fundamental changes to Tetragon, especially when deploying on Kubernetes.

christian-2 commented 5 months ago

FWIW I'm doing the translation (from uids to username) downstream from, i.e. outside, Tetragon: the Linux kernel does not know about usernames (only uids), so I feel this is the right approach, for Tetragon (since it's based on eBPF) sits fairly close to (and partly inside) the kernel.

mtardy commented 5 months ago

> FWIW I'm doing the translation (from uids to username) downstream from, i.e. outside, Tetragon: the Linux kernel does not know about usernames (only uids), so I feel this is the right approach, for Tetragon (since it's based on eBPF) sits fairly close to (and partly inside) the kernel.

Indeed. A script using existing binaries, or an external custom binary (which could be linked against libc), could cover those use cases, but integrating that directly into Tetragon is challenging if you want to get it right.
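Such an external binary could look roughly like the sketch below: a filter that reads JSON events line by line and adds a `username` next to every numeric `uid` it finds. It assumes events arrive as one JSON object per line and that the UID is carried in a key named `uid`. The `username` key and all function names are made up for illustration and are not part of Tetragon's event schema.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
	"os/user"
	"strconv"
	"strings"
)

// lookupFunc resolves a UID to a username; ok is false when it is unknown.
type lookupFunc func(uid uint32) (name string, ok bool)

// annotate walks a decoded JSON value and adds a "username" key next to every
// numeric "uid" key that lookup can resolve.
func annotate(v any, lookup lookupFunc) {
	switch node := v.(type) {
	case map[string]any:
		if num, ok := node["uid"].(json.Number); ok {
			if uid, err := strconv.ParseUint(num.String(), 10, 32); err == nil {
				if name, found := lookup(uint32(uid)); found {
					node["username"] = name
				}
			}
		}
		for _, child := range node {
			annotate(child, lookup)
		}
	case []any:
		for _, child := range node {
			annotate(child, lookup)
		}
	}
}

// annotateLine processes one JSON event and returns it re-encoded.
func annotateLine(line string, lookup lookupFunc) (string, error) {
	dec := json.NewDecoder(strings.NewReader(line))
	dec.UseNumber() // keep numbers exactly as they were written
	var event any
	if err := dec.Decode(&event); err != nil {
		return "", err
	}
	annotate(event, lookup)
	out, err := json.Marshal(event)
	return string(out), err
}

// systemLookup asks os/user, so it sees whatever this build can reach
// (only /etc/passwd without cgo, libc-backed sources with cgo).
func systemLookup(uid uint32) (string, bool) {
	u, err := user.LookupId(strconv.FormatUint(uint64(uid), 10))
	if err != nil {
		return "", false
	}
	return u.Username, true
}

func main() {
	in := bufio.NewScanner(os.Stdin)
	in.Buffer(make([]byte, 0, 64*1024), 4*1024*1024) // events can be long lines
	for in.Scan() {
		out, err := annotateLine(in.Text(), systemLookup)
		if err != nil {
			out = in.Text() // pass through anything that is not valid JSON
		}
		fmt.Println(out)
	}
}
```

Run on the host, it could sit at the end of a pipe fed with the JSON event stream, which keeps the libc dependency and the access to the host's user database out of the Tetragon binary itself.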

mtardy commented 5 months ago

If we think this is the way to go, these limitations and solutions could be documented in the tutorials: https://tetragon.io/docs/tutorials/.

ycaoT commented 5 months ago

Thank you both. I am opening this mostly for the user experience: from a security perspective, the username means a lot. If Tetragon had this, I am pretty sure it would beat a lot of other similar open source products for observability, especially for security.

christian-2 commented 5 months ago

@ycaoT Would you be content if we follow the tutorial approach, as suggested by @mtardy? I could contribute to that. Let's open an issue (for a future PR) if we are all in agreement.

ycaoT commented 5 months ago

@christian-2 yes, looks good to me!

christian-2 commented 5 months ago

@ycaoT Perfect. @mtardy could you please advise if progress should be tracked in this issue or create a new one, if applicable. Pre-thx.

christian-2 commented 5 months ago

BTW, I'm going through the O'Reilly report Security Observability with eBPF, and it appears to me that (if one allows that usernames are a kind of metadata) what we want is conceptually a bit akin to a "watcher" program, as mentioned in the report in relation to a cloud-native approach to security observability:

> eBPF programs enable Kubernetes support by bundling API “watcher” programs that pull identity metadata from the Kubernetes API server and correlate that with container events in the kernel.

mtardy commented 5 months ago

> @mtardy could you please advise if progress should be tracked in this issue or create a new one, if applicable. Pre-thx.

By the way, you can already contribute by opening the issue as an enhancement and filling in the information, if you want. I'll do that later if you need help. https://github.com/cilium/tetragon/issues/new?assignees=&labels=kind%2Fenhancement&projects=&template=feature_request_template.yaml

christian-2 commented 5 months ago

@mtardy Ack, I will (probably at approx. the same time scale as for #2022 and if still needed by then)

ycaoT commented 5 months ago

Closing this, since https://github.com/cilium/tetragon/issues/2030 has been opened.

mtardy commented 2 months ago

I'm reopening this as it contains most of the information on why this is a complex issue. I will redirect other issues to this one for context.

anfedotoff commented 2 months ago

A username is useful when Tetragon runs on a host. On different hosts, the same username can have different UIDs, so the username helps identify the actual user without access to the host. If we consider the use case where Tetragon runs on the host, then the os/user pure Go implementation looks good to me. We can limit username extraction to this case only. For example using some flag, or maybe checking that we are not in k8s environment somehow. @mtardy what do you think?
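A rough sketch of how such a gate could look, assuming the Kubernetes check relies on the `KUBERNETES_SERVICE_HOST` environment variable that Kubernetes injects into containers. The flag name `enable-username-lookup` and both function names are hypothetical and are not existing Tetragon options.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"os"
)

// inKubernetes reports whether the process looks like it runs in a pod.
// It takes getenv as a parameter so the check can be exercised without
// touching the real environment.
func inKubernetes(getenv func(string) string) bool {
	return getenv("KUBERNETES_SERVICE_HOST") != ""
}

// usernameLookupAllowed combines the opt-in flag with the environment check.
func usernameLookupAllowed(optIn bool, getenv func(string) string) bool {
	return optIn && !inKubernetes(getenv)
}

func main() {
	fs := flag.NewFlagSet("demo", flag.ContinueOnError)
	fs.SetOutput(io.Discard)
	// Hypothetical flag name, not an existing Tetragon option.
	optIn := fs.Bool("enable-username-lookup", false, "resolve UIDs to usernames on the host")
	_ = fs.Parse(os.Args[1:])
	fmt.Println("username lookup:", usernameLookupAllowed(*optIn, os.Getenv))
}
```

The environment check is only a heuristic, so keeping the feature opt-in even on bare hosts seems the safer default.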

mtardy commented 2 months ago

> For example using some flag, or maybe checking that we are not in k8s environment somehow. @mtardy what do you think?

That could be possible; however, you would need to watch the /etc/passwd file for changes. This use case is so specific that it might be easier in this situation to do some third-party post-processing on the events. It will be racy in both situations as well!
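For illustration, a post-processing tool could track changes with a cache like the one sketched below. It polls the file's modification time and size instead of subscribing to filesystem notifications, purely to stay within the standard library. All names are hypothetical, and the demo works on a temporary file rather than the real /etc/passwd.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
	"sync"
	"time"
)

// passwdCache maps UIDs to usernames and reloads its file whenever the
// modification time or size differs from the cached copy.
type passwdCache struct {
	path    string
	mu      sync.Mutex
	modTime time.Time
	size    int64
	users   map[uint32]string
}

// refresh re-reads the file if it looks different from the cached copy.
func (c *passwdCache) refresh() {
	info, err := os.Stat(c.path)
	if err != nil {
		return // keep serving the last good copy
	}
	if c.users != nil && info.ModTime().Equal(c.modTime) && info.Size() == c.size {
		return
	}
	data, err := os.ReadFile(c.path)
	if err != nil {
		return
	}
	users := make(map[uint32]string)
	for _, line := range strings.Split(string(data), "\n") {
		f := strings.Split(line, ":")
		if len(f) < 3 {
			continue
		}
		if uid, err := strconv.ParseUint(f[2], 10, 32); err == nil {
			users[uint32(uid)] = f[0]
		}
	}
	c.users, c.modTime, c.size = users, info.ModTime(), info.Size()
}

// lookup resolves a UID at the time of the call, which may be later than the
// event it describes; that gap is where the race comes from.
func (c *passwdCache) lookup(uid uint32) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.refresh()
	name, ok := c.users[uid]
	return name, ok
}

// runDemo reassigns UID 1000 in a temporary file between two lookups.
func runDemo() (before, after string, err error) {
	dir, err := os.MkdirTemp("", "passwd-demo")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "passwd")
	if err = os.WriteFile(path, []byte("alice:x:1000:1000::/home/alice:/bin/sh\n"), 0o600); err != nil {
		return "", "", err
	}
	cache := &passwdCache{path: path}
	before, _ = cache.lookup(1000)

	if err = os.WriteFile(path, []byte("bob:x:1000:1000::/home/bob:/bin/sh\n"), 0o600); err != nil {
		return "", "", err
	}
	// Force a distinct mtime so the demo does not depend on timestamp granularity.
	future := time.Now().Add(time.Minute)
	if err = os.Chtimes(path, future, future); err != nil {
		return "", "", err
	}
	after, _ = cache.lookup(1000)
	return before, after, nil
}

func main() {
	before, after, err := runDemo()
	if err != nil {
		fmt.Fprintln(os.Stderr, "demo failed:", err)
		return
	}
	fmt.Printf("uid 1000 was %q, is now %q\n", before, after)
}
```

The demo also shows the race in miniature: an event emitted while UID 1000 still belonged to the first user but resolved after the rewrite would be attributed to the second one.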