falcosecurity / falco-website

Source code of the official Falco website
https://falco.org
Creative Commons Attribution 4.0 International

Increase clarity in the API between the kernel space and the user space #1063

Closed by LucaGuerra 1 year ago

LucaGuerra commented 1 year ago

What to document

The Falco review from the CNCF TOC highlighted the need to better document how the data flows between user space and kernel space. See Emily's comment:

Improve the documentation on the communication APIs and define the data being exchanged. It should be written to address both adopters and potential code contributors. Understanding those APIs and the data that flows through them would allow contributors to comprehend the tool and its potential capabilities far better to extend its use as well as assist adopters in understanding what sorts of information can be extracted and reused within their enterprise architectures for business and threat decisions.

Essentially, some parts of the protocol are implementation details, while other parts can help adopters and contributors understand what kind of information is exchanged. We already have some of this information in our public reference, but our adopters are very technical and need more detail in order to make informed decisions. Better documentation would also flatten the learning curve for contributors willing to extend the system.

LucaGuerra commented 1 year ago

I thought about how to improve our documentation in this regard. I have a general idea of what to document to be useful for adopters and users, based on what I would like to know as a power user or contributor.

In libscap we have a concept of "scap engines" (or scap_vtable) which is a common interface that all syscall sources use (meaning: kernel module, eBPF, modern eBPF, gVisor ...). For a contributor, I think it would be very useful to understand how this mechanism works because they may want to implement more ways to collect data from a running kernel or understand the existing ones. As an example, a microVM expert might know how to efficiently get syscall data out of those lightweight systems, and if they wish to contribute to Falco they will have documentation that explains how to do so. For a power user, this would serve as a guide to take a look at exactly what happens when the driver is initialized and/or when it starts collecting events because they all go through the same interface.
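To illustrate the "common interface" idea above, here is a minimal Python sketch of a table of operations that every syscall source fills in, plus one capture loop that drives any source through the same calls. It is purely illustrative and does not reproduce the actual scap_vtable, which is a C structure inside libscap. Every name here (Event, EngineVTable, make_replay_engine, capture) is hypothetical, and the "engine" simply replays a fixed list of syscall names instead of talking to a kernel.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Event:
    """Hypothetical event record; real events carry many more fields."""
    source: str
    name: str


@dataclass
class EngineVTable:
    """Hypothetical table of operations that each engine must provide."""
    name: str
    open: Callable[[], None]
    next: Callable[[], Optional[Event]]
    close: Callable[[], None]


def make_replay_engine(name: str, syscalls: List[str]) -> EngineVTable:
    """Build a fake engine that replays a fixed list of syscall names."""
    state = {"opened": False, "pending": []}

    def open_() -> None:
        state["opened"] = True
        state["pending"] = list(syscalls)

    def next_() -> Optional[Event]:
        if not state["opened"]:
            raise RuntimeError("engine not opened")
        if not state["pending"]:
            return None  # no more events to deliver
        return Event(source=name, name=state["pending"].pop(0))

    def close() -> None:
        state["opened"] = False

    return EngineVTable(name=name, open=open_, next=next_, close=close)


def capture(engine: EngineVTable) -> List[Event]:
    """Drive any engine through the same open/next/close sequence."""
    events = []
    engine.open()
    try:
        while (event := engine.next()) is not None:
            events.append(event)
    finally:
        engine.close()
    return events


if __name__ == "__main__":
    for engine in (
        make_replay_engine("kmod", ["openat", "read"]),
        make_replay_engine("modern_ebpf", ["execve"]),
    ):
        for event in capture(engine):
            print(event.source, event.name)
```

The point of the design is that capture() never needs to know which source it is driving: a new source only has to supply its own open, next, and close callables.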

On the other hand, I thought about documenting the actual low-level communication between the kernel and userspace (ioctls and maps), and I couldn't find a real use for it. It would be very hard to keep updated and maintained for all the syscall sources. In addition, the drivers and engines aren't really designed to work standalone: they only work properly as part of libscap, even if in two cases (classic eBPF and kmod) they are distributed as a separate file for convenience. However, users should be able to find documentation about what the version numbers mean, because they are output by Falco and most likely by any other tool that uses the library. They should also be able to identify where the actual boundary between user space and kernel space is located in Falco.

Also, our little tool scap_open could be very interesting for contributors and adopters because:

  1. It serves as a simple and understandable example of libscap use and interaction with the kernel
  2. It lets you dump the raw data coming from the kernel, which is what an experienced user wants to see

So it should be better documented.

LucaGuerra commented 1 year ago

I was able to discuss the matter with @leogr.

The concept of scap engine (scap_vtable) is not a public API but rather something that we're still evolving. While it is documented in the code, there is a high risk of that documentation becoming stale quickly, and we don't want to mislead our adopters and contributors.

On the other hand, we still want to make it clearer what kind of data is exchanged between the kernel and userspace. To respond to users' and adopters' needs, we identified the following gaps in the documentation, considering we already document the list of events at https://falco.org/docs/reference/rules/supported-events/.

LucaGuerra commented 1 year ago

I am working on this, and the right place for this info is the website so I transferred the issue there.