falcosecurity / libs

libsinsp, libscap, the kernel module driver, and the eBPF driver sources
https://falcosecurity.github.io/libs/
Apache License 2.0
233 stars 165 forks source link

[BUG] scap bpf engine requires API VERSION major bumps when it shouldn't #1283

Open Andreagit97 opened 1 year ago

Andreagit97 commented 1 year ago

Describe the bug

When the scap bpf engine loads the .o file it scans all elf sections. Right now when the engine finds a new filler section it expects to have the corresponding enum in userspace tables

https://github.com/falcosecurity/libs/blob/dc4ab168074a30a15bbd7308aa7ce5e45c935f25/userspace/libscap/engine/bpf/scap_bpf.c#L596

but this is not always true. Consider this example:

Let's imagine SCAP_MINIMUM_DRIVER_SCHEMA_VERSION is like today:

#define SCAP_MINIMUM_DRIVER_SCHEMA_VERSION PPM_API_VERSION(2, 0, 0)

and we have a driver with SCHEMA VERSION 2.0.0. Now we add a new filler like in this PR https://github.com/falcosecurity/libs/pull/1256 sys_listen_e and we bump the driver SCHEMA VERSION to 2.1.0. When libscap tries to load the new .o with SCHEMA_VERSION 2.1.0 it will fail because it will find a sys_listen_e section not known and will print invalid filler name. For this reason, when we add a new filler we need to bump also a major for the SCHEMA_VERSION until we fix this issue

int prog_id = lookup_filler_id(event);
if(prog_id == -1)
{
    return scap_errprintf(handle->m_lasterr, 0, "invalid filler name: %s", event);
}

this issue happens when we use an old libscap version and a driver with at least one new filler, not a very common case but BTW it is a bug.

How to reproduce it

Build libscap with the commit before this PR (https://github.com/falcosecurity/libs/pull/1256) and build the bpf probe over this PR, you will see the invalid filler name error when you load the bpf probe

poiana commented 9 months ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

Andreagit97 commented 9 months ago

/remove-lifecycle stale

poiana commented 6 months ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

Andreagit97 commented 6 months ago

/remove-lifecycle stale

poiana commented 3 months ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

Andreagit97 commented 3 months ago

/remove-lifecycle stale

poiana commented 5 days ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

Andreagit97 commented 4 days ago

/remove-lifecycle stale

Andreagit97 commented 4 days ago

This could be related to this https://github.com/falcosecurity/libs/pull/2068

Andreagit97 commented 4 days ago

It is possible that we will need to add new filler bpf side so we need to bump the right version

Andreagit97 commented 3 hours ago

The solution would be this:

diff --git a/userspace/libscap/engine/bpf/scap_bpf.c b/userspace/libscap/engine/bpf/scap_bpf.c
index 3f7ea6e58..a88aa9515 100644
--- a/userspace/libscap/engine/bpf/scap_bpf.c
+++ b/userspace/libscap/engine/bpf/scap_bpf.c
@@ -569,8 +569,6 @@ static int32_t load_single_prog(struct bpf_engine *handle,
                                BPF_PROGS_TAIL_CALLED_MAX);
                }

-               handle->m_tail_called_fds[handle->m_tail_called_cnt++] = fd;
-
                event += sizeof("filler/") - 1;
                if(*event == 0) {
                        return scap_errprintf(handle->m_lasterr, 0, "filler name cannot be empty");
@@ -578,7 +576,11 @@ static int32_t load_single_prog(struct bpf_engine *handle,

                int prog_id = lookup_filler_id(event);
                if(prog_id == -1) {
-                       return scap_errprintf(handle->m_lasterr, 0, "invalid filler name: %s", event);
+                       // This is possible when we load a new ebpf probe `.o` with an old scap version.
+                       // The ebpf probe defines a filler that is not present in the scap version. This should
+                       // be safe until we keep the old fillers in the probe like we are doing today
+                       close(fd);
+                       return SCAP_SUCCESS;
                } else if(prog_id >= BPF_PROGS_TAIL_CALLED_MAX) {
                        return scap_errprintf(handle->m_lasterr,
                                              0,
@@ -587,6 +589,8 @@ static int32_t load_single_prog(struct bpf_engine *handle,
                                              BPF_PROGS_TAIL_CALLED_MAX);
                }

+               handle->m_tail_called_fds[handle->m_tail_called_cnt++] = fd;
+
                /* Fill the tail table. The key is our filler internal code extracted
                 * from `g_filler_names` in `lookup_filler_id` function. The value
                 * is the program fd.

In this way what we need to do is to bump a MINOR for the SCAP_MINIMUM_DRIVER_SCHEMA_VERSION because old drivers won't be compatible with new libscap versions, but if we do things correctly new drivers should be compatible with old scap versions.

Let's consider this example:

https://github.com/falcosecurity/libs/pull/1256/files -> We added a new filler sys_listen_e

Let's say the driver version is 2.8.0 like in the PR and scap is #define SCAP_MINIMUM_DRIVER_SCHEMA_VERSION PPM_API_VERSION(2, 0, 0) like today.

With the above fix, the old scap should be able to address the case of a new filler sys_listen_e it will simply ignore it and configure the probe to work with the old filler autofill. Of course, we should always remember to keep the old logic so that we can guarantee this behavior.

In the opposite case, the new scap version is no longer compatible with the old drivers:

So the 2 alternatives are:

  1. merging this fix + don't remove the old logic in the drivers + bump the minor version of SCAP_MINIMUM_DRIVER_SCHEMA_VERSION
  2. bump the major of SCHEMA_VERSION

Since we don't add new fillers every day I would go for the 2, it is less risky IMO. WDYT @falcosecurity/libs-maintainers? If we chose number 2 we can just close this issue and remember to bump the major when we add a new filler

FedeDP commented 2 hours ago

to bump the major when we add a new filler

This means that for each new implemented syscall we will have to bump the major :/ while i agree that we won't add new syscalls every day, this somewhat is still a limitation. The bug only happens for the old bpf probe, right? The modern one is always shipped with libscap therefore no problem on that one, while the kmod is not affected.

Since the old bpf probe is going to get killed sooner or later (no ETA of course but this is something we all have in mind), i agree that 2 is the best solution for now.

Andreagit97 commented 2 hours ago

yep, every time we add a new syscall or change a filler we should bump a major.

The bug only happens for the old bpf probe, right?

Unfortunately yes. We can always decide to fallback to solution 1 if we need to do too many major bumps