falcosecurity / libs

libsinsp, libscap, the kernel module driver, and the eBPF driver sources
https://falcosecurity.github.io/libs/
Apache License 2.0
212 stars 158 forks source link

Build regression: build fails on 32-bit architectures #1203

Open dkogan opened 11 months ago

dkogan commented 11 months ago

Hello.

I'm the Debian maintainer for these libraries. I just uploaded packages for version 0.11.3, and I'm observing that they fail to build on Linux on 32-bit architectures (i386, armel, armhf, ...). This used to work.

A full build log showing the issue: https://salsa.debian.org/debian/falcosecurity-libs/-/jobs/4435012

The punchline is this:

$ cc -DHAS_CAPTURE -DHAS_ENGINE_NODRIVER -DHAS_ENGINE_NOOP -DHAS_ENGINE_SAVEFILE -DHAS_ENGINE_SOURCE_PLUGIN -DHAS_ENGINE_TEST_INPUT -DHAS_ENGINE_UDIG -DHAVE_SYS_SYSMACROS_H -DPLATFORM_NAME=\"Linux\" -DSCAP_HOSTNAME_ENV_VAR=\"SCAP_HOSTNAME\" -DSCAP_HOST_ROOT_ENV_VAR_NAME=\"HOST_ROOT\" -DSCAP_KERNEL_MODULE_NAME=\"scap\" -D__STDC_FORMAT_MACROS -Dscap_engine_udig_EXPORTS -I/usr/include/gtest -I/builds/debian/falcosecurity-libs/debian/output/source_dir/obj-i686-linux-gnu/common -I/builds/debian/falcosecurity-libs/debian/output/source_dir/userspace/libscap/../common -I/builds/debian/falcosecurity-libs/debian/output/source_dir/userspace/libscap -I/builds/debian/falcosecurity-libs/debian/output/source_dir/obj-i686-linux-gnu/driver/src -I/builds/debian/falcosecurity-libs/debian/output/source_dir/userspace/libscap/linux -g -O2 -ffile-prefix-map=/builds/debian/falcosecurity-libs/debian/output/source_dir=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wall -ggdb -fPIC -MD -MT libscap/engine/udig/CMakeFiles/scap_engine_udig.dir/scap_udig.c.o -MF CMakeFiles/scap_engine_udig.dir/scap_udig.c.o.d -o CMakeFiles/scap_engine_udig.dir/scap_udig.c.o -c /builds/debian/falcosecurity-libs/debian/output/source_dir/userspace/libscap/engine/udig/scap_udig.c

In file included from /builds/debian/falcosecurity-libs/debian/output/source_dir/driver/syscall_table64.c:34:
/builds/debian/falcosecurity-libs/debian/output/source_dir/driver/syscall_table.c:30:10: error: '__NR_close' undeclared here (not in a function)
   30 |         [__NR_close - SYSCALL_TABLE_ID0] =                      {UF_USED | UF_NEVER_DROP, PPME_SYSCALL_CLOSE_E, PPME_SYSCALL_CLOSE_X, PPM_SC_CLOSE},
      |          ^~~~~~~~~~

The issue is probably related to this chunk of userspace/libscap/CMakeLists.txt:

if(CMAKE_SYSTEM_NAME MATCHES "Linux")
    add_library(driver_event_schema
        ../../driver/syscall_table64.c
        ../../driver/fillers_table.c)
    target_link_libraries(scap_event_schema driver_event_schema)
    set_scap_target_properties(scap_event_schema)
endif()

Note that it unconditionall pulls in syscall_table64.c even on 32-bit arches. Previously this used syscall_table.c, and it worked on the 32-bit arches also. I don't know if simply making that change will break something, and I haven't tried it.

Thanks

FedeDP commented 11 months ago

Hi! Note please that 32bit architectures are not supported (see https://github.com/falcosecurity/libs#drivers-officially-supported-architectures). So, the fact that they were building, did not mean that they were correctly working (mostly drivers would have issues for sure).

dkogan commented 11 months ago

OK...

sysdig has supported 32-bit architectures since the start (2014?). Are you telling me that this support has been officially dropped? It has worked just fine at least on armhf (personal experience) for many years. And I don't recall seeing any bug reports on Debian about any of the many architectures supported there.

So. When did this break? The most recent working package was the git as of 20220316: e5c53d64

Is it worth the time for me to try to fix it myself, or would you say that it's now unfixable?

FedeDP commented 11 months ago

sysdig has supported 32-bit architectures since the start (2014?).

This is a news for me; for example, arm64 support came late 2022 (and arm32 was never supported, neither is now). Perhaps it used to work on x86_32 with kmod (bpf was not working for sure), but we broke it somewhere? Given that we do not officially support these 32bits archs, i think their support needs to be added.

It has worked just fine at least on armhf (personal experience) for many years.

Using which driver? Kmod?

Also cc @gnosek, my favorite repo historian :smile:

dkogan commented 11 months ago

Federico Di Pierro @.***> writes:

sysdig has supported 32-bit architectures since the start (2014?).

This is a news for me; for example, arm64 support came late 2022 (and arm32 was never supported, neither is now). Perhaps it used to work on x86_32 with kmod (bpf was not working for sure), but we broke it somewhere? Given that we do not officially support these 32bits archs, i think their support needs to be added.

I'm so confused now. I did the initial armhf "port" myself years ago. It wasn't really a port: just needed to clean up some stuff:

https://github.com/draios/sysdig/commit/2c2ae3cd9eb4c761caf365e638493b6befbf336a https://github.com/draios/sysdig/commit/fe3adcd1624aefccfe0b988eb9e21ff3d1adebc2

and a few others. The patches were accepted. I was thanked.

It has worked just fine at least on armhf (personal experience) for many years.

Admittedly I haven't actually used this consistently on 32-bit platforms in a little while, but it at least built successfully on Debian up until this last release I packaged yesterday.

Using which driver? Kmod?

The last time I touched the driver code was in 2014. It was "the driver".

dkogan commented 11 months ago

Here's an even better link:

https://github.com/draios/sysdig/issues/41

FedeDP commented 11 months ago

I see now; so that was kmod only, we did never bother to actually support it in all 3 drivers. Since it is not an officially supported architecture, i will put this in the libs-backlog milestone.

Note that it unconditionall pulls in syscall_table64.c even on 32-bit arches

Well syscall_table64 is just pulling __NR_ defines from unistd.h, therefore on a 32bit system it will pull correct syscall numbers (even the syscall_table64 name suggests that we just support 64bit systems :D ).

Thank you for taking the time to open this issue btw! You allowed me to discover something new :smile:

/milestone libs-backlog

FedeDP commented 11 months ago

/milestone driver-backlog

dkogan commented 11 months ago

Hi. Thanks for replying. I'm not clear on the next step though. To get this fixed in Debian, should I wait for you to patch it? I can also play with it myself. Should I do that?

FedeDP commented 11 months ago

Of course you are warmly welcomed to play with it! Indeed it would be great! Nowadays though, we have got 3 drivers, and they should have feature parity; therefore i think the issue is much bigger than it seems :)

dkogan commented 11 months ago

OK. I'll look at it. I took a glance at the docs, and I see the 3 different drivers. Can you say a few words about the pros/cons of each? Is the only advantage of the bpf backends that they don't require a kernel module? Or do they also work better/differently in some way?

FedeDP commented 11 months ago

Mostly:

LocutusOfBorg commented 10 months ago
--- falcosecurity-libs-0.11.3+repack.orig/driver/syscall_table64.c
+++ falcosecurity-libs-0.11.3+repack/driver/syscall_table64.c
@@ -23,6 +23,8 @@ or GPL2.txt for full copies of the licen
 #include "syscall_compat_aarch64.h"
 #elif defined(__s390x__)
 #include "syscall_compat_s390x.h"
+#else
+#include <asm/unistd.h>
 #endif /* __x86_64__ */
 #elif defined(_MSC_VER)
 // these are Linux syscall numbers and obviously meaningless for Windows/macOS
--- falcosecurity-libs-0.11.3+repack.orig/test/drivers/test_suites/syscall_exit_suite/socketcall_x.cpp
+++ falcosecurity-libs-0.11.3+repack/test/drivers/test_suites/syscall_exit_suite/socketcall_x.cpp
@@ -3135,6 +3135,7 @@ TEST(SyscallExit, socketcall_sendX)
    args[2] = data_len;
    args[3] = (unsigned long)flags;
    assert_syscall_state(SYSCALL_FAILURE, "send", syscall(__NR_socketcall, SYS_SEND, args));
+   int64_t errno_value = -errno;

    /*=============================== TRIGGER SYSCALL ===========================*/

--- falcosecurity-libs-0.11.3+repack.orig/test/drivers/test_suites/syscall_exit_suite/send_x.cpp
+++ falcosecurity-libs-0.11.3+repack/test/drivers/test_suites/syscall_exit_suite/send_x.cpp
@@ -12,7 +12,8 @@ TEST(SyscallExit, sendX_fail)
    int32_t mock_fd = -1;
    const unsigned data_len = DEFAULT_SNAPLEN * 2;
    char buf[data_len] = "some-data";
-   int flags = 0 assert_syscall_state(SYSCALL_FAILURE, "send", syscall(__NR_send, mock_fd, (void *)buf, data_len, flags));
+   int flags = 0;
+   assert_syscall_state(SYSCALL_FAILURE, "send", syscall(__NR_send, mock_fd, (void *)buf, data_len, flags));
    int errno_value = -errno;

    /*=============================== TRIGGER SYSCALL ===========================*/

This is enough to make falcosecurity-libs - 0.11.3+repack-1 build successfully @dkogan is it enough for you? https://launchpad.net/~costamagnagianfranco/+archive/ubuntu/locutusofborg-ppa/+sourcepub/15141432/+listing-archive-extra

dkogan commented 10 months ago

Thanks for that patch. I tried it: it was necessary, but not nearly sufficient. You also need at least these:

I just uploaded -4 that has all these. Let's see how it does

LocutusOfBorg commented 10 months ago

thanks! I was exactly trying to find a good way to solve the "fix-build-duplicate-socket-definitions.patch". And the bug-fix-added-missing-semicolon was already mentioned by me :)

We still have autopkgtest regressions: https://autopkgtest.ubuntu.com/results/autopkgtest-mantic/mantic/arm64/f/falcosecurity-libs/20230905_232503_e8b41@/log.gz

396s make: Entering directory '/usr/src/linux-headers-6.3.0-7-generic'
396s warning: the compiler differs from the one used to build the kernel
396s   The kernel was built by: aarch64-linux-gnu-gcc-12 (Ubuntu 12.3.0-1ubuntu1) 12.3.0
396s   You are using:           gcc-12 (Ubuntu 12.3.0-8ubuntu1) 12.3.0
396s   CC [M]  /var/lib/dkms/scap/0.11.3/build/main.o
396s   CC [M]  /var/lib/dkms/scap/0.11.3/build/dynamic_params_table.o
396s   CC [M]  /var/lib/dkms/scap/0.11.3/build/fillers_table.o
396s   CC [M]  /var/lib/dkms/scap/0.11.3/build/flags_table.o
396s   CC [M]  /var/lib/dkms/scap/0.11.3/build/ppm_events.o
396s   CC [M]  /var/lib/dkms/scap/0.11.3/build/ppm_fillers.o
396s In file included from ./include/linux/sched/signal.h:10,
396s                  from ./include/linux/rcuwait.h:6,
396s                  from ./include/linux/percpu-rwsem.h:7,
396s                  from ./include/linux/fs.h:33,
396s                  from ./include/linux/compat.h:17,
396s                  from /var/lib/dkms/scap/0.11.3/build/ppm_fillers.c:12:
396s /var/lib/dkms/scap/0.11.3/build/ppm_fillers.c: In function ‘f_sched_prog_exec’:
396s ./include/linux/cred.h:379:23: error: passing argument 1 of ‘inode_permission’ from incompatible pointer type [-Werror=incompatible-pointer-types]
396s   379 |         current_cred()->xxx;                    \
396s       |                       ^
396s       |                       |
396s       |                       struct user_namespace *
396s ./include/linux/cred.h:396:34: note: in expansion of macro ‘current_cred_xxx’
396s   396 | #define current_user_ns()       (current_cred_xxx(user_ns))
396s       |                                  ^~~~~~~~~~~~~~~~
396s /var/lib/dkms/scap/0.11.3/build/ppm_fillers.c:7463:59: note: in expansion of macro ‘current_user_ns’
396s  7463 |                         exe_writable |= (inode_permission(current_user_ns(), file_inode(exe_file), MAY_WRITE) == 0);
396s       |                                                           ^~~~~~~~~~~~~~~
396s ./include/linux/fs.h:2476:22: note: expected ‘struct mnt_idmap *’ but argument is of type ‘struct user_namespace *’
396s  2476 | int inode_permission(struct mnt_idmap *, struct inode *, int);
396s       |                      ^~~~~~~~~~~~~~~~~~
396s ./include/linux/cred.h:379:23: error: passing argument 1 of ‘inode_owner_or_capable’ from incompatible pointer type [-Werror=incompatible-pointer-types]
396s   379 |         current_cred()->xxx;                    \
396s       |                       ^
396s       |                       |
396s       |                       struct user_namespace *
396s ./include/linux/cred.h:396:34: note: in expansion of macro ‘current_cred_xxx’
396s   396 | #define current_user_ns()       (current_cred_xxx(user_ns))
396s       |                                  ^~~~~~~~~~~~~~~~
396s /var/lib/dkms/scap/0.11.3/build/ppm_fillers.c:7464:64: note: in expansion of macro ‘current_user_ns’
396s  7464 |                         exe_writable |= inode_owner_or_capable(current_user_ns(), file_inode(exe_file));
396s       |                                                                ^~~~~~~~~~~~~~~
396s ./include/linux/fs.h:1607:47: note: expected ‘struct mnt_idmap *’ but argument is of type ‘struct user_namespace *’
396s  1607 | bool inode_owner_or_capable(struct mnt_idmap *idmap,
396s       |                             ~~~~~~~~~~~~~~~~~~^~~~~
396s /var/lib/dkms/scap/0.11.3/build/ppm_fillers.c:7516:59: error: ‘kernel_cap_t’ has no member named ‘cap’
396s  7516 |         cap_inheritable = ((uint64_t)cred->cap_inheritable.cap[1] << 32) | cred->cap_inheritable.cap[0];
396s       |                                                           ^
396s /var/lib/dkms/scap/0.11.3/build/ppm_fillers.c:7516:97: error: ‘kernel_cap_t’ has no member named ‘cap’
396s  7516 |         cap_inheritable = ((uint64_t)cred->cap_inheritable.cap[1] << 32) | cred->cap_inheritable.cap[0];
396s       |                                                                                                 ^
396s /var/lib/dkms/scap/0.11.3/build/ppm_fillers.c:7517:55: error: ‘kernel_cap_t’ has no member named ‘cap’
396s  7517 |         cap_permitted = ((uint64_t)cred->cap_permitted.cap[1] << 32) | cred->cap_permitted.cap[0];
396s       |                                                       ^
396s /var/lib/dkms/scap/0.11.3/build/ppm_fillers.c:7517:91: error: ‘kernel_cap_t’ has no member named ‘cap’
396s  7517 |         cap_permitted = ((uint64_t)cred->cap_permitted.cap[1] << 32) | cred->cap_permitted.cap[0];
396s       |                                                                                           ^
396s /var/lib/dkms/scap/0.11.3/build/ppm_fillers.c:7518:55: error: ‘kernel_cap_t’ has no member named ‘cap’
396s  7518 |         cap_effective = ((uint64_t)cred->cap_effective.cap[1] << 32) | cred->cap_effective.cap[0];
396s       |                                                       ^
396s /var/lib/dkms/scap/0.11.3/build/ppm_fillers.c:7518:91: error: ‘kernel_cap_t’ has no member named ‘cap’
396s  7518 |         cap_effective = ((uint64_t)cred->cap_effective.cap[1] << 32) | cred->cap_effective.cap[0];
396s       |                                                                                           ^
396s cc1: some warnings being treated as errors
396s make[1]: *** [scripts/Makefile.build:260: /var/lib/dkms/scap/0.11.3/build/ppm_fillers.o] Error 1
396s make: *** [Makefile:2030: /var/lib/dkms/scap/0.11.3/build] Error 2
396s make: Leaving directory '/usr/src/linux-headers-6.3.0-7-generic'
396s ====================
396s scap/0.11.3/build/make.logI: Summary:
396s I: FAIL 6.3.0-7-generic
396s 
397s autopkgtest [23:24:46]: test dkms-autopkgtest: -----------------------]
397s autopkgtest [23:24:46]: test dkms-autopkgtest:  - - - - - - - - - - results - - - - - - - - - -
397s dkms-autopkgtest     FAIL non-zero exit status 1
398s autopkgtest [23:24:47]: @@@@@@@@@@@@@@@@@@@@ summary
dkogan commented 10 months ago

Yeah. There're a lot of patches now; I forgot who wrote what.

-4 had a bug, -5 got further, just uploaded -6 that fixes one more thing. Are you maintaining an Ubuntu ppa? Can you build any of the 32-bit arches? As of -5 I still can't build i386,armhf,armel: https://buildd.debian.org/status/package.php?p=falcosecurity-libs

And I'm just talking about building the thing, and passing tests. Being able to build the kernel module is a step I haven't even tried to approach yet

dkogan commented 10 months ago

Alright, -6 has gone through the build cycle: https://buildd.debian.org/status/package.php?p=falcosecurity-libs

It now builds, but the 32-bit arches (i386,armel, armhf) fail the tests, some fail by crashing. I'm going to stop here. Upstream: can you please not abandon these architectures? These all worked for a very long time with the kernel module. The advantages of the bpf-based backends (listed above) aren't super compelling, and using their existence to drop the perfectly-good kmod-based driver just makes no sense. The current patches in my Debian tree are here: https://salsa.debian.org/debian/falcosecurity-libs/-/tree/master/debian/patches

You probably want to merge all, or almost all of them. Some of these fix bugs that are essentially typos; that exist because yall no longer even try to build on most architectures.

LocutusOfBorg commented 10 months ago

This is what I pushed in Ubuntu https://launchpad.net/ubuntu/+source/falcosecurity-libs/0.11.3+repack-5ubuntu3

dkogan commented 10 months ago

Gianfranco Costamagna @.***> writes:

This is what I pushed in Ubuntu https://launchpad.net/ubuntu/+source/falcosecurity-libs/0.11.3+repack-5ubuntu3

OK. Thanks for doing that. It looks like the differences between what you have and what I have are that

Questions:

Also, it looks like you're a DD. Would you like to develop directly on salsa? It looks like there's a bit of duplication of effort happening. I'd be happy to add you to the Maintainers list.

Thanks much!

LocutusOfBorg commented 10 months ago
  • You're ignoring armhf test failures

correct

I just hardcoded the expected types, I don't like the compiler guessing stuff, I think it's saner to hardcode them since they are fixed type pointers.

Questions:

  • Are we certain that the armhf test failures are actually benign and OK to ignore? Is the armhf build working for you?

I don't know, I just did it to check the autopkgtests for that architecture :) Sadly dkms is shown only if the build succeedes

  • You noted that you couldn't actually use dkms to build the kernel modules. Is that still an issue?

yes. only for arm64 https://autopkgtest.ubuntu.com/packages/f/falcosecurity-libs/mantic/arm64

  • For the type-casting patches I only changed what was needed to fix the build. How did you decide to add more casts? Are those fixing the build for you? Tests?

As said, I prefer to make sure all the types are correctly set, not strictly needed, but since they are declared as "static size", this makes the code easier to read, and coherent with all the other testsuite files.

Also, it looks like you're a DD. Would you like to develop directly on salsa? It looks like there's a bit of duplication of effort happening. I'd be happy to add you to the Maintainers list.

Sadly I'm not an user for this package, I'm just checking and trying to make the package build/autopkgtest build to have it in the ubuntu release. I can upload in Debian if you want, but I prefer to just report bugs and have somebody else do the job, because I'm not confident in this codebase

LocutusOfBorg commented 10 months ago

I guess this is the fix we want https://github.com/falcosecurity/libs/pull/1147/files

testing here: https://launchpad.net/ubuntu/+source/falcosecurity-libs/0.11.3+repack-6ubuntu1

dkogan commented 10 months ago

Thanks. I pulled in that patch. My build issues aren't even there yet, though.

FedeDP commented 7 months ago

Sorry for the long time without answer; i think if you can contribute the patches to enable back support for 32bit architectures, we will be good to go (i'd be very happy to review and accept them). We have now an EXPERIMENTAL status for architectures supported by drivers when we cannot guarantee they pass our drivers_test fine (since we don't have CI runners for them), and we accepted kmod-only/ebpf-only patches that enabled support for a new arch in the past (indeed, riscv64 is only supported on kmod at the moment). Most of the time, we have contributors of the new architectures manually testing and ensuring that we don't break on them.

dkogan commented 7 months ago

Hi. I really don't have the cycles to work on this at the moment, and I'm not sure I want to. I fixed this in 2014, and it then worked for a long time. Then I guess you stopped testing and accidentally broke it, or something? If I put in the effort to fix it again, how do I know you won't break it again? I might come back to this eventually. But in the meantime, it would be really nice if you fixed it.

Thanks.

FedeDP commented 7 months ago

Then I guess you stopped testing and accidentally broke it, or something?

Most probably we didn't even know we supported those architectures (and, as previously stated, we reviewed the supported architectures ~1.5yrs ago defining a small set); i think that most people from 2014 are no more contributing to the project since ages, and there was no documentation around supported archs. Also, we never had drivers tests until ~1yr ago, therefore the only way to test was to have access to a machine and manually build + test. Now that we have drivers test, we are only able to automatically run them in our GHA based CI that does only support x86_64. For aarch64, we use a CircleCI job, but for other architectures we are out of luck, that's why we chose to revisit the support architectures set and added also the __EXPERIMENTAL__ support status. We lack the manpower to manually run tests for all existing architectures out there.

Let me state this (once again? Sorry if i repeat myself!) though: the fact that drivers(or just kmod) build on an architecture does not necessarily mean that they are also working fine. See for example our feature_gates.h file.

and I'm not sure I want to.

That's totally understandable, and i am really sorry about that.

But in the meantime, it would be really nice if you fixed it.

Unless we have access somewhere to a machine, we cannot guarantee that everything works fine, and we cannot blindly patch things.

Btw, from the list of patches in https://salsa.debian.org/debian/falcosecurity-libs/-/tree/master/debian/patches:

A patch probably doesn't apply (am not sure though):

Remaining patches would be:

I think the first and the third one apply with no issues; am not sure about the second one and the last one though. Note that we now have a BUILD_SHARED_LIBS switch, that is also tested in CI, to build libs as shared objects linking everything as shared (that's why we don't need the use-shared-libelf.patch anymore).

dkogan commented 7 months ago

Hi. Sorry to be cantankerous. I know we're all doing the best with that we have.

For the architectures I looked at, I see build failures; I'm not even trying to run the code yet. So you can replicate the failures I'm looking at by cross-building, which shouldn't be very effortful to get going. Let me know if you need help doing that, and I should be able to give you detailed notes, when I get around to it.

Thanks for the patch review. I haven't looked at this for a little while now, and presumably yall made changes upstream since then.

FedeDP commented 2 months ago

We haven't been able to tackle this (neither reaching a maintainer's consensus about) during the current release cycle. Moving to the next one: /milestone 0.17.0

FedeDP commented 1 month ago

And again, will move to next milestone: :cry: /milestone 0.18.0