falcosecurity / driverkit

Kit for building Falco drivers: kernel modules or eBPF probes
Apache License 2.0

Cannot Build CentOS 7: `3.10.0-1160.80.1.el7.x86_64` #236

Closed EXONER4TED closed 1 year ago

EXONER4TED commented 1 year ago

Describe the bug

It looks like CentOS 7 got a new kernel ~Nov. 10 - at least, that's when mirrors picked it up. Our build pipelines internally have been failing for ALL falco libs versions, pointing to the problem being something else. Trying to manually build it gives an error.

Command:

 _output/bin/driverkit docker --output-module test.ko --kernelrelease 3.10.0-1160.80.1.el7.x86_64 --kernelversion 1 --driverversion 3.0.0+driver --target centos --moduledevicename scwx-falco --moduledrivername scwx-falco --kernelurls https://mirrors.edge.kernel.org/centos/7/updates/x86_64/Packages/kernel-devel-3.10.0-1160.80.1.el7.x86_64.rpm -l debug

Error:

DEBU   CC [M]  /tmp/driver/tp_table.o
DEBU   LD [M]  /tmp/driver/scwx-falco.o
DEBU ld: /tmp/driver/tp_table.o: invalid string offset 1819042147 >= 21 for section `mtab'
DEBU /tmp/driver/tp_table.o: error adding symbols: File format not recognized
DEBU make[2]: *** [scripts/Makefile.build:470: /tmp/driver/scwx-falco.o] Error 1
DEBU make[1]: Leaving directory '/tmp/kernel'
DEBU make[1]: *** [Makefile:1316: _module_/tmp/driver] Error 2
DEBU make: *** [Makefile:7: all] Error 2
DEBU log pipe close                                error=EOF

It looks like it's an issue with the linker..?

Either way, to get around this problem, I tried creating a custom builder image using CentOS 7 as the base:

FROM centos:7

RUN yum install -y make which binutils gcc && ln -s $(which gcc) /usr/bin/gcc-4.9

And using --builderimage with this image works just fine 🙂

NOTE: GCC in that above image is actually gcc-4.8.5, NOT gcc-4.9 - I just symlink the /usr/bin/gcc to /usr/bin/gcc-4.9 to trick the driverkit GCC selector into using my 4.8.5 - and that works! It gets more interesting... If I change the driverkit GCC selector to force gcc-4.8 for CentOS 3.10 kernels, that also fails... It seems that specifically for this NEW CentOS kernel, it requires gcc-4.8.5?

That, or something else in the build container is mismatched...

What's even MORE interesting is that the other CentOS 7 kernel releases build just fine with the default driverkit container. So something upstream in how CentOS is compiling the new kernel release isn't working...

How to reproduce it

Using the current master branch of driverkit, make build, and try to run the following:

 _output/bin/driverkit docker --output-module test.ko --kernelrelease 3.10.0-1160.80.1.el7.x86_64 --kernelversion 1 --driverversion 3.0.0+driver --target centos --moduledevicename scwx-falco --moduledrivername scwx-falco --kernelurls https://mirrors.edge.kernel.org/centos/7/updates/x86_64/Packages/kernel-devel-3.10.0-1160.80.1.el7.x86_64.rpm -l debug
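
The `--kernelurls` value above follows directly from the `--kernelrelease` string. A minimal sketch of deriving one from the other, assuming the package still lives under the same mirror path used in this report (the helper name `centos7_kernel_devel_url` is hypothetical and not part of driverkit):

```shell
#!/bin/sh
# Hypothetical helper: build the kernel-devel RPM URL for a CentOS 7
# kernel release, using the mirror path quoted in this issue.
centos7_kernel_devel_url() {
    release="$1"
    base="https://mirrors.edge.kernel.org/centos/7/updates/x86_64/Packages"
    printf '%s/kernel-devel-%s.rpm\n' "$base" "$release"
}

# Example:
#   centos7_kernel_devel_url 3.10.0-1160.80.1.el7.x86_64
```

Called with `3.10.0-1160.80.1.el7.x86_64`, this yields the same URL passed to `--kernelurls` in the command above, which makes it easier to retry the build against other CentOS 7 releases.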

Expected behaviour

It should build correctly just like the other CentOS 7 kernel releases without the need for --builderimage.

EXONER4TED commented 1 year ago

@FedeDP - wanted to ping you in case you immediately knew what to do here since you worked with the multi-gcc PR. I'm not sure if we should just make some exception for centos with the GCC selector... or if maybe we need to add gcc-4.8.5 to the buster builder?

dwindsor commented 1 year ago

The buster-based builder uses a more recent binutils than CentOS 7 does, interestingly:

buster-based Builder

root@018e00199b39:/# ld --version
GNU ld (GNU Binutils for Debian) 2.30
Copyright (C) 2018 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) a later version.
This program has absolutely no warranty.
root@018e00199b39:/# 

centos:7

[root@69c6b754624d /]# ld --version
GNU ld version 2.27-44.base.el7_9.1
Copyright (C) 2016 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) a later version.
This program has absolutely no warranty.
[root@69c6b754624d /]#

I hope we haven't encountered another issue like this (arbitrary binutils versions don't work because they break kernels > x.y.z... or earlier than x.y.z in this case!). It's clear that these problems will continue; we may want to consider moving to per-distro builder containers.
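
To compare linkers across builder images without reading the full banner, the version can be taken from the first line of `ld --version`. A rough sketch, assuming the version is always the last field of that line, which holds for both banners above (`ld_version_from_banner` is a made-up helper name):

```shell
#!/bin/sh
# Hypothetical helper: read an `ld --version` banner on stdin and print
# the last whitespace-separated field of its first line.
ld_version_from_banner() {
    awk 'NR == 1 { print $NF; exit }'
}

# Example (requires binutils in the current environment):
#   ld --version | ld_version_from_banner
```

Running the example line inside each candidate builder image gives a one-line value that is easy to diff or log next to the kernel release being built.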

dwindsor commented 1 year ago

> NOTE: GCC in that above image is actually gcc-4.8.5, NOT gcc-4.9 - I just symlink the /usr/bin/gcc to /usr/bin/gcc-4.9 to trick the driverkit GCC selector into using my 4.8.5 - and that works! It gets more interesting... If I change the driverkit GCC selector to force gcc-4.8 for CentOS 3.10 kernels, that also fails... It seems that specifically for this NEW CentOS kernel, it requires gcc-4.8.5?

fwiw, I'm able to build this driver in the following environments:

- CentOS 7: gcc: 4.8.5, binutils: 2.27-44.base.el7
- RHEL 8: gcc: 8.5.0, binutils: 2.30-117.el8
- RHEL 9: gcc: 11.3.1, binutils: 2.35.2-24.el8

So, I don't think there's anything special about gcc 4.8.5 that's missing in gcc 4.9 that's causing this.

Interestingly, trying Ubuntu-based --builderimage containers with appropriately recent gcc and ld fails. Using Ubuntu 18, with gcc 7.5.0-3ubuntu1~18.04 and ld 2.30:

DEBU arch/x86/Makefile:96: stack-protector enabled but compiler support broken
DEBU arch/x86/Makefile:169: *** CONFIG_RETPOLINE=y, but not supported by the compiler. Compiler update recommended..  Stop.

I'm sure the gcc in this container has stack-protector and retpoline support, though:

strings /usr/bin/gcc-4.9 |grep -i "\-mindirect\-branch"
mindirect-branch-register
-mindirect-branch=
strings /usr/bin/gcc-4.9 |grep -i "\-stack-protector"
-Wstack-protector                            
-fstack-protector                            
-fstack-protector-all                        
-fstack-protector-explicit                   
-fstack-protector-strong                     
-mstack-protector-guard=

Very interesting is the fact that retpoline support was introduced into gcc in version 4.8.5, the minimum version that seems required to build this driver on CentOS.
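
Grepping the binary with `strings` shows that the option names exist, but not whether the compiler accepts them when invoked the way a kernel build does. A hedged sketch of a more direct probe is to compile an empty translation unit with the flags in question. The retpoline flags in the example are the ones upstream kernels test for with gcc; whether the CentOS 3.10 Makefile probes exactly the same set is an assumption, and `cc_supports` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if compiler "$1" accepts the remaining
# arguments as flags when compiling an empty C translation unit.
cc_supports() {
    cc="$1"
    shift
    out="$(mktemp)" || return 2
    # -Werror turns "unknown option" warnings into failures.
    if "$cc" -Werror "$@" -c -x c /dev/null -o "$out" >/dev/null 2>&1; then
        status=0
    else
        status=1
    fi
    rm -f "$out"
    return "$status"
}

# Example (flags as probed by upstream kernels; an assumption for 3.10):
#   cc_supports /usr/bin/gcc-4.9 -mindirect-branch=thunk-extern -mindirect-branch-register
#   cc_supports /usr/bin/gcc-4.9 -fstack-protector
```

A non-zero exit from either example call would point at the compiler that `/usr/bin/gcc-4.9` actually resolves to, rather than at the kernel headers.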

For reference, here's a Ubuntu 18 Dockerfile for use with --builderimage:

FROM ubuntu:18.04

RUN apt-get update && \
    apt-get install -y gcc curl cpio rpm2cpio make binutils tar && \
    ln -s $(which gcc) /usr/bin/gcc-4.9

FedeDP commented 1 year ago

Hi! As always, this is very weird... I agree that, at some point, we might want to add different builders for each distro and distro version (like: centos7 builder, centos8 builder, ubuntu20.04 builder). The fact is, it's not easy to work out which distro version a given kernelrelease shipped in (let alone the added complexity of managing 15+ docker images).

Perhaps it would be enough to add builders per distro type, say rpm builders, deb builders (we already have them), arch builders, and whatever else we still need. Again, I am not fond of adding that complexity to build, let's say, 5% more drivers; i.e. I don't think the gain is worth the effort!

FedeDP commented 1 year ago

I was trying to leverage #244 to create a new builder image to fix this, but i still end up with:

DEBU + make CC=/usr/bin/gcc-4.8.5 KERNELDIR=/tmp/kernel
DEBU make -C /tmp/kernel M=/tmp/driver modules
DEBU arch/x86/Makefile:96: stack-protector enabled but compiler support broken
DEBU make[1]: Entering directory `/tmp/kernel'
DEBU make[1]: Leaving directory `/tmp/kernel'
DEBU arch/x86/Makefile:169: *** CONFIG_RETPOLINE=y, but not supported by the compiler. Compiler update recommended..  Stop.

Using this dockerfile:

FROM centos:7

LABEL maintainer="cncf-falco-dev@lists.cncf.io"

ARG TARGETARCH

RUN yum -y install centos-release-scl && \
    yum -y install gcc \
    llvm-toolset-7.0 \
    bash-completion \
    binutils which make \
    bc \
    ca-certificates \
    curl \
    gnupg2 \
    libc6-dev \
    elfutils-libelf-devel \
    xz \
    cpio \
    flex \
    bison \
    openssl \
    openssl-devel \
    wget

# Properly create soft link
RUN ln -s /usr/bin/gcc /usr/bin/gcc-4.5.8

RUN source scl_source enable llvm-toolset-7.0
RUN echo "source scl_source enable llvm-toolset-7.0" >> /etc/bashrc
RUN source /etc/bashrc

With this gcc:

gcc-4.5.8 -v
Using built-in specs.
COLLECT_GCC=gcc-4.5.8
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.8.5/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin --enable-initfini-array --disable-libgcj --with-isl=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/isl-install --with-cloog=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/cloog-install --enable-gnu-indirect-function --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC)

Any hint?

FedeDP commented 1 year ago

RUN ln -s /usr/bin/gcc /usr/bin/gcc-4.5.8

Spot the issue :stuck_out_tongue_closed_eyes: Why am I so bad!! ahahah It should be:

RUN ln -s /usr/bin/gcc /usr/bin/gcc-4.8.5

Not 4.5.8 :laughing: It is now fixed.
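
One way to avoid this kind of typo is to let the compiler report its own version and build the link name from that. A small sketch, assuming `gcc -dumpversion` prints the full version, as it does for the 4.8.5 shown above (gcc 7 and later may print only the major version); `versioned_gcc_name` is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: print "gcc-<version>" for the compiler at "$1",
# using the version string the compiler itself reports.
versioned_gcc_name() {
    cc="$1"
    ver="$("$cc" -dumpversion)" || return 1
    printf 'gcc-%s\n' "$ver"
}

# Example, e.g. in a builder image (the path is an assumption):
#   ln -s /usr/bin/gcc "/usr/bin/$(versioned_gcc_name /usr/bin/gcc)"
```

Since the suffix comes from the compiler rather than from a hand-typed string, the link name and the real version cannot drift apart.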

FedeDP commented 1 year ago

#244 fixed this issue ;)

./_output/bin/driverkit docker --output-module test.ko --kernelrelease 3.10.0-1160.80.1.el7.x86_64 --kernelversion 1 --driverversion 3.0.0+driver --target centos --moduledevicename scwx-falco --moduledrivername scwx-falco --kernelurls https://mirrors.edge.kernel.org/centos/7/updates/x86_64/Packages/kernel-devel-3.10.0-1160.80.1.el7.x86_64.rpm --builderimage auto:master
INFO driver building, it will take a few seconds   processor=docker
INFO kernel module available                       path=test.ko

/close

poiana commented 1 year ago

@FedeDP: Closing this issue.
