bottlerocket-os / bottlerocket


Build a statically linked version of kmod #3981

Closed vigh-m closed 5 months ago

vigh-m commented 5 months ago

Issue number:

Updates #3968

Description of changes: This change builds a statically linked version of kmod. ~This new kmod is unlinked from the existing /bin/kmod and its symlinks. It is installed in /usr/libexec/kmod~

This new version of kmod replaces the existing one. The .so files are still provided for dependencies.

Providing a statically linked kmod lets containers mount it and load kernel modules without compatibility issues. This is a first step toward unblocking #3968.

Testing done:

Tested on an admin container

    readelf --program-headers --wide /.bottlerocket/rootfs/usr/bin/kmod

    Elf file type is EXEC (Executable file)
    Entry point 0x401cf0
    There are 10 program headers, starting at offset 64

    Program Headers:
      Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
      LOAD           0x000000 0x0000000000400000 0x0000000000400000 0x0005e0 0x0005e0 R   0x1000
      LOAD           0x001000 0x0000000000401000 0x0000000000401000 0x10e472 0x10e472 R E 0x1000
      LOAD           0x110000 0x0000000000510000 0x0000000000510000 0x043b13 0x043b13 R   0x1000
      LOAD           0x154a60 0x0000000000555a60 0x0000000000555a60 0x007110 0x00ceb0 RW  0x1000
      NOTE           0x000270 0x0000000000400270 0x0000000000400270 0x000040 0x000040 R   0x8
      NOTE           0x0002b0 0x00000000004002b0 0x00000000004002b0 0x000044 0x000044 R   0x4
      TLS            0x154a60 0x0000000000555a60 0x0000000000555a60 0x000020 0x000060 R   0x8
      LOOS+0x474e553 0x000270 0x0000000000400270 0x0000000000400270 0x000040 0x000040 R   0x8
      GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0x10
      GNU_RELRO      0x154a60 0x0000000000555a60 0x0000000000555a60 0x0055a0 0x0055a0 R   0x1

     Section to Segment mapping:
      Segment Sections...
       00     .note.gnu.property .note.gnu.build-id .note.ABI-tag .rela.plt
       01     .init .plt .text .fini
       02     .rodata rodata.cst32 .eh_frame .gcc_except_table
       03     .tdata .ctors .dtors .data.rel.ro .got .got.plt .data .bss
       04     .note.gnu.property
       05     .note.gnu.build-id .note.ABI-tag
       06     .tdata .tbss
       07     .note.gnu.property
       08
       09     .tdata .ctors .dtors .data.rel.ro .got
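The program headers above contain no INTERP or DYNAMIC entry, which is what one would expect from a statically linked, non-PIE executable. A minimal sketch of automating that check follows. It assumes the input is the text printed by readelf --program-headers, and the function name classify_program_headers is hypothetical, not part of this change. It is a heuristic only: a static-PIE binary still carries a DYNAMIC header and would be reported as dynamic.

```shell
#!/bin/sh
# Classify `readelf --program-headers` output read from stdin.
# Assumption: a binary with neither an INTERP nor a DYNAMIC program
# header is treated as statically linked. Static-PIE binaries keep a
# DYNAMIC header, so this sketch reports them as "dynamic".
classify_program_headers() {
    # Match only header rows whose type column is INTERP or DYNAMIC;
    # section names such as .interp or .dynamic do not match.
    if grep -Eq '^[[:space:]]*(INTERP|DYNAMIC)[[:space:]]'; then
        echo dynamic
    else
        echo static
    fi
}

# Hypothetical usage (path is a placeholder):
#   readelf --program-headers --wide /path/to/kmod | classify_program_headers
```

Reading from stdin keeps the function independent of where readelf runs, so the same check works from the host or from a container that only has the captured output.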

- [x] Validating behaviour: I'm able to symlink the host OS kmod into the container and use it to load and unload kernel modules. I can also use the kmod in the Bottlerocket rootfs directly.

      [root@admin]# ln -s /.bottlerocket/rootfs/usr/bin/kmod /usr/bin/mo
      modutil     more        mount       mountpoint
      [root@admin]# ln -s /.bottlerocket/rootfs/usr/bin/kmod /usr/bin/mo
      modutil     more        mount       mountpoint
      [root@admin]# ln -s /.bottlerocket/rootfs/usr/bin/kmod /usr/bin/modprobe
      [root@admin]# lsmod | grep table
      iptable_nat            16384  1
      nf_nat                 57344  2 iptable_nat,xt_MASQUERADE
      iptable_filter         16384  1
      [root@admin]# modprobe ip6table_filter
      [root@admin]# lsmod | grep table
      ip6table_filter        16384  0
      iptable_nat            16384  1
      nf_nat                 57344  2 iptable_nat,xt_MASQUERADE
      iptable_filter         16384  1
      [root@admin]# /usr/bin/modprobe -r ip6table_filter
      [root@admin]# lsmod | grep table
      iptable_nat            16384  1
      nf_nat                 57344  2 iptable_nat,xt_MASQUERADE
      iptable_filter         16384  1
      [root@admin]# /.bottlerocket/rootfs/usr/sbin/modprobe ip6table_filter
      [root@admin]# lsmod | grep table
      ip6table_filter        16384  0
      iptable_nat            16384  1
      nf_nat                 57344  2 iptable_nat,xt_MASQUERADE
      iptable_filter         16384  1
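The symlink approach works because kmod is a multi-call binary: it selects the tool to run (modprobe, lsmod, and so on) from the name it was invoked as. A minimal sketch of creating those links in one step is below. The helper name link_kmod_applets and the applet list are assumptions for illustration, not something this change installs.

```shell
#!/bin/sh
# Create applet symlinks that point at a single kmod binary.
# $1: path to the kmod binary (for example, one mounted from the host)
# $2: directory in which to create the links
# Assumption: the applet list below covers the tools kmod commonly
# provides; trim it to what the container actually needs.
link_kmod_applets() {
    kmod_path=$1
    bin_dir=$2
    for applet in lsmod modprobe insmod rmmod modinfo depmod; do
        # -f replaces an existing link so the helper can be re-run safely.
        ln -sf "$kmod_path" "$bin_dir/$applet"
    done
}

# Hypothetical usage inside a container:
#   link_kmod_applets /.bottlerocket/rootfs/usr/bin/kmod /usr/bin
```

Linking rather than copying means the container always uses whatever kmod the host currently ships.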


- [x] Test with mounts defined in a pod spec: I'm able to link and load modules. 

      root@my-pod:/# ln -s /usr/bin/kmod-static /usr/sbin/modprobe
      root@my-pod:/# ln -s /usr/bin/kmod-static /usr/sbin/lsmod
      root@my-pod:/# lsmod | grep table
      nf_tables             307200  0
      nfnetlink              20480  2 nf_conntrack_netlink,nf_tables
      ip6table_filter        16384  1
      ip6table_nat           16384  1
      iptable_nat            16384  1
      nf_nat                 57344  4 ip6table_nat,xt_nat,iptable_nat,xt_MASQUERADE
      ip6table_mangle        16384  1
      iptable_mangle         16384  1
      iptable_filter         16384  1
      root@my-pod:/# modprobe -r nf_tables
      root@my-pod:/# lsmod | grep table
      ip6table_filter        16384  1
      ip6table_nat           16384  1
      iptable_nat            16384  1
      nf_nat                 57344  4 ip6table_nat,xt_nat,iptable_nat,xt_MASQUERADE
      ip6table_mangle        16384  1
      iptable_mangle         16384  1
      iptable_filter         16384  1


Sample pod definition that was used:

    apiVersion: v1
    kind: Pod
    metadata:
      name: my-pod
    spec:
      containers:

Based on the Helm charts in Cilium's GitHub repository, they do use privileged: true on some of their containers. They also mount /lib/modules/ in the same way.

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.