ceph / ceph-csi

CSI driver for Ceph
Apache License 2.0

Update modprobe in csi-rbdplugin to support zstd compressed rbd and nbd kernel module #4679

Open 7oku opened 2 weeks ago

7oku commented 2 weeks ago

Describe the bug

We cannot install ceph-csi on Ubuntu 24.04 because the rbd and nbd kernel modules cannot be loaded. The reason is that the modprobe bundled with csi-rbdplugin (kmod version 25) lacks support for zstd-compressed kernel modules.

Many distros nowadays ship their kernel modules zstd compressed.

Environment details

Steps to reproduce

Steps to reproduce the behavior:

  1. Setup details: Deploy ceph-csi on a k8s cluster whose nodes run Ubuntu 24.04
  2. See error in csi-rbdplugin:
    rbd_util.go:303] modprobe failed (an error (exit status 1) occurred while running modprobe args: [rbd]): "modprobe: ERROR: could not insert 'rbd': Exec format error\n"
    driver.go:154] an error (exit status 1) occurred while running modprobe args: [rbd]

Actual results

csi-rbdplugin is a privileged container that mounts mostly host-bound resources and is in charge of loading the rbd and nbd kernel modules on the host. This fails because the modprobe bundled in the container cannot load the modules, which are zstd-compressed:

# find ./ -type f -name '*rbd*'
./kernel/drivers/block/rbd.ko.zst
# modprobe --version
kmod version 25
+XZ +ZLIB +OPENSSL -EXPERIMENTAL

We need a current modprobe version to support current distro releases. For comparison, on Ubuntu 24.04:

# cat /etc/issue && modprobe --version
Ubuntu 24.04 LTS \n \l

kmod version 31
+ZSTD +XZ -ZLIB +LIBCRYPTO -EXPERIMENTAL
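
The difference between the two builds can be checked mechanically. The following is a minimal sketch, not part of ceph-csi: it parses the feature flags printed by `modprobe --version` and decides whether a module file with a given suffix should be loadable. The function names and the suffix-to-flag mapping (`.ko.zst` needs `ZSTD`, `.ko.xz` needs `XZ`, `.ko.gz` needs `ZLIB`) are assumptions made for illustration.

```python
# Assumed mapping from module file suffix to the kmod feature flag it needs.
# None means the module is uncompressed and needs no decompression support.
SUFFIX_TO_FEATURE = {
    ".ko.zst": "ZSTD",
    ".ko.xz": "XZ",
    ".ko.gz": "ZLIB",
    ".ko": None,
}


def parse_kmod_features(version_output):
    """Turn tokens such as '+ZSTD' or '-ZLIB' into {'ZSTD': True, 'ZLIB': False}."""
    features = {}
    for token in version_output.split():
        if len(token) > 1 and token[0] in "+-":
            features[token[1:]] = token[0] == "+"
    return features


def can_load(module_path, features):
    """Return True if a kmod build with these features can read module_path."""
    for suffix, feature in SUFFIX_TO_FEATURE.items():
        if module_path.endswith(suffix):
            return feature is None or features.get(feature, False)
    return False  # unknown suffix: treat as not loadable


# The two outputs quoted in this report.
container = "kmod version 25\n+XZ +ZLIB +OPENSSL -EXPERIMENTAL"
host = "kmod version 31\n+ZSTD +XZ -ZLIB +LIBCRYPTO -EXPERIMENTAL"
module = "./kernel/drivers/block/rbd.ko.zst"

print(can_load(module, parse_kmod_features(container)))  # False
print(can_load(module, parse_kmod_features(host)))       # True
```

A check like this could run before calling modprobe to produce a clearer message than "Exec format error", but whether that belongs in the plugin is a separate question.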

Expected behavior

modprobe of csi-rbdplugin should be capable of loading zstd compressed kernel modules.

Additional context

https://github.com/ceph/ceph-csi/issues/4610 shows the same errors, although in that case the module apparently was not even required. Reports of the same problem on SUSE, which has also shipped zstd-compressed modules since 2021, can be found online.

Currently, the only workaround seems to be loading the module with modprobe from the host machine. Having csi-rbdplugin keep handling this itself would be a more elegant and permanent solution.

nixpanic commented 2 weeks ago

modprobe and other tools are part of the Ceph container image that is used as the base layer for the Ceph-CSI container image. As soon as Ceph uses a more recent Linux distribution for its container image, Ceph-CSI will be able to load the .zst kernel modules too.

nixpanic commented 2 weeks ago

Also note that you should be able to work around this issue by loading the module(s) any other way. A static configuration on the host that loads the modules on boot is likely the simplest.
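
One possible shape for such a static configuration, as a sketch only: on systemd-based hosts, files in /etc/modules-load.d/ list one module name per line and are read at boot. The helper below renders and writes such a file. The file name `ceph-csi.conf` and both function names are hypothetical, and writing to /etc requires root on the host.

```python
from pathlib import Path


def render_modules_load_conf(modules):
    """Render a modules-load.d file: a comment line, then one module per line."""
    lines = ["# Kernel modules needed by ceph-csi, loaded at boot."]
    lines.extend(modules)
    return "\n".join(lines) + "\n"


def write_modules_load_conf(modules, conf_dir="/etc/modules-load.d",
                            name="ceph-csi.conf"):
    """Write the rendered file; the default path assumes a systemd-based host."""
    path = Path(conf_dir) / name
    path.write_text(render_modules_load_conf(modules))
    return path


if __name__ == "__main__":
    # Print the file content instead of writing it, so this is safe to run.
    print(render_modules_load_conf(["rbd", "nbd"]), end="")
```

The same content could be dropped in place by cloud-init or any other provisioning tool; the Python wrapper is only there to make the file format explicit.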

7oku commented 2 weeks ago

We use ceph-csi in a highly dynamic environment, which spawns k8s worker nodes quickly in different environments. You are right, we can work around this by fiddling with cloud-init, Ansible, and the like.

However, having ceph-csi manage its required modules itself is an elegant and clean way of doing it IMO. So I'd rather fix this issue right where it's caused instead of working around it with additional external tooling.

nixpanic commented 2 weeks ago

We also want to see a fix in the Ceph container image for this. Once their container image uses CentOS Stream 9, the issue should be resolved. See ceph/ceph-container#2183 for some progress; hopefully those images become available soon.