Closed chrrrles closed 9 months ago
Hi @chrrrles
Looks like Rook is having trouble registering the CSI drivers and nodes? Can you check the output of:
sudo microk8s kubectl get csidrivers
sudo microk8s kubectl get csinodes
Also, retrieving logs with --all-containers on the rook pods might shed some more light on what is happening. Can you also check whether the ceph rbd pools are created? Though I imagine the issues are related to CSI instead.
Thanks @neoaggelos - the --all-containers log flag identified the source of the problem. The rbd plugin is unable to load the rbd kernel module:
chrrles@misscompy:~$ kubectl logs csi-rbdplugin-5nvp9 -n rook-ceph --all-containers
...snip...
E0112 01:51:42.259532 42440 rbd_util.go:303] modprobe failed (an error (exit status 1) occurred while running modprobe args: [rbd]): "modprobe: ERROR: could not insert 'rbd': Exec format error\n"
Manually loading the rbd module and recreating the pod then results in a warning that the nbd module cannot be loaded:
chrrles@misscompy:~$ kubectl delete pods csi-rbdplugin-5nvp9 -n rook-ceph
pod "csi-rbdplugin-5nvp9" deleted
chrrles@misscompy:~$ kubectl logs ds/csi-rbdplugin -n rook-ceph --all-containers
W0112 01:57:48.892866 59135 rbd_attach.go:226] nbd modprobe failed (an error (exit status 1) occurred while running modprobe args: [nbd]): "modprobe: ERROR: could not insert 'nbd': Exec format error\n"
chrrles@misscompy:~$ sudo modprobe nbd
chrrles@misscompy:~$ kubectl delete pods/csi-rbdplugin-5n5n9 -n rook-ceph
pod "csi-rbdplugin-5n5n9" deleted
chrrles@misscompy:~$ kubectl logs ds/csi-rbdplugin -n rook-ceph --all-containers
I0112 01:59:58.670031 65930 main.go:167] Version: v2.7.0
I0112 01:59:58.670086 65930 main.go:168] Running node-driver-registrar in mode=registration
I0112 01:59:59.677013 65930 node_register.go:53] Starting Registration Server at: /registration/rook-ceph.rbd.csi.ceph.com-reg.sock
I0112 01:59:59.677312 65930 node_register.go:62] Registration Server started at: /registration/rook-ceph.rbd.csi.ceph.com-reg.sock
I0112 01:59:59.677437 65930 node_register.go:92] Skipping HTTP server because endpoint is set to: ""
I0112 02:00:00.572303 65930 main.go:102] Received GetInfo call: &InfoRequest{}
I0112 02:00:00.573246 65930 main.go:109] "Kubelet registration probe created" path="/var/snap/microk8s/common/var/lib/kubelet/plugins/rook-ceph.rbd.csi.ceph.com/registration"
I0112 02:00:00.633474 65930 main.go:121] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:true,Error:,}
Adding rbd and nbd to /etc/modules fixes the issue with mounting the PVC volume. Further research suggests this is a kernel incompatibility preventing the rbd module from being loaded by the rook agent, which is odd because this kernel should be compatible (https://documentation.suse.com/es-es/ses/7/html/ses-all/admin-caasp-ceph-common-issues.html#solution-7):
chrrles@misscompy:~/ollama$ uname -a
Linux misscompy 6.5.0-14-generic #14-Ubuntu SMP PREEMPT_DYNAMIC Tue Nov 14 14:59:49 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
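For anyone hitting the same modprobe errors, a quick way to see whether rbd and nbd are already loaded on a node is to look at /proc/modules. This is only a minimal sketch: the module_loaded helper and the MODULES_FILE override are made up for illustration and are not part of Rook or microk8s.

```shell
#!/bin/sh
# Report whether the kernel modules the RBD plugin tries to load are present.
# MODULES_FILE is a hypothetical override so the check can be pointed at a
# saved copy of /proc/modules; by default it reads the live file.
MODULES_FILE="${MODULES_FILE:-/proc/modules}"

module_loaded() {
    # /proc/modules lists one module per line, with the name in the first field.
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "$MODULES_FILE"
}

for mod in rbd nbd; do
    if module_loaded "$mod"; then
        echo "$mod: loaded"
    else
        echo "$mod: not loaded"
    fi
done
```

If either module shows as not loaded, running sudo modprobe on the host (as done above) is the manual workaround.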
Regardless, this does not feel related to microk8s so I will close this issue. Thanks again for helping @neoaggelos !
:100: :fireworks: :raised_hands:
Thank you @chrrrles, this is not loaded by default on the latest Ubuntu cloud image (noble) either, kernel 6.8+; Don't use the /etc/modules, your solution works. Just to recap, I had to run on each Kubernetes node:
sudo modprobe rbd
sudo modprobe nbd
Edit: To make it persistent you need to add the modules to a file, as @chrrrles said. The recommended Linux way is to have one file per module, for some reason:
echo rbd | sudo tee -a /etc/modules-load.d/rbd.conf
echo nbd | sudo tee -a /etc/modules-load.d/nbd.conf
sudo chmod 777 /etc/modules-load.d/rbd.conf
sudo chmod 777 /etc/modules-load.d/nbd.conf
Not sure if you need all 777 permissions, but the deprecated /etc/modules file has those permissions so I just moved it over; without that, the modules were not loaded at boot.
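The per-module files can also be written by a small script that is safe to re-run. This is a rough sketch, not a tested fix: the write_module_conf helper and the CONF_DIR variable are made up for illustration, and it assumes mode 0644 is enough for the files to be read at boot, which nobody in this thread has confirmed.

```shell
#!/bin/sh
# Write one modules-load.d style file per module.
# CONF_DIR is a hypothetical override for dry runs; on a real node it would
# be set to /etc/modules-load.d and the script would need to run as root.
CONF_DIR="${CONF_DIR:-./modules-load.d}"

write_module_conf() {
    mod="$1"
    mkdir -p "$CONF_DIR" || return 1
    # Overwrite rather than append so re-running does not duplicate the entry.
    printf '%s\n' "$mod" > "$CONF_DIR/$mod.conf" || return 1
    # Assumption: owner-writable, world-readable (0644) is sufficient.
    chmod 644 "$CONF_DIR/$mod.conf"
}

for mod in rbd nbd; do
    write_module_conf "$mod" && echo "wrote $CONF_DIR/$mod.conf"
done
```

Running it with CONF_DIR=/etc/modules-load.d under sudo would produce the same rbd.conf and nbd.conf as the commands above, minus the 777 mode.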
Summary
Unable to provision Ceph volumes in microk8s. The ceph-rbd pod is stuck in CrashLoopBackOff and is unable to connect to the csi.sock for registration (I presume). PVCs can be created but pods cannot use them.
What Should Happen Instead?
Pods should be able to utilize PVCs provisioned by ceph RBD.
Reproduction Steps
Introspection Report
inspection-report-20240110_130742.tar.gz
Can you suggest a fix?
:man_shrugging: I wish I could suggest a fix
Are you interested in contributing with a fix?
Yes, with some guidance. :+1: