UpCloudLtd / upcloud-csi

Container Storage Interface (CSI) driver for UpCloud's MaxIOPS storage offering
MIT License

Driver mounts wrong host device #40

Closed by wvh 1 year ago

wvh commented 2 years ago

Hello,

We were testing the CSI driver (main branch) a few weeks ago and twice ran into a problem where the driver mounts the wrong host device. After restarting either the pods or the driver itself, the CSI driver mounts the host's system volume into the pod instead of the intended volume (!).

Depending on the situation, this has either destroyed the host file system or changed permissions throughout its file system tree.

Here is an example of a shell inside a Minio pod that is supposed to have bucket data in /data, but instead has the host's root file system mounted there:

bash-4.4$ ls
bin  boot  data  dev  etc  home  lib  lib64  licenses  lost+found  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var
bash-4.4$ cat /etc/hostname
minio-0
bash-4.4$ ls /data/
bin   cdrom  etc   lib    lib64   lost+found  mnt  proc  run   snap  sys  usr
boot  dev    home  lib32  libx32  media       opt  root  sbin  srv   tmp  var
bash-4.4$ cat /data/etc/hostname
<anonymized>-client-2

Note that the hostname file in the mounted volume shows the UpCloud virtual server's hostname, so the pod effectively has the wrong device mounted.

After that, the host system ran fsck on reboot with lots of errors and then failed to boot with the following fatal error:

...
Begin: Running /scripts/local-bottom ... done.
Begin: Running /scripts/init-bottom ... done.
mount: mounting /sys on /root/sys failed: No such file or directory
mount: mounting /proc on /root/proc failed: No such file or directory
[    3.379544] systemd[1]: Failed to mount sysfs at /sys: No such file or directory
[    3.379887] systemd[1]: Failed to mount proc at /proc: No such file or directory
[!!!!!!] Failed to mount early API filesystems.
[    3.383549] systemd[1]: Freezing execution.

(screenshot attachment: utility3_fs_gone)

My guess is that something goes wrong with partition or device enumeration, so the wrong device path is used to mount the volume. Maybe there could be a fail-safe option that refuses to mount the host's root file system, although some Kubernetes DaemonSets might depend on that functionality.
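For reference, a fail-safe like the one suggested above could compare the device number of the candidate block device against the device backing the node's root file system, and refuse the mount on a match. A minimal sketch in Go (the driver's language); the function name and its placement are hypothetical, not part of the driver:

package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

// refuseHostRootDevice returns an error if devPath (e.g. "/dev/vdb")
// refers to the same block device that backs the host's root file
// system. Hypothetical fail-safe, not part of upcloud-csi.
func refuseHostRootDevice(devPath string) error {
	var root, dev unix.Stat_t

	// Stat "/" to learn which device the root file system lives on.
	if err := unix.Stat("/", &root); err != nil {
		return fmt.Errorf("stat /: %w", err)
	}
	// Stat the candidate device node itself.
	if err := unix.Stat(devPath, &dev); err != nil {
		return fmt.Errorf("stat %s: %w", devPath, err)
	}
	// For a block device node, Rdev is the device it represents; for
	// "/", Dev is the device containing it. Equal numbers mean devPath
	// is exactly the device holding the root file system. (Comparing
	// only unix.Major(...) would also catch sibling partitions of the
	// same disk.)
	if dev.Rdev == root.Dev {
		return fmt.Errorf("refusing to mount %s: it backs the host root file system", devPath)
	}
	return nil
}

Note that a node plugin running in a container would need to perform this check against the host's root rather than the container's, so in practice the comparison target would likely be a host path mounted into the plugin pod.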

These are Ubuntu 20.04 virtual servers running Kubernetes 1.20 and the CSI driver at commit 553d0e47081bb1e3bfb1c19d8143932f7a2e9bf4 (2022-08-25).

wvh commented 2 years ago

Forgot to mention: one of those Minio volumes later showed up mounted in a test pod whose volume was supposed to be empty, so the mix-up can involve any host device path, not just the root file system.

thevilledev commented 2 years ago

Hey @wvh! Thanks for reaching out.

We've made a lot of improvements to volume mounts in the last two releases (0.3.x), specifically PR #32, which should address this issue. Can you try again with version v0.3.2 and check whether you're still experiencing the same problem?
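For anyone hitting similar enumeration bugs: a common way to make the device lookup robust is to resolve the attached volume through its stable udev symlink under /dev/disk/by-id instead of trusting bare /dev/vdX names, which can change across reboots and re-attaches. A minimal sketch of that general technique; the "virtio-" prefix and serial naming are assumptions, and this is not necessarily how PR #32 implements it:

package main

import (
	"fmt"
	"path/filepath"
)

// resolveByID maps a volume's serial to its real device node via the
// udev-maintained /dev/disk/by-id symlinks, rather than relying on
// enumeration order (/dev/vdb, /dev/vdc, ...).
func resolveByID(volumeSerial string) (string, error) {
	// Assumed layout for virtio-blk devices:
	// /dev/disk/by-id/virtio-<serial> -> ../../vdb
	link := filepath.Join("/dev/disk/by-id", "virtio-"+volumeSerial)

	// Follow the symlink to the actual device node.
	dev, err := filepath.EvalSymlinks(link)
	if err != nil {
		return "", fmt.Errorf("volume %s not found under /dev/disk/by-id: %w", volumeSerial, err)
	}
	return dev, nil
}

Because udev derives these symlinks from the device's serial number, the mapping stays correct even when the kernel enumerates devices in a different order.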

wvh commented 2 years ago

Sure. I will retest this in the next few days and report back.

thevilledev commented 1 year ago

We have not seen this issue happen since v0.3.2, so I think it's safe to close this. Please reach out if you are still experiencing it. Thanks!