c0c0n3 / teadal.proto

Messing around with cloud infra for https://www.teadal.eu.
MIT License

DirectPV discovery timeout on NixOS/Aarch64 #8

Closed c0c0n3 closed 1 year ago

c0c0n3 commented 1 year ago

Can't get DirectPV 4.0.6 to work with K8s 1.27.2 on NixOS/Aarch64. It looks like DirectPV can't discover available drives and eventually times out.

How to reproduce

First off, do a fresh install of our QEMU dev VM for Aarch64. Then create two 1G disk images in qcow2 format

$ qemu-img create -f qcow2 "d1.img.qcow2" 1G
$ qemu-img create -f qcow2 "d2.img.qcow2" 1G

and start the VM with these two additional disks

$ qemu-system-aarch64 \
    -machine virt,gic-version=3 -accel hvf -cpu host -smp 4 -m 8192M \
    -drive if=pflash,format=raw,file=edk2-aarch64-code.fd \
    -drive file=devm.img.raw,format=raw \
    -drive file=d1.img.qcow2,format=qcow2 \
    -drive file=d2.img.qcow2,format=qcow2 \
    -nographic \
    -nic user,hostfwd=tcp::10022-:22,hostfwd=tcp::16443-:6443

Finally install DirectPV using our own manifests for Aarch64

$ kubectl apply -f deployment/mesh-infra/storage/directpv/base.arm64.yaml

Now ask DirectPV to discover any available drives

$ kubectl directpv discover

After a few minutes you should see this error message

ERROR unable to complete the discovery; context deadline exceeded
No drives are available to initialize

c0c0n3 commented 1 year ago

Notice there's no clue in the logs about why DirectPV can't pick up the two drives

$ kubectl -n directpv logs deployment/controller
$ kubectl -n directpv logs daemonset/node-server -c node-server

Also, you still get a timeout even if you try explicitly naming the drives

$ kubectl directpv discover --drives=vd{b...c}

Partitioning the drives has no effect either

$ sudo -i
$ parted -a optimal /dev/vdb -- mklabel gpt
$ parted -a optimal /dev/vdb -- mkpart primary 0% 100%
$ parted -a optimal /dev/vdc -- mklabel gpt
$ parted -a optimal /dev/vdc -- mkpart primary 0% 100%

Ditto for formatting with e.g. ext4.
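
Before digging further into DirectPV, it may be worth confirming that the guest actually sees the extra disks. The sketch below assumes they show up as virtio block devices named vd* under /sys/block, which the /dev/vdb and /dev/vdc paths above suggest. `list_virtio_disks` is a hypothetical helper name, not part of any tool mentioned here.

```shell
# List virtio block devices (vda, vdb, ...) found under a sysfs-style
# directory. Defaults to /sys/block; the directory argument is only there
# to make the function easy to try out against a scratch directory.
# NOTE: list_virtio_disks is a hypothetical helper, not a DirectPV command.
list_virtio_disks() {
    dir="${1:-/sys/block}"
    for path in "$dir"/vd*; do
        # Skip the unexpanded glob when nothing matches.
        [ -e "$path" ] || continue
        basename "$path"
    done
}
```

Run inside the VM as `list_virtio_disks`. If vdb and vdc are in the output, the kernel sees the disks and the problem is more likely on the DirectPV side.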

c0c0n3 commented 1 year ago

Possibly all this has to do w/ replacing the original DirectPV x86 images w/ arm64 ones. In fact, DirectPV 4.0.6 can only generate manifests for x86_64/amd64. So to install it I had to

$ kubectl directpv install -o yaml > base.yaml

then replace the images with (what I thought to be) aarch64 equivalent ones and finally kubectl apply the modified manifest. I saved the modified manifest in base.arm64.yaml; diff it w/ base.yaml to see what changed.
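
Since the two manifests should differ only in their images, comparing just the image references can be easier to read than a full diff. This is a minimal sketch, assuming each image sits on its own `image:` line in the YAML. `list_images` is a hypothetical helper, not something kubectl or DirectPV provides.

```shell
# Print the unique container images referenced in a Kubernetes manifest.
# Assumes every image is declared on its own "image: ..." line, optionally
# preceded by a YAML list dash and optionally quoted.
# NOTE: list_images is a hypothetical helper for illustration only.
list_images() {
    grep -E '^[[:space:]]*(-[[:space:]]*)?image:' "$1" \
        | sed -E 's/^[[:space:]]*(-[[:space:]]*)?image:[[:space:]]*//' \
        | tr -d "\"'" \
        | sort -u
}
```

For example, save the output of `list_images base.yaml` and `list_images base.arm64.yaml` to two files and diff those to see only the image substitutions.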

c0c0n3 commented 1 year ago

re: arm64 install, I opened an issue about it in the DirectPV repo:

c0c0n3 commented 1 year ago

So DirectPV does not support ARM64; see https://github.com/minio/directpv/issues/817.

I've deleted the base.arm64.yaml file, but for the record the only things different from base.yaml were the images. I replaced

with, respectively