canonical / microcloud

Automated private cloud based on LXD, Ceph and OVN
https://microcloud.is
GNU Affero General Public License v3.0

Drives available but getting "Insufficient number of disks available to set up distributed storage" #271

Open webdock-io opened 3 months ago

webdock-io commented 3 months ago

Following the guide at: https://canonical-microcloud.readthedocs-hosted.com/en/latest/tutorial/get_started/

We have 2 physical NVMe drives attached directly to each VM (so 6 drives total across 3 VMs), and they are definitely showing up in the VMs as sdb and sdc:

# ls -lah /dev/disk/by-id/
total 0
drwxr-xr-x 2 root root 220 Mar 23 11:54 .
drwxr-xr-x 8 root root 160 Mar 15 16:10 ..
lrwxrwxrwx 1 root root   9 Mar 23 11:54 scsi-0QEMU_QEMU_HARDDISK_lxd_nvme1 -> ../../sdb
lrwxrwxrwx 1 root root  10 Mar 23 11:54 scsi-0QEMU_QEMU_HARDDISK_lxd_nvme1-part1 -> ../../sdb1
lrwxrwxrwx 1 root root  10 Mar 23 11:54 scsi-0QEMU_QEMU_HARDDISK_lxd_nvme1-part9 -> ../../sdb9
lrwxrwxrwx 1 root root   9 Mar 23 11:54 scsi-0QEMU_QEMU_HARDDISK_lxd_nvme2 -> ../../sdc
lrwxrwxrwx 1 root root  10 Mar 23 11:54 scsi-0QEMU_QEMU_HARDDISK_lxd_nvme2-part1 -> ../../sdc1
lrwxrwxrwx 1 root root  10 Mar 23 11:54 scsi-0QEMU_QEMU_HARDDISK_lxd_nvme2-part9 -> ../../sdc9
lrwxrwxrwx 1 root root   9 Mar 23 11:54 scsi-0QEMU_QEMU_HARDDISK_lxd_root -> ../../sda
lrwxrwxrwx 1 root root  10 Mar 23 11:54 scsi-0QEMU_QEMU_HARDDISK_lxd_root-part1 -> ../../sda1
lrwxrwxrwx 1 root root  10 Mar 23 11:54 scsi-0QEMU_QEMU_HARDDISK_lxd_root-part2 -> ../../sda2
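
For context, an ID like scsi-0QEMU_QEMU_HARDDISK_lxd_nvme1 is what LXD produces when a block device is attached to a VM as a disk device named nvme1, i.e. something along these lines (the host device paths here are only illustrative, not copied from our setup):

# Attach host NVMe drives to a VM as raw LXD disk devices (example paths)
lxc config device add lxdvm1 nvme1 disk source=/dev/nvme0n1
lxc config device add lxdvm1 nvme2 disk source=/dev/nvme1n1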

But microcloud init just says:

Scanning for eligible servers ...

 Selected "lxdvm2" at "10.1.255.88"
 Selected "lxdvm3" at "10.1.255.168"
 Selected "lxdvm1" at "10.1.255.108"

Insufficient number of disks available to set up distributed storage, skipping at this time
Initializing a new cluster

And it skips the Ceph setup. How is it checking for available disks, and how can this be remedied? We've had ZFS pools on these drives before for some other testing, but the pools have been destroyed (I didn't do a labelclear - maybe that's it?)
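
A quick way to confirm whether the old pools left anything behind (this is just a generic check, not necessarily how MicroCloud itself scans) is to look for leftover partition tables and filesystem signatures on the target disks:

# Inspect the candidate disks for leftover signatures (zfs_member, partition tables, ...)
lsblk -o NAME,SIZE,TYPE,FSTYPE,PARTLABEL /dev/sdb /dev/sdc
blkid /dev/sdb* /dev/sdc*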

Additionally, retrying MicroCloud after completing the cluster setup just leaves it hanging on the initial "Scanning for eligible servers ..." step. One would think it would check whether it is already set up and error out immediately.

I guess I have to tear down the snaps on all VMs and reinstall to try the init operation again?

In general, it would be useful to have information on how to remedy the situation for each step in the MicroCloud wizard if something goes wrong or you accidentally make a mistake - even if that remedy is just the easiest steps to clean up the already set-up systems so you can start over.

webdock-io commented 3 months ago

I had already guessed the remedy - I did a zpool labelclear AND a wipefs on those drives to be sure, tore everything down, and started over, and now I was able to select drives.
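
For anyone hitting the same thing, the cleanup amounted to roughly the following on each VM (destructive; whether the ZFS labels sit on the whole disk or on the -part1 partitions depends on how the old pools were created):

# Clear old ZFS pool labels, then wipe remaining signatures including the partition table
zpool labelclear -f /dev/sdb1
zpool labelclear -f /dev/sdc1
wipefs --all /dev/sdb1 /dev/sdc1 /dev/sdb /dev/sdc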

It would be good to abort at the disk step if no disks are "found" by MicroCloud, maybe with an explanation - this is just a suggestion for improving the wizard; anyway, it would have saved me some time :)

Looking forward to playing with this!

roosterfish commented 3 months ago

That is right. You have to reinstall the snaps and start all over.
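
Something along these lines on each machine should get you back to a clean slate (adjust the snap channels to whatever you installed from; removing the lxd snap also deletes its instances and data):

# Remove the MicroCloud components and their state, then reinstall
snap remove --purge microcloud microceph microovn lxd
snap install lxd microceph microovn microcloud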

As reported in https://github.com/canonical/microcloud/issues/142, MicroCloud currently doesn't pick up non-pristine disks for distributed storage. This depends on https://github.com/canonical/microceph/issues/251 in MicroCeph.