canonical / microcloud

Automated private cloud based on LXD, Ceph and OVN
https://microcloud.is
GNU Affero General Public License v3.0

MicroCloud init hangs when Ceph encryption is selected on Ubuntu 22.04 #427

Open mseralessandri opened 5 days ago

mseralessandri commented 5 days ago

microcloud init hangs at "Configuring cluster-wide devices ..." when Ceph encryption is selected on Ubuntu 22.04. Encryption requires a kernel with dm_crypt enabled, which is not available on Ubuntu 22.04. The required steps are detailed in the MicroCloud doc.

However, microcloud init should not hang; it should exit with an error.
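
As an illustration of the kind of up-front check that could turn this hang into an early error, here is a minimal shell sketch that tests whether dm_crypt looks usable on the running kernel. The function name `has_dm_crypt` and its optional arguments are hypothetical and exist only to make the check testable; this is not how MicroCloud or MicroCeph detect the module.

```shell
# Sketch only: report whether the dm_crypt module looks usable.
# Optional arguments (hypothetical, for testing):
#   $1: sysfs directory that exists once the module is loaded
#   $2: module name to look up with modinfo
has_dm_crypt() {
    sysfs_dir="${1:-/sys/module/dm_crypt}"
    module="${2:-dm_crypt}"

    # The module is already loaded.
    if [ -d "$sysfs_dir" ]; then
        return 0
    fi

    # Not loaded, but installed for the running kernel and loadable on demand.
    if command -v modinfo >/dev/null 2>&1 && modinfo "$module" >/dev/null 2>&1; then
        return 0
    fi

    return 1
}
```

Running `has_dm_crypt || echo "dm_crypt not available" >&2` before selecting encryption would surface the problem before the init step that hangs.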

OS Version: Ubuntu 22.04.4 LTS (GNU/Linux 5.15.0-119-generic x86_64)

Snap versions:

:~$ sudo snap list
Name        Version                    Rev    Tracking       Publisher   Notes
core22      20240904                   1621   latest/stable  canonical✓  base
core24      20240710                   490    latest/stable  canonical✓  base
go          1.23.2                     10730  latest/stable  canonical✓  classic
lxd         git-5d7521a                30585  5.21/edge      canonical✓  in-cohort
microceph   19.2.0~git+snapc76c1f5fe9  1189   latest/edge    canonical✓  in-cohort
microcloud  git-6d5f38f                1007   latest/edge    canonical✓  in-cohort
microovn    24.03.2+snapb5ffb86eb5     602    latest/edge    canonical✓  in-cohort
snapd       2.63                       21759  latest/stable  canonical✓  snapd

How to reproduce: I used a single-node MicroCloud configuration and answered yes to "Do you want to encrypt the selected disks? (yes/no) [default=no]:"

:~$ sudo microcloud init
Waiting for LXD to start ...
Do you want to set up more than one cluster member? (yes/no) [default=yes]: no
Select an address for MicroCloud's internal traffic:

 Using address "10.142.21.93" for MicroCloud

Gathering system information ...
Would you like to set up local storage? (yes/no) [default=yes]: yes
Select exactly one disk from each cluster member:

Select which disks to wipe:

 Using "/dev/disk/by-id/virtio-9c6b8b32-0573-4f2e-a" on "maria01" for local storage pool

Would you like to set up distributed storage? (yes/no) [default=yes]: yes
Select from the available unpartitioned disks:

Select which disks to wipe:

Disk configuration does not meet recommendations for fault tolerance. At least 3 systems must supply disks.
Continuing with this configuration will inhibit MicroCloud's ability to retain data on system failure
Change disk selection? (yes/no) [default=yes]: no
 Using 1 disk(s) on "maria01" for remote storage pool

Do you want to encrypt the selected disks? (yes/no) [default=no]: yes
Would you like to set up CephFS remote storage? (yes/no) [default=yes]: yes
What subnet (either IPv4 or IPv6 CIDR notation) would you like your Ceph internal traffic on? [default: 10.142.21.0/24] 
What subnet (either IPv4 or IPv6 CIDR notation) would you like your Ceph public traffic on? [default: 10.142.21.0/24] 
Initializing new services
 Local MicroCloud is ready
 Local LXD is ready
 Local MicroOVN is ready
 Local MicroCeph is ready
Awaiting cluster formation ...
Configuring cluster-wide devices ...

gabrielmougard commented 3 days ago

Having a look at this :)

simondeziel commented 3 days ago

dm-crypt is known to be missing from the linux-image-kvm kernel, which is 22.04-specific. On 24.04 kernels or on other 22.04 kernel flavors it's not an issue, as the module is more readily available. On 22.04, switching to the linux-image-virtual kernel should fix it.

Getting access to dm-crypt was the reason MicroCloud's CI switched to 24.04: https://github.com/canonical/microcloud/commit/cdc23b5ddb240ab09ea3e2c9a050492caf512c4a
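
To make the flavor distinction concrete, here is a small sketch that extracts the flavor suffix from a kernel release string and prints a hint for the kvm flavor. The function names `kernel_flavor` and `dm_crypt_hint` are hypothetical, and the `5.15.0-1060-kvm` string used below is only an illustrative example of a kvm-flavor release, not one taken from this report.

```shell
# Sketch only: derive the kernel flavor from a release string.
# With no argument, fall back to the running kernel (uname -r).
kernel_flavor() {
    release="${1:-$(uname -r)}"
    # Keep everything after the last hyphen, e.g. "generic" or "kvm".
    printf '%s\n' "${release##*-}"
}

# Print a hint and return non-zero when the flavor is "kvm",
# the flavor described above as missing dm-crypt.
dm_crypt_hint() {
    flavor="$(kernel_flavor "$1")"
    if [ "$flavor" = "kvm" ]; then
        echo "kvm kernel flavor: dm-crypt is likely missing; consider linux-image-virtual"
        return 1
    fi
    echo "kernel flavor '$flavor': dm-crypt is more likely to be available"
}
```

For the release string from the issue description, `kernel_flavor 5.15.0-119-generic` prints `generic`, while `dm_crypt_hint 5.15.0-1060-kvm` prints the kvm hint and returns a non-zero status.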

tomponline commented 3 days ago

The question is why it hangs rather than error out though
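
Until the cause of the hang is understood, a caller-side deadline is one way to get an error instead of an indefinite wait. The sketch below assumes the GNU coreutils `timeout` command, which exits with status 124 when the deadline is reached; the wrapper name `run_with_deadline` is hypothetical. It is a workaround for scripted, non-interactive runs and does not address why the step blocks.

```shell
# Sketch only: run a command with a deadline so a hang becomes a failure.
#   $1: limit in seconds; remaining arguments: the command to run
run_with_deadline() {
    limit="$1"
    shift

    status=0
    timeout "$limit" "$@" || status=$?

    # GNU timeout reports an expired deadline with exit status 124.
    if [ "$status" -eq 124 ]; then
        echo "error: command did not finish within ${limit}s: $*" >&2
    fi
    return "$status"
}
```

For example, `run_with_deadline 1 sleep 5` prints the error message to stderr and returns 124, while a command that finishes in time returns its own exit status.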

simondeziel commented 3 days ago

> The question is why it hangs rather than error out though

Right, I hadn't read the full issue description :/