canonical / microceph

Ceph for a one-rack cluster and appliances
https://snapcraft.io/microceph
GNU Affero General Public License v3.0

MicroCeph fails on machine with device mapper multipathing #375

Open · fnordahl opened this issue 3 days ago

fnordahl commented 3 days ago

Issue report

What version of MicroCeph are you using?

$ snap info microceph
...
installed:          18.2.0+snapcba31e8c75                 (999) 87MB -

What are the steps to reproduce this issue?

Using the following script to create LXD VMs:

set -x
n=0

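# For each LXD cluster member: delete any leftover sunbeam VM and its extra
# volumes, then launch a fresh VM and attach three 100GiB block volumes for
# MicroCeph to consume as OSDs.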
for node in micro-1 micro-2 micro-3; do
    n=$(( n + 1 ))
    lxc stop microcloud:sunbeam-$n || true
    lxc delete microcloud:sunbeam-$n || true
    lxc storage volume delete microcloud:default sunbeam-$n-disk-2 || true
    lxc storage volume delete microcloud:default sunbeam-$n-disk-3 || true
    lxc storage volume delete microcloud:default sunbeam-$n-disk-4 || true

    lxc launch ubuntu:jammy microcloud:sunbeam-$n \
        --vm \
        -c limits.cpu=16 \
        -c limits.memory=32GiB \
        --target $node
    lxc storage volume create microcloud:default \
        sunbeam-$n-disk-2 size=100GiB --type=block --target=$node
    lxc storage volume create microcloud:default \
        sunbeam-$n-disk-3 size=100GiB --type=block --target=$node
    lxc storage volume create microcloud:default \
        sunbeam-$n-disk-4 size=100GiB --type=block --target=$node

    lxc config device add microcloud:sunbeam-$n eth1 nic network=internal
    lxc config device add microcloud:sunbeam-$n \
        sunbeam-$n-disk-2 disk pool=default source=sunbeam-$n-disk-2
    lxc config device add microcloud:sunbeam-$n \
        sunbeam-$n-disk-3 disk pool=default source=sunbeam-$n-disk-3
    lxc config device add microcloud:sunbeam-$n \
        sunbeam-$n-disk-4 disk pool=default source=sunbeam-$n-disk-4
done

What happens (observed behaviour)?

$ lsblk
NAME     MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINTS
loop0      7:0    0 63.9M  1 loop  /snap/core20/2318
loop1      7:1    0 38.8M  1 loop  /snap/snapd/21759
loop2      7:2    0   87M  1 loop  /snap/lxd/28373
sda        8:0    0  100G  0 disk  
├─sda1     8:1    0 99.9G  0 part  /
├─sda14    8:14   0    4M  0 part  
└─sda15    8:15   0  106M  0 part  /boot/efi
sdb        8:16   0  100G  0 disk  
└─mpatha 253:0    0  100G  0 mpath 
sdc        8:32   0  100G  0 disk  
└─mpatha 253:0    0  100G  0 mpath 
sdd        8:48   0  100G  0 disk  
└─mpatha 253:0    0  100G  0 mpath 
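
For reference, the mpath map that has claimed all three data disks can be inspected with the standard device-mapper multipath tools (the name mpatha comes from the lsblk output above; exact output will vary):

$ sudo multipath -ll
$ sudo dmsetup info mpatha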

$ sudo microceph disk list

Available unpartitioned disks on this system:
+---------------+-----------+------+------------------------------------------------------------------+
|     MODEL     | CAPACITY  | TYPE |                               PATH                               |
+---------------+-----------+------+------------------------------------------------------------------+
| QEMU HARDDISK | 100.00GiB | scsi | /dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_lxd_sunbeam--1--disk--2 |
+---------------+-----------+------+------------------------------------------------------------------+
| QEMU HARDDISK | 100.00GiB | scsi | /dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_lxd_sunbeam--1--disk--3 |
+---------------+-----------+------+------------------------------------------------------------------+
| QEMU HARDDISK | 100.00GiB | scsi | /dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_lxd_sunbeam--1--disk--4 |
+---------------+-----------+------+------------------------------------------------------------------+

$ sudo microceph disk add /dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_lxd_sunbeam--1--disk--2

+------------------------------------------------------------------+---------+
|                               PATH                               | STATUS  |
+------------------------------------------------------------------+---------+
| /dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_lxd_sunbeam--1--disk--2 | Failure |
+------------------------------------------------------------------+---------+
Error: failed to bootstrap OSD: Failed to run: ceph-osd --mkfs --no-mon-config -i 1: exit status 250 (2024-07-02T09:34:17.422+0000 7f2e824d58c0 -1 bluestore(/var/lib/ceph/osd/ceph-1/block) _read_bdev_label unable to decode label at offset 102: void bluestore_bdev_label_t::decode(ceph::buffer::v15_2_0::list::const_iterator&) decode past end of struct encoding: Malformed input
2024-07-02T09:34:17.422+0000 7f2e824d58c0 -1 bluestore(/var/lib/ceph/osd/ceph-1/block) _read_bdev_label unable to decode label at offset 102: void bluestore_bdev_label_t::decode(ceph::buffer::v15_2_0::list::const_iterator&) decode past end of struct encoding: Malformed input
2024-07-02T09:34:17.422+0000 7f2e824d58c0 -1 bluestore(/var/lib/ceph/osd/ceph-1/block) _read_bdev_label unable to decode label at offset 102: void bluestore_bdev_label_t::decode(ceph::buffer::v15_2_0::list::const_iterator&) decode past end of struct encoding: Malformed input
2024-07-02T09:34:17.450+0000 7f2e824d58c0 -1 bdev(0x55735efae000 /var/lib/ceph/osd/ceph-1/block) open open got: (16) Device or resource busy
2024-07-02T09:34:17.450+0000 7f2e824d58c0 -1 bluestore(/var/lib/ceph/osd/ceph-1) mkfs failed, (16) Device or resource busy
2024-07-02T09:34:17.450+0000 7f2e824d58c0 -1 OSD::mkfs: ObjectStore::mkfs failed with error (16) Device or resource busy
2024-07-02T09:34:17.450+0000 7f2e824d58c0 -1  ** ERROR: error creating empty object store in /var/lib/ceph/osd/ceph-1: (16) Device or resource busy)
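
The EBUSY appears to come from the multipath map holding the disk open. A quick way to confirm that before retrying, using plain sysfs and lsblk (nothing MicroCeph-specific; /dev/sdb is the first data disk from the lsblk output above):

$ ls /sys/class/block/sdb/holders/
$ lsblk -o NAME,TYPE /dev/sdb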

What were you expecting to happen?

Successful addition of disk.

Relevant logs, error output, etc.

See above.

Additional comments.

Flushing the multipath device maps with sudo multipath -F allows the disks to be added.
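
For anyone reproducing this, a slightly more durable variant of that workaround (an untested sketch; the vendor/product strings match the QEMU disks in the disk list above, adjust as needed) is to blacklist the disks in /etc/multipath.conf so multipathd does not recreate the maps after the flush:

# /etc/multipath.conf
blacklist {
    device {
        vendor  "QEMU"
        product "QEMU HARDDISK"
    }
}

$ sudo systemctl restart multipathd
$ sudo multipath -F
$ sudo microceph disk add /dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_lxd_sunbeam--1--disk--2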

sabaini commented 2 days ago

Hi @fnordahl, I've seen this as well in LXD VMs that automatically start multipathd.
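
In those VMs it is easy to check whether multipathd is what keeps recreating the maps, and to turn it off entirely when multipath is not needed for the test (standard systemd units on Ubuntu):

$ systemctl status multipathd.service multipathd.socket
$ sudo systemctl disable --now multipathd.service multipathd.socket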

I'm a bit hesitant about automatically resetting multipath with sudo multipath -F -- if the user had multipath devices they cared about, that would impact them, I believe?
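
For what it is worth, multipath can also flush a single map rather than all of them, which would be a little less invasive than -F if we ever touched it automatically, though it still assumes the map is not in use:

$ sudo multipath -f mpatha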

fnordahl commented 2 days ago

Yes, I was not suggesting unconditionally disabling it, but we could either make it work (if that is even possible) or detect the situation and tell the user what to do?
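
A rough sketch of what that detection could look like, written as shell for illustration (MicroCeph would do the equivalent in Go before attempting the OSD bootstrap; the device path is just an example):

dev=/dev/sdb
# A dm-* entry under holders/ means a device-mapper target (multipath, LVM,
# crypt) has claimed the disk, and ceph-osd --mkfs will fail with EBUSY.
if ls "/sys/class/block/$(basename "$(readlink -f "$dev")")/holders/" | grep -q '^dm-'; then
    echo "refusing to use $dev: it is held by a device-mapper device (multipath?)" >&2
    echo "hint: flush the map with 'sudo multipath -F' or blacklist the disk in /etc/multipath.conf" >&2
    exit 1
fi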

sabaini commented 2 days ago

Ack, good point w.r.t. detecting the situation :+1: we should give the user useful feedback in this situation.