Is this a request for help?: no
Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT
Version of Helm and Kubernetes: helm
kubectl
Which chart: ceph-helm
What happened:
Tried to set up an mdadm device to stripe 2 disks in RAID 0 and handle them as a single OSD.
The OSD device setup does not finish properly.
What you expected to happen:
For the setup to finish and work as well as it does with any other sdb/sdc/sdd... device.
How to reproduce it (as minimally and precisely as possible):
Create an md0 device and use it as you would any other sdX device (OSD device).
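For reference, a minimal sketch of how the md device was created; the member disk names (/dev/sdb, /dev/sdc) and the values snippet are assumptions for illustration, not taken from this report:

# Assumed example: stripe two disks into a single RAID 0 md device.
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdb /dev/sdc

# Then point the chart at it as if it were a plain disk, for example
# (illustrative values layout, adjust to your chart's values.yaml):
#   osd_devices:
#     - name: osd-md0
#       device: /dev/md0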
The setup fails because the osd-activate pod crashes with:
2018-08-13T17:14:53.260443418Z command_check_call: Running command: /usr/bin/ceph-osd --cluster ceph --mkfs -i 9 --monmap /var/lib/ceph/tmp/mnt.eawl9p/activate.monmap --osd-data /var/lib/ceph/tmp/mnt.eawl9p --osd-journal /var/lib/ceph/tmp/mnt.eawl9p/journal --osd-uuid 36356a07-a91f-4625-8b6f-864dd991de5f --setuser ceph --setgroup disk
2018-08-13T17:14:53.324461741Z 2018-08-13 17:14:53.324174 7fc7564cee00 -1 filestore(/var/lib/ceph/tmp/mnt.eawl9p) mkjournal(1066): error creating journal on /var/lib/ceph/tmp/mnt.eawl9p/journal: (2) No such file or directory
2018-08-13T17:14:53.324483608Z 2018-08-13 17:14:53.324256 7fc7564cee00 -1 OSD::mkfs: ObjectStore::mkfs failed with error (2) No such file or directory
2018-08-13T17:14:53.324849587Z 2018-08-13 17:14:53.324610 7fc7564cee00 -1 ** ERROR: error creating empty object store in /var/lib/ceph/tmp/mnt.eawl9p: (2) No such file or directory
2018-08-13T17:14:53.329122347Z mount_activate: Failed to activate
2018-08-13T17:14:53.329225389Z unmount: Unmounting /var/lib/ceph/tmp/mnt.eawl9p
2018-08-13T17:14:53.3294495Z command_check_call: Running command: /bin/umount -- /var/lib/ceph/tmp/mnt.eawl9p
2018-08-13T17:14:53.375884854Z Traceback (most recent call last):
2018-08-13T17:14:53.375907887Z File "/usr/sbin/ceph-disk", line 9, in <module>
2018-08-13T17:14:53.375913364Z load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()
2018-08-13T17:14:53.375918173Z File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 5717, in run
2018-08-13T17:14:53.377208587Z main(sys.argv[1:])
2018-08-13T17:14:53.377222215Z File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 5668, in main
2018-08-13T17:14:53.37842527Z args.func(args)
2018-08-13T17:14:53.378439165Z File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 3758, in main_activate
2018-08-13T17:14:53.379145782Z reactivate=args.reactivate,
2018-08-13T17:14:53.379156768Z File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 3521, in mount_activate
2018-08-13T17:14:53.379899211Z (osd_id, cluster) = activate(path, activate_key_template, init)
2018-08-13T17:14:53.379910301Z File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 3698, in activate
2018-08-13T17:14:53.380577968Z keyring=keyring,
2018-08-13T17:14:53.380589482Z File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 3165, in mkfs
2018-08-13T17:14:53.381196441Z '--setgroup', get_ceph_group(),
2018-08-13T17:14:53.381206848Z File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 566, in command_check_call
2018-08-13T17:14:53.381212315Z return subprocess.check_call(arguments)
2018-08-13T17:14:53.381216601Z File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
2018-08-13T17:14:53.381482189Z raise CalledProcessError(retcode, cmd)
2018-08-13T17:14:53.381659138Z subprocess.CalledProcessError: Command '['/usr/bin/ceph-osd', '--cluster', 'ceph', '--mkfs', '-i', u'9', '--monmap', '/var/lib/ceph/tmp/mnt.eawl9p/activate.monmap', '--osd-data', '/var/lib/ceph/tmp/mnt.eawl9p', '--osd-journal', '/var/lib/ceph/tmp/mnt.eawl9p/journal', '--osd-uuid', u'36356a07-a91f-4625-8b6f-864dd991de5f', '--setuser', 'ceph', '--setgroup', 'disk']' returned non-zero exit status 1
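One hedged way to check whether the journal path is what is missing (mount point and device names below are assumptions; with filestore the journal inside the data partition is normally a symlink to the journal partition):

# Assumed device/mount names; adjust to the actual md device.
mkdir -p /mnt/osd-debug
mount /dev/md0p1 /mnt/osd-debug
ls -l /mnt/osd-debug/journal            # where does the journal symlink point?
readlink -f /mnt/osd-debug/journal      # does that target block device actually exist?
umount /mnt/osd-debug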
Anything else we need to know:
I found a place where it is assumed that partition number X of a device is named by simply appending the number to the device name. That is true for sdX devices (sdX1, for example), but not for md devices, whose partitions are named mdXp1, mdXp2, and so on.
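For illustration, a minimal sketch of the naming logic a dev_part helper typically implements; this is an assumption about its behaviour, not the chart's actual implementation:

# Sketch only: return partition N of a device, inserting the "p" separator
# when the device name ends in a digit (md0, nvme0n1, loop0, ...).
dev_part() {
  local osd_device=$1
  local osd_partition=$2
  if [[ "${osd_device}" =~ [0-9]$ ]]; then
    echo "${osd_device}p${osd_partition}"   # e.g. /dev/md0 -> /dev/md0p1
  else
    echo "${osd_device}${osd_partition}"    # e.g. /dev/sdb -> /dev/sdb1
  fi
}

# Usage (assumed): dev_part /dev/md0 1  -> /dev/md0p1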
I applied the following patch, but it still doesn't work:
diff --git a/ceph/ceph/templates/bin/_osd_disk_prepare.sh.tpl b/ceph/ceph/templates/bin/_osd_disk_prepare.sh.tpl
index eda2b3f..88cf800 100644
--- a/ceph/ceph/templates/bin/_osd_disk_prepare.sh.tpl
+++ b/ceph/ceph/templates/bin/_osd_disk_prepare.sh.tpl
@@ -27,7 +27,7 @@ function osd_disk_prepare {
log "Checking if it belongs to this cluster"
tmp_osd_mount="/var/lib/ceph/tmp/`echo $RANDOM`/"
mkdir -p $tmp_osd_mount
- mount ${OSD_DEVICE}1 ${tmp_osd_mount}
+ mount $(dev_part ${OSD_DEVICE} 1) ${tmp_osd_mount}
osd_cluster_fsid=`cat ${tmp_osd_mount}/ceph_fsid`
umount ${tmp_osd_mount} && rmdir ${tmp_osd_mount}
cluster_fsid=`ceph ${CLI_OPTS} --name client.bootstrap-osd --keyring $OSD_BOOTSTRAP_KEYRING fsid`
@@ -56,7 +56,7 @@ function osd_disk_prepare {
echo "Unmounting LOCKBOX directory"
# NOTE(leseb): adding || true so when this bug will be fixed the entrypoint will not fail
# Ceph bug tracker: http://tracker.ceph.com/issues/18944
- DATA_UUID=$(blkid -o value -s PARTUUID ${OSD_DEVICE}1)
+ DATA_UUID=$(blkid -o value -s PARTUUID $(dev_part ${OSD_DEVICE} 1))
umount /var/lib/ceph/osd-lockbox/${DATA_UUID} || true
else
ceph-disk -v prepare ${CLI_OPTS} --journal-uuid ${OSD_JOURNAL_UUID} ${OSD_DEVICE} ${OSD_JOURNAL}