ceph / ceph-helm


can't use md0 as device #67

Open mamoit opened 6 years ago

mamoit commented 6 years ago

Is this a request for help?: no

Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT

Version of Helm and Kubernetes:

helm

Client: &version.Version{SemVer:"v2.9.1", GitCommit:"20adb27c7c5868466912eebdf6664e7390ebe710", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.9.1", GitCommit:"20adb27c7c5868466912eebdf6664e7390ebe710", GitTreeState:"clean"}

kubectl

Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-17T18:53:20Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"10+", GitVersion:"v1.10.5-gke.4", GitCommit:"6265b9797fc8680c8395abeab12c1e3bad14069a", GitTreeState:"clean", BuildDate:"2018-08-04T03:47:40Z", GoVersion:"go1.9.3b4", Compiler:"gc", Platform:"linux/amd64"}

Which chart: ceph-helm

What happened: Tried to set up an mdadm device to stripe 2 disks in raid0 and handle them as a single OSD. The osd-device setup does not finish properly.

What you expected to happen: For the setup to finish and work just as it does with any other sdb/sdc/sdd... device.

How to reproduce it (as minimally and precisely as possible): Create an md0 device and use it as you would any other sdX device (as the OSD device); a minimal example is sketched after the log below. The setup fails because the osd-activate-pod crashes with:

2018-08-13T17:14:53.260443418Z command_check_call: Running command: /usr/bin/ceph-osd --cluster ceph --mkfs -i 9 --monmap /var/lib/ceph/tmp/mnt.eawl9p/activate.monmap --osd-data /var/lib/ceph/tmp/mnt.eawl9p --osd-journal /var/lib/ceph/tmp/mnt.eawl9p/journal --osd-uuid 36356a07-a91f-4625-8b6f-864dd991de5f --setuser ceph --setgroup disk
2018-08-13T17:14:53.324461741Z 2018-08-13 17:14:53.324174 7fc7564cee00 -1 filestore(/var/lib/ceph/tmp/mnt.eawl9p) mkjournal(1066): error creating journal on /var/lib/ceph/tmp/mnt.eawl9p/journal: (2) No such file or directory
2018-08-13T17:14:53.324483608Z 2018-08-13 17:14:53.324256 7fc7564cee00 -1 OSD::mkfs: ObjectStore::mkfs failed with error (2) No such file or directory
2018-08-13T17:14:53.324849587Z 2018-08-13 17:14:53.324610 7fc7564cee00 -1  ** ERROR: error creating empty object store in /var/lib/ceph/tmp/mnt.eawl9p: (2) No such file or directory
2018-08-13T17:14:53.329122347Z mount_activate: Failed to activate
2018-08-13T17:14:53.329225389Z unmount: Unmounting /var/lib/ceph/tmp/mnt.eawl9p
2018-08-13T17:14:53.3294495Z command_check_call: Running command: /bin/umount -- /var/lib/ceph/tmp/mnt.eawl9p
2018-08-13T17:14:53.375884854Z Traceback (most recent call last):
2018-08-13T17:14:53.375907887Z   File "/usr/sbin/ceph-disk", line 9, in <module>
2018-08-13T17:14:53.375913364Z     load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()
2018-08-13T17:14:53.375918173Z   File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 5717, in run
2018-08-13T17:14:53.377208587Z     main(sys.argv[1:])
2018-08-13T17:14:53.377222215Z   File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 5668, in main
2018-08-13T17:14:53.37842527Z     args.func(args)
2018-08-13T17:14:53.378439165Z   File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 3758, in main_activate
2018-08-13T17:14:53.379145782Z     reactivate=args.reactivate,
2018-08-13T17:14:53.379156768Z   File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 3521, in mount_activate
2018-08-13T17:14:53.379899211Z     (osd_id, cluster) = activate(path, activate_key_template, init)
2018-08-13T17:14:53.379910301Z   File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 3698, in activate
2018-08-13T17:14:53.380577968Z     keyring=keyring,
2018-08-13T17:14:53.380589482Z   File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 3165, in mkfs
2018-08-13T17:14:53.381196441Z     '--setgroup', get_ceph_group(),
2018-08-13T17:14:53.381206848Z   File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 566, in command_check_call
2018-08-13T17:14:53.381212315Z     return subprocess.check_call(arguments)
2018-08-13T17:14:53.381216601Z   File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
2018-08-13T17:14:53.381482189Z     raise CalledProcessError(retcode, cmd)
2018-08-13T17:14:53.381659138Z subprocess.CalledProcessError: Command '['/usr/bin/ceph-osd', '--cluster', 'ceph', '--mkfs', '-i', u'9', '--monmap', '/var/lib/ceph/tmp/mnt.eawl9p/activate.monmap', '--osd-data', '/var/lib/ceph/tmp/mnt.eawl9p', '--osd-journal', '/var/lib/ceph/tmp/mnt.eawl9p/journal', '--osd-uuid', u'36356a07-a91f-4625-8b6f-864dd991de5f', '--setuser', 'ceph', '--setgroup', 'disk']' returned non-zero exit status 1
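
For concreteness, a minimal sketch of such a setup; the disk names (/dev/sdb, /dev/sdc) are placeholders, the exact disks are not given in this report:

# Assemble two disks into a single raid0 md device (disk names are placeholders).
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdb /dev/sdc

The resulting /dev/md0 is then listed as the device of an osd_devices entry in the chart values, exactly as one would list /dev/sdb or /dev/sdc.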

Anything else we need to know: I found a place where it is assumed that partition X of a device is named by simply appending the number to the device name. That is true for sdX devices (e.g. sdb1), but not for md devices, which insert a "p" separator (e.g. md0p1). I applied the following patch (using the dev_part helper; see the sketch after the diff), but it still doesn't work.

diff --git a/ceph/ceph/templates/bin/_osd_disk_prepare.sh.tpl b/ceph/ceph/templates/bin/_osd_disk_prepare.sh.tpl
index eda2b3f..88cf800 100644
--- a/ceph/ceph/templates/bin/_osd_disk_prepare.sh.tpl
+++ b/ceph/ceph/templates/bin/_osd_disk_prepare.sh.tpl
@@ -27,7 +27,7 @@ function osd_disk_prepare {
     log "Checking if it belongs to this cluster"
     tmp_osd_mount="/var/lib/ceph/tmp/`echo $RANDOM`/"
     mkdir -p $tmp_osd_mount
-    mount ${OSD_DEVICE}1 ${tmp_osd_mount}
+    mount $(dev_part ${OSD_DEVICE} 1) ${tmp_osd_mount}
     osd_cluster_fsid=`cat ${tmp_osd_mount}/ceph_fsid`
     umount ${tmp_osd_mount} && rmdir ${tmp_osd_mount}
     cluster_fsid=`ceph ${CLI_OPTS} --name client.bootstrap-osd --keyring $OSD_BOOTSTRAP_KEYRING fsid`
@@ -56,7 +56,7 @@ function osd_disk_prepare {
     echo "Unmounting LOCKBOX directory"
     # NOTE(leseb): adding || true so when this bug will be fixed the entrypoint will not fail
     # Ceph bug tracker: http://tracker.ceph.com/issues/18944
-    DATA_UUID=$(blkid -o value -s PARTUUID ${OSD_DEVICE}1)
+    DATA_UUID=$(blkid -o value -s PARTUUID $(dev_part ${OSD_DEVICE} 1))
     umount /var/lib/ceph/osd-lockbox/${DATA_UUID} || true
   else
     ceph-disk -v prepare ${CLI_OPTS} --journal-uuid ${OSD_JOURNAL_UUID} ${OSD_DEVICE} ${OSD_JOURNAL}
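
For context, dev_part here refers to the partition-name helper already used elsewhere in the ceph entrypoint scripts. As an illustration only (the real helper also handles /dev/disk/by-* style paths), its core logic is roughly:

# Hypothetical approximation of dev_part, NOT the exact implementation:
# return the partition node for a device, inserting "p" when the device
# name itself ends in a digit (md0, nvme0n1, loop0), as the kernel does.
dev_part() {
  local osd_device=$1
  local osd_partition=$2
  if [[ "${osd_device}" =~ [0-9]$ ]]; then
    echo "${osd_device}p${osd_partition}"
  else
    echo "${osd_device}${osd_partition}"
  fi
}

# dev_part /dev/md0 1  ->  /dev/md0p1
# dev_part /dev/sdb 1  ->  /dev/sdb1

Even with the partition name corrected this way, the activate step still fails as shown in the log above, so the naming fix alone is not enough.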