ceph / ceph-helm

ceph osd fails due to /dev/sdX being changed across reboots. #76

Open · spatel-cog opened this issue 5 years ago

spatel-cog commented 5 years ago

Is this a request for help?: No


Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT

Version of Helm and Kubernetes: Helm 2.11.0, Kubernetes 1.11.6

Which chart: ceph-helm

What happened: Servers were configured with SAS controllers and an onboard ATA controller, i.e., two sets of SSD/HDD controllers. Across reboots the drives' /dev names changed; e.g., the drive on SAS controller port 1 became /dev/sdc when prior to the reboot it was /dev/sda. This is not uncommon. The values.yaml file was configured to avoid this situation by using /dev/disk/by-path names rather than /dev/sdX values.

osd_devices:
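
The entries from the original values.yaml are not reproduced here; a hypothetical by-path configuration, following the chart's usual name/device/zap entry format, might look like this (the PCI path is illustrative, not taken from the affected servers):

```yaml
osd_devices:
  # Hypothetical entry: the by-path string below is illustrative only.
  - name: osd-one
    device: /dev/disk/by-path/pci-0000:03:00.0-sas-phy0-lun-0
    zap: "1"
```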

What you expected to happen: _osd_disk_activate.sh.tpl and _osd_disk_prepare.sh.tpl should have resolved the by-path name to the correct device using readlink and then used the corresponding /dev/sdX device.
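
For illustration, this is the kind of resolution readlink provides; a minimal sketch, assuming a hypothetical by-path name like the one below:

```sh
#!/bin/sh
# Resolve a stable by-path name to whatever /dev/sdX the kernel assigned on this boot.
OSD_DEVICE="/dev/disk/by-path/pci-0000:03:00.0-sas-phy0-lun-0"  # hypothetical path
RESOLVED=$(readlink -f "${OSD_DEVICE}")
echo "${OSD_DEVICE} currently resolves to ${RESOLVED}"
```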

How to reproduce it (as minimally and precisely as possible):

A SAS controller is not necessary. Given three drives /dev/sda, /dev/sdb, and /dev/sdc, install Ceph OSDs on /dev/sda and /dev/sdc. Shut down the server and remove /dev/sdb. On restart the kernel re-enumerates the remaining drives, so the OSD that was attached to /dev/sdc will fail.
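
One way to observe the re-enumeration is to compare the whole-disk by-path symlinks before and after the reboot; a quick check, assuming a standard udev setup:

```sh
# List the stable by-path links and the kernel names they currently target;
# after the reboot the /dev/sdX targets shift, while the by-path names do not.
ls -l /dev/disk/by-path/ | grep -v -- -part
```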

Anything else we need to know: I'm attaching the "fixes" I made to support by-path names in the values.yaml file:

_osd_disk_prepare.sh.tpl.txt
_osd_disk_activate.sh.tpl.txt