Closed hunter closed 8 years ago
What do you think of a script that runs zap and ceph-disk prepare repeatedly on the devices until they are properly prepared? We do this before running the ceph-osd container. Or would it make sense to add it to the container's script, so that it zaps and retries if prepare fails?
The latter feels a bit like a nasty hack to work around a bug.
Could try a small script that runs and tests the prep though. Perhaps it could be run as an init-container before the OSD starts? (that way we could get access to the config?)
Would be interested to see if 10.2.3 fixes this...
This is peculiar behaviour really. On Ubuntu it never fails, but once ceph-disk is run in a container (CoreOS), that's when we get the issue. Yeah, 10.2.3 might have a fix.
An interesting experiment... instead of using the latest "Ubuntu 14.04" image could try Fedora or Ubuntu 16.04 images (ideally we'd use the same one across all containers... just to avoid any weirdness)
@darkcrux if you get a chance, can you post the logs from when a format fails?
oh I lost the logs. but from what I've read, sometimes one of the partitions (the data partition) doesn't get created.
Will get logs again when I get the chance.
@dexter, shall we try turning the formatting into an init-container that won't launch the pod until init is complete?
how do we do it? my initial thought was having the formatting done by a k8s job.
The problem with a k8s job is that it won't stop the OSD container from launching while the disk is only half formatted. Adding an init-container (running a small bash script) to the OSD pod, which loops through formatting and zaps on any error until the drive is formatted, should handle launch order better.
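A rough sketch of what that init-container script could look like, assuming bash and that the device is meant to be wiped (zap destroys any existing data). The function name `prepare_with_retry` and the `CEPH_DISK`, `MAX_ATTEMPTS` and `RETRY_DELAY` variables are made up for illustration; only `ceph-disk prepare <dev>` and `ceph-disk zap <dev>` are the real commands discussed here.

```shell
#!/usr/bin/env bash
# Sketch only: retry "ceph-disk prepare", zapping the device after each failure.
# CEPH_DISK, MAX_ATTEMPTS and RETRY_DELAY are hypothetical knobs for this
# script, not ceph-disk options.

prepare_with_retry() {
  local dev="$1"
  local ceph_disk="${CEPH_DISK:-ceph-disk}"   # command to run (overridable)
  local max="${MAX_ATTEMPTS:-5}"              # give up after this many tries
  local delay="${RETRY_DELAY:-2}"             # seconds to wait between tries
  local attempt

  for (( attempt = 1; attempt <= max; attempt++ )); do
    if "$ceph_disk" prepare "$dev"; then
      echo "prepared $dev on attempt $attempt"
      return 0
    fi
    echo "prepare failed on $dev (attempt $attempt/$max), zapping" >&2
    # If the zap itself fails there is little point in retrying.
    "$ceph_disk" zap "$dev" || return 1
    sleep "$delay"
  done

  echo "giving up on $dev after $max attempts" >&2
  return 1
}
```

Calling `prepare_with_retry /dev/sdb` (the device path is just an example) as the init-container's command would exit non-zero if the disk never gets prepared, so the OSD container would not start on a half-formatted disk.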
yeah. my initial thought was to format everything with a job. just read about init-containers. are they in 1.4?
They were actually part of 1.3 :)
I think they graduated from alpha to beta between 1.3 and 1.4
oh. cool. init-container it is then. :)
This is ready. Just needs one final test (the ceph-daemon container one).
When launching an OSD container, ceph-disk prepare is failing (randomly, it seems). This requires that the OSD is stopped, the disk zapped and then prepared outside of the container.