I am testing MicroCeph on a three-node cluster. After removing a node (to simulate a failure) and rebuilding it, the node cannot rejoin the cluster. The OSDs on the failed node cannot be removed, because microceph disk remove attempts to contact the failed node. While those OSDs remain, microceph cluster remove cannot remove the failed node from the cluster.
What version of MicroCeph are you using?
18.2.0+snap71f71782c5
What are the steps to reproduce this issue?
Install MicroCeph on three nodes and form a cluster.
Remove one of the nodes to simulate a node failure.
Try to remove the failed node from MicroCeph. This fails because removing its OSDs tries to contact the failed node.
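The removal attempt in the steps above can be sketched as a small shell script, run on one of the surviving nodes. This is only an illustration, not the exact commands from my session: the hostname node3 and the OSD id 3 are placeholders, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch of the removal attempt, meant to be run on a surviving node.
# FAILED_NODE and OSD_ID are hypothetical placeholders, not values
# taken from the report; adjust them to match your cluster.
DRY_RUN="${DRY_RUN:-1}"               # 1 = only print commands, 0 = execute
FAILED_NODE="${FAILED_NODE:-node3}"   # placeholder hostname of the failed node
OSD_ID="${OSD_ID:-3}"                 # placeholder id of an OSD on that node

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

repro_removal() {
    # Inspect the current members and disks as seen by the cluster.
    run sudo microceph cluster list
    run sudo microceph disk list
    # Removing an OSD that lived on the failed node; per the report this
    # step tries to contact the failed node and does not succeed.
    run sudo microceph disk remove "$OSD_ID"
    # Removing the failed member; per the report this is not possible
    # while its OSDs are still registered.
    run sudo microceph cluster remove "$FAILED_NODE"
}

repro_removal
```

Running it without DRY_RUN=0 just lists the four commands in order, which makes it safe to try before pointing it at a real cluster.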
What happens (observed behaviour)?
The rebuilt node cannot rejoin the cluster because MicroCeph thinks the node already exists.
…
What were you expecting to happen?
…
Relevant logs, error output, etc.
If it’s considerably long, please paste to https://gist.github.com/ and insert the link here.
Additional comments.
…