canonical / microceph

Ceph for a one-rack cluster and appliances
https://snapcraft.io/microceph
GNU Affero General Public License v3.0

microceph daemons stop before ceph consumer daemons stop during host shutdown #341

Open tregubovav-dev opened 2 months ago

tregubovav-dev commented 2 months ago

microceph daemons stop before ceph consumer daemons stop

All MicroCeph daemons, including the OSDs and monitors, stop before Ceph consumer daemons such as LXD or Incus during host shutdown. In some situations this causes data loss and/or abnormal system behavior, for example during a graceful cluster shutdown triggered by a power outage.
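The missing ordering can be checked from the units themselves. A minimal sketch, assuming the default snap-generated unit names (snap.lxd.daemon.service, snap.microceph.osd.service, snap.microceph.mon.service); adjust the names if yours differ:

```sh
# Check whether any start/stop ordering relationship exists between the LXD
# and MicroCeph units (if nothing is listed, the stop order is arbitrary).
systemctl show -p After -p Before snap.lxd.daemon.service | grep -i microceph

# Review the previous boot's shutdown sequence to see which units stopped first.
journalctl -b -1 -u snap.lxd.daemon -u snap.microceph.osd -u snap.microceph.mon | grep -iE "stopping|stopped"
```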

What version of MicroCeph are you using?

What are the steps to reproduce this issue?

  1. Deploy three nodes (VMs or physical hosts) with Ubuntu 22.04 or 23.10 server and update all packages; attach one dedicated disk to each node to be used for Ceph storage
  2. Install the microceph snap from the latest/stable channel
  3. Switch the LXD snap to the latest/stable channel
  4. Configure the MicroCeph cluster and join all nodes to it; add the dedicated disks to the cluster
  5. Configure the LXD cluster and join all nodes to it; configure Ceph storage for the LXD cluster
  6. Restart all nodes at the same time and watch the shutdown log output on the console. You may see that MicroCeph daemons stop before the LXD daemons (see screenshot below). However, this does not impact the shutdown process as long as no instances on Ceph storage are deployed and running
  7. Deploy and launch some instances in LXD using Ceph storage
  8. Restart all nodes at the same time (a condensed command sketch of these steps follows below)
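A condensed sketch of the steps above. The hostnames (node2, node3), the disk path (/dev/sdb), the pool name (remote) and the instance name (c1) are placeholders, and the per-member `--target` steps needed when creating a storage pool in an LXD cluster are omitted:

```sh
# On every node: install MicroCeph and move LXD to the latest/stable channel.
sudo snap install microceph --channel latest/stable
sudo snap refresh lxd --channel latest/stable

# On the first node: bootstrap MicroCeph and generate join tokens for the others.
sudo microceph cluster bootstrap
sudo microceph cluster add node2   # prints a token; run "sudo microceph cluster join <token>" on node2
sudo microceph cluster add node3   # same for node3

# On every node: add the dedicated disk to Ceph.
sudo microceph disk add /dev/sdb --wipe

# Form the LXD cluster (interactive) and create a Ceph-backed storage pool.
sudo lxd init
lxc storage create remote ceph
lxc launch ubuntu:22.04 c1 --storage remote

# Reboot all nodes at roughly the same time and watch the console/journal.
sudo reboot
```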

What happens (observed behaviour)?

LXD can't communicate with Ceph because all monitors and OSDs in the cluster have already been shut down; the LXD instances are still running, but they have already lost their Ceph storage.

What were you expecting to happen?

LXD and other Ceph consumers must be stopped before the MicroCeph services are stopped during host shutdown.
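A possible interim workaround, sketched here under the assumption of the default snap unit names (snap.lxd.daemon.service, snap.microceph.*.service), is a systemd drop-in that orders the LXD daemon After= the MicroCeph daemons. systemd stops units in the reverse of their start order, so LXD would then be stopped before the monitors and OSDs:

```sh
# Drop-in for the LXD snap unit; ordering only (no Requires=), so a MicroCeph
# restart does not pull LXD down with it.
sudo mkdir -p /etc/systemd/system/snap.lxd.daemon.service.d
sudo tee /etc/systemd/system/snap.lxd.daemon.service.d/10-after-microceph.conf <<'EOF'
[Unit]
# Start after (and therefore stop before) the MicroCeph daemons.
After=snap.microceph.daemon.service snap.microceph.mon.service snap.microceph.osd.service
EOF
sudo systemctl daemon-reload
```

A proper fix would presumably ship this ordering in the snaps themselves rather than relying on a local drop-in on every host.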

Relevant logs, error output, etc.

If it’s considerably long, please paste to https://gist.github.com/ and insert the link here.

Additional comments.