LINBIT / linstor-server

High Performance Software-Defined Block Storage for container, cloud and virtualisation. Fully integrated with Docker, Kubernetes, Openstack, Proxmox etc.
https://docs.linbit.com/docs/linstor-guide/
GNU General Public License v3.0

Resources get stuck in SyncTarget when a node comes back online #251

Closed: chiraganand-e2e closed this issue 2 years ago

chiraganand-e2e commented 3 years ago

Setup:

There was a failure on one of the LINSTOR satellite nodes in our cluster, so it had to be rebuilt from scratch (PV -> VG -> thin pool). When we brought the node back online, LINSTOR realised that it had to create all resources on this node, because there were none there and the resource group had a place-count of 3. All 40 resources were created, but 5 of them got stuck at SyncTarget 97% or 98%. That is what linstor resource list reported, but when we looked at drbdadm status on the node itself, these resources just showed as Inconsistent, without any SyncTarget information.

After we restarted the satellite, the LINSTOR controller showed these resources as Inconsistent.

Eventually, we had to remove these 5 resources from this node and create them again.

I feel there are two bugs here:

  1. LINSTOR should always reflect the up-to-date state of the DRBD resources.
  2. During a resync, all resources should sync automatically to 100% and not get stuck.

raltnoeder commented 3 years ago

Those are probably both problems in DRBD rather than in LINSTOR itself. The first appears to be DRBD becoming unresponsive during the resync; the second is probably a consequence of that problem, which causes DRBD to stop sending out events when the resources' status changes.