RamenDR / ramen


Support automated failover for StatefulSets (or workloads that do not delete their PVCs) #558

Open ShyamsundarR opened 2 years ago

ShyamsundarR commented 2 years ago

StatefulSets (STS) by default do not delete their PVCs. This is being fixed upstream in Kubernetes via a KEP, and the feature will potentially graduate to beta in an upcoming release (https://github.com/kubernetes/kubernetes/pull/111300).
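For context, the feature tracked above adds a `persistentVolumeClaimRetentionPolicy` field to the StatefulSet spec. A minimal sketch of opting in, using the `k8s.io/api` Go types (assuming a cluster with the `StatefulSetAutoDeletePVC` feature gate enabled while the feature is pre-GA):

```go
package main

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
)

func main() {
	// Sketch: opt a StatefulSet into PVC auto-deletion per the KEP above.
	// Requires the StatefulSetAutoDeletePVC feature gate until graduation.
	retention := appsv1.StatefulSetPersistentVolumeClaimRetentionPolicy{
		// Delete the PVCs when the StatefulSet itself is deleted ...
		WhenDeleted: appsv1.DeletePersistentVolumeClaimRetentionPolicyType,
		// ... but retain them on scale-down.
		WhenScaled: appsv1.RetainPersistentVolumeClaimRetentionPolicyType,
	}

	sts := appsv1.StatefulSet{}
	sts.Spec.PersistentVolumeClaimRetentionPolicy = &retention

	fmt.Printf("whenDeleted=%s whenScaled=%s\n",
		retention.WhenDeleted, retention.WhenScaled)
}
```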

This issue is to deal with the following problems until the above KEP is beta/GA.

Problem definition:

Proposed solution:

1) For the failover use-case, we will no longer check that the PVC is in a deleting state

2) For the relocate use-case, we will still ensure the PVC is deleted, to prevent any use of the PVC after it has been checked and demoted (see the sketch below)
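Assuming the two rules above, the gating could look roughly like the following. This is a hedged sketch with hypothetical names (`drAction`, `pvcReadyForAction`), not the actual VRG reconciler code:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

type drAction int

const (
	failover drAction = iota
	relocate
)

// pvcReadyForAction reports whether the PVC state allows the DR action to
// proceed, per the two proposed rules. Hypothetical helper for illustration.
func pvcReadyForAction(a drAction, pvc *corev1.PersistentVolumeClaim) bool {
	beingDeleted := pvc.DeletionTimestamp != nil

	switch a {
	case failover:
		// Rule 1: failover no longer gates on the PVC being deleted,
		// since no final sync is required.
		return true
	case relocate:
		// Rule 2: relocate still requires the PVC to be deleting, to
		// prevent use of the PVC after it is checked and demoted.
		return beingDeleted
	default:
		return false
	}
}

func main() {
	now := metav1.Now()
	pvc := &corev1.PersistentVolumeClaim{
		ObjectMeta: metav1.ObjectMeta{
			Name:              "busybox-pvc",
			DeletionTimestamp: &now,
		},
	}
	fmt.Println(pvcReadyForAction(failover, pvc), pvcReadyForAction(relocate, pvc))
}
```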

Other fixes and gotchas:

ShyamsundarR commented 2 years ago

Further notes, following discussions with @BenamarMk:

Currently for volsync we have the following PVC rules:

For volrep the PVC rules are:

For failover cases, the PVC can be deleted in either the volsync or volrep case, as no final sync operation is required.

We would need a common set of rules across volrep and volsync, such that the actual replication mechanism is abstracted away from the DRPC user. Hence, for both, we should mandate either that operator/STS-created PVCs are deleted or that they are not.

With volsync, if the operator/STS-created PVCs are deleted, there is no possible way to perform a final sync. Hence the rule that we would adopt would be:

With the above, the STS auto-deletion of PVCs on STS deletion is not required, and we need to handle relocate for operator/STS-created PVCs that are not deleted in the volrep case.

As a first step, allowing failover as described in this issue is a way forward, while retaining the same PVC rules as before.

Subsequently, the PVC rules for volrep would need to change as well, to align with volsync.

The one caveat in the case of relocate with volsync or volrep, if the PVC is not deleted, would be as follows:

ShyamsundarR commented 2 years ago

NOTE: We need to evaluate this with Placement and ApplicationSets, instead of the Subscriptions in use at present.

ShyamsundarR commented 1 year ago

The one caveat in the case of relocate with volsync or volrep, if the PVC is not deleted, would be as follows:

* A PVC is checked to ensure no pods are referencing it

* A PVC is checked to ensure no VolumeAttachments are present

* But if a PVC consumer appears after these checks pass, the relocate may sync stale data

  * From a VR perspective, one option is to move the volume to secondary during relocate and let the storage plugin ensure there are no consumers for the volume as it is demoted
  * The other is to ensure that all workloads, as reported on the OCM hub, are stopped before issuing the move to secondary for the VRG (or the final sync for volsync)

We could technically change the PVC to read-only, to prevent further data modifications in this case (requires experimentation). A rough sketch of these in-use checks follows.
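For illustration, the two checks above could be approximated with client-go roughly as follows. This is a sketch with a hypothetical helper name; Ramen's actual checks live in its VRG reconciler and differ in detail:

```go
package pvcchecks

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// pvcInUse approximates the two pre-relocate checks: is any pod in the PVC's
// namespace referencing it, and does any VolumeAttachment still point at its
// bound PV. Note the inherent race: a consumer can appear right after both
// checks pass, which is exactly the stale-data caveat described above.
func pvcInUse(ctx context.Context, cs kubernetes.Interface,
	pvc *corev1.PersistentVolumeClaim) (bool, error) {
	pods, err := cs.CoreV1().Pods(pvc.Namespace).List(ctx, metav1.ListOptions{})
	if err != nil {
		return false, err
	}

	for _, pod := range pods.Items {
		for _, vol := range pod.Spec.Volumes {
			if vol.PersistentVolumeClaim != nil &&
				vol.PersistentVolumeClaim.ClaimName == pvc.Name {
				return true, nil // a pod still references the PVC
			}
		}
	}

	vas, err := cs.StorageV1().VolumeAttachments().List(ctx, metav1.ListOptions{})
	if err != nil {
		return false, err
	}

	for _, va := range vas.Items {
		pvName := va.Spec.Source.PersistentVolumeName
		if pvName != nil && *pvName == pvc.Spec.VolumeName {
			return true, nil // the bound PV is still attached to a node
		}
	}

	return false, nil
}
```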

ShyamsundarR commented 1 year ago

> We could technically change the PVC to read only, to prevent further data modifications in this case. (requires experimentation)

The above is disallowed: `The PersistentVolumeClaim "busybox-pvc" is invalid: spec: Forbidden: spec is immutable after creation except resources.requests for bound claims`
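For reference, this is the kind of update the API server rejects. A minimal client-go sketch (hypothetical helper name) of the attempted mutation:

```go
package pvcexperiment

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// tryMakePVCReadOnly attempts to flip a bound PVC to ReadOnlyMany. The API
// server rejects the update with the "spec is immutable after creation
// except resources.requests for bound claims" error quoted above.
func tryMakePVCReadOnly(ctx context.Context, cs kubernetes.Interface,
	namespace, name string) error {
	pvc, err := cs.CoreV1().PersistentVolumeClaims(namespace).Get(
		ctx, name, metav1.GetOptions{})
	if err != nil {
		return err
	}

	pvc.Spec.AccessModes = []corev1.PersistentVolumeAccessMode{
		corev1.ReadOnlyMany,
	}

	_, err = cs.CoreV1().PersistentVolumeClaims(namespace).Update(
		ctx, pvc, metav1.UpdateOptions{})

	return err // expected: Forbidden, PVC spec is immutable once bound
}
```

A pod can still mount the claim read-only via `readOnly: true` on its `persistentVolumeClaim` volume source, but that modifies the workload spec rather than the PVC itself.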

ShyamsundarR commented 1 year ago

> With volsync, if the operator/STS-created PVCs are deleted, there is no possible way to perform a final sync.

An alternative to overcome this issue would be as follows:

The above would help address a few other items as well: