RamenDR / ramen

Simplify disable DR flow for OCM managed workloads to single simple step #1441

Closed nirs closed 6 days ago

nirs commented 1 month ago

Currently, disabling DR for OCM managed workloads requires multiple complicated steps, documented here: https://github.com/RamenDR/ocm-ramen-samples?tab=readme-ov-file#disable-dr-for-a-dr-enabled-application

For OCM discovered workloads the process is already a single simple step, since OCM does not manage the application: https://github.com/RamenDR/ocm-ramen-samples/blob/6bc74a53255b2a183ad0d5bcd38b3e77cd6bc343/README.md#disable-dr-for-a-dr-enabled-ocm-discovered-application

Old flow

  1. Ensure the placement is pointing to the cluster where the workload is currently placed

    This is critical to avoid data loss if OCM moves the application to another cluster.

    This is not a trivial change, since the selector can use a claimSelector, a labelSelector, or both, and each can contain one or more expressions that allow the workload to run on one or more clusters.

    Assuming the workload is running on cluster1 when disabling DR, the user needs to add this selector:

    spec:
      predicates:
      - requiredClusterSelector:
          claimSelector: {}
          labelSelector:
            matchExpressions:
            - key: name
              operator: In
              values:
              - cluster1
  2. Delete the drpc resource for the OCM application on the hub

  3. Wait until the deletion completes

  4. Enable OCM scheduling by deleting the annotation

    cluster.open-cluster-management.io/experimental-scheduling-disable: "true"

New flow

We want to make this a single step for the user:

  1. Delete the drpc resource

Ramen will modify the placement as needed automatically.

This will make it easy to disable DR manually, or to implement this in an application that manages ramen.

Detailed steps

Setting requiredClusterSelector

The selector can have multiple values:

Ramen cannot analyze all possible combinations to ensure they are safe and result in the workload being pinned to the current cluster. Instead, we will replace the current selector with a new selector that uses a labelSelector with a single matchExpressions entry selecting the current cluster by name (name In [clustername]).

Consider issuing a warning if we replace a complex requiredClusterSelector with a simpler one.
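
To illustrate the replacement, here is a sketch of a selector that would be hard to analyze, followed by the simple selector that would take its place. The claim key `region`, the label key `environment`, and their values are hypothetical and only for illustration; `cluster1` is assumed to be the cluster where the workload currently runs.

```yaml
# Before: a hypothetical selector that may match more than one cluster.
spec:
  predicates:
  - requiredClusterSelector:
      claimSelector:
        matchExpressions:
        - key: region            # hypothetical cluster claim
          operator: In
          values:
          - us-east-1
      labelSelector:
        matchExpressions:
        - key: environment       # hypothetical cluster label
          operator: In
          values:
          - prod
          - staging
---
# After: the whole selector is replaced with one that pins the workload
# to the current cluster by name.
spec:
  predicates:
  - requiredClusterSelector:
      labelSelector:
        matchExpressions:
        - key: name
          operator: In
          values:
          - cluster1
```

This is the case where a warning would be useful, since the original selection criteria are discarded rather than merged.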

The cluster.open-cluster-management.io/experimental-scheduling-disable annotation

We will keep the annotation on the placement when disabling DR. This is the safest solution, since the user may have a GitOps-controlled placement, and any change we make to the placement can be overridden by the placement in git.

If we removed the annotation and the placement requiredClusterSelector was later overridden, OCM could move the workload to another cluster, and the current data protected by ramen would be lost.

Number of clusters

OCM supports running an application on multiple clusters concurrently, which we cannot support for DR. We will change the number of clusters (numberOfClusters in the placement spec) to 1 when disabling DR.
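
As a sketch of the intended end state, the placement might look roughly like this after DR is disabled. The resource name and namespace are hypothetical, and the `v1beta1` apiVersion is an assumption about the installed OCM version. The predicates are omitted here, since they are pinned as described under "Setting requiredClusterSelector".

```yaml
apiVersion: cluster.open-cluster-management.io/v1beta1   # assumed API version
kind: Placement
metadata:
  name: example-placement        # hypothetical name
  namespace: example-app         # hypothetical namespace
  annotations:
    # Kept on purpose: OCM scheduling stays disabled until the user
    # removes this annotation manually.
    cluster.open-cluster-management.io/experimental-scheduling-disable: "true"
spec:
  # Forced to 1, since a DR protected workload runs on a single cluster.
  numberOfClusters: 1
```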

PlacementRule

A similar change should be made for PlacementRule.

Change clusterReplicas to 1

Change clusterSelector to:

spec:
  clusterSelector:
    matchLabels:
      name: cluster1

Keep schedulerName: ramen for the same reason we keep the placement annotation.
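
Putting the PlacementRule changes together, the resource might look roughly like this after DR is disabled. This is a sketch only: the resource name and namespace are hypothetical, and `cluster1` is assumed to be the cluster where the workload currently runs.

```yaml
apiVersion: apps.open-cluster-management.io/v1
kind: PlacementRule
metadata:
  name: example-placementrule    # hypothetical name
  namespace: example-app         # hypothetical namespace
spec:
  # Forced to 1, since a DR protected workload runs on a single cluster.
  clusterReplicas: 1
  # Pins the workload to the cluster where it currently runs.
  clusterSelector:
    matchLabels:
      name: cluster1
  # Kept on purpose, so OCM does not start scheduling the workload.
  schedulerName: ramen
```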

Documentation change

Add a note to the documentation that the user needs to modify the placement manually if they want to change the workload placement.

Add a warning about losing data if a workload placement is changed to another cluster.