Closed nirs closed 6 days ago
$ basic-test/deploy -c configs/deployment-k8s-regional-rbd.yaml envs/regional-dr.yaml
2024-06-24 00:11:14,358 INFO [deploy] Deploying application
2024-06-24 00:11:14,358 INFO [deploy] Deploying application 'deployment-rbd'
2024-06-24 00:11:15,627 INFO [deploy] Waiting for 'placement.cluster.open-cluster-management.io/placement' decisions
2024-06-24 00:11:15,919 INFO [deploy] Application running on cluster 'dr1'
$ basic-test/enable-dr -c configs/deployment-k8s-regional-rbd.yaml envs/regional-dr.yaml
2024-06-24 00:11:26,165 INFO [enable-dr] Enable DR
2024-06-24 00:11:26,240 INFO [enable-dr] Disabling OCM scheduling for 'placement.cluster.open-cluster-management.io/placement'
2024-06-24 00:11:26,434 INFO [enable-dr] Waiting for 'placement.cluster.open-cluster-management.io/placement' decisions
2024-06-24 00:11:26,853 INFO [enable-dr] waiting for namespace deployment-rbd
2024-06-24 00:11:27,028 INFO [enable-dr] Waiting until 'drplacementcontrol.ramendr.openshift.io/deployment-rbd-drpc' reports status
2024-06-24 00:11:27,759 INFO [enable-dr] Waiting for 'drplacementcontrol.ramendr.openshift.io/deployment-rbd-drpc' Available condition
2024-06-24 00:11:27,997 INFO [enable-dr] Waiting for 'drplacementcontrol.ramendr.openshift.io/deployment-rbd-drpc' PeerReady condition
2024-06-24 00:11:28,229 INFO [enable-dr] Waiting for 'drplacementcontrol.ramendr.openshift.io/deployment-rbd-drpc' first replication
2024-06-24 00:12:57,207 INFO [enable-dr] DR enabled
$ kubectl gather --contexts hub,dr1,dr2 -n deployment-rbd -d gather.after-enable-dr
2024-06-24T00:17:29.663+0300 INFO gather Using kubeconfig "/home/nsoffer/.kube/config"
2024-06-24T00:17:29.666+0300 INFO gather Gathering from namespaces ["deployment-rbd"]
2024-06-24T00:17:29.667+0300 INFO gather Using all addons
2024-06-24T00:17:29.667+0300 INFO gather Gathering from cluster "hub"
2024-06-24T00:17:29.667+0300 INFO gather Gathering from cluster "dr1"
2024-06-24T00:17:29.667+0300 INFO gather Gathering from cluster "dr2"
2024-06-24T00:17:29.682+0300 INFO gather Gathered 0 resources from cluster "dr2" in 0.015 seconds
2024-06-24T00:17:29.836+0300 INFO gather Gathered 18 resources from cluster "hub" in 0.169 seconds
2024-06-24T00:17:29.838+0300 INFO gather Gathered 23 resources from cluster "dr1" in 0.171 seconds
2024-06-24T00:17:29.838+0300 INFO gather Gathered 41 resources from 3 clusters in 0.171 seconds
$ basic-test/disable-dr -c configs/deployment-k8s-regional-rbd.yaml envs/regional-dr.yaml
2024-06-24 00:17:49,307 INFO [disable-dr] Disable DR
2024-06-24 00:17:49,385 INFO [disable-dr] Deleting 'drplacementcontrol.ramendr.openshift.io/deployment-rbd-drpc'
2024-06-24 00:17:57,299 INFO [disable-dr] DR was disabled
$ kubectl gather --contexts hub,dr1,dr2 -n deployment-rbd -d gather.after-disable-dr
Comparing placement before/after:
$ diff -u gather.after-enable-dr/hub/namespaces/deployment-rbd/cluster.open-cluster-management.io/placements/placement.yaml gather.after-disable-dr/hub/namespaces/deployment-rbd/cluster.open-cluster-management.io/placements/placement.yaml
--- gather.after-enable-dr/hub/namespaces/deployment-rbd/cluster.open-cluster-management.io/placements/placement.yaml 2024-06-24 00:17:29.804953284 +0300
+++ gather.after-disable-dr/hub/namespaces/deployment-rbd/cluster.open-cluster-management.io/placements/placement.yaml 2024-06-24 00:18:11.481164101 +0300
@@ -8,9 +8,7 @@
kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"cluster.open-cluster-management.io/v1beta1","kind":"Placement","metadata":{"annotations":{},"labels":{"app":"deployment-rbd"},"name":"placement","namespace":"deployment-rbd"},"spec":{"clusterSets":["default"],"numberOfClusters":1}}
creationTimestamp: "2024-06-23T21:11:15Z"
- finalizers:
- - drpc.ramendr.openshift.io/finalizer
- generation: 2
+ generation: 3
labels:
app: deployment-rbd
managedFields:
@@ -59,25 +57,32 @@
f:annotations:
f:drplacementcontrol.ramendr.openshift.io/drpc-name: {}
f:drplacementcontrol.ramendr.openshift.io/drpc-namespace: {}
- f:finalizers:
- .: {}
- v:"drpc.ramendr.openshift.io/finalizer": {}
f:spec:
+ f:predicates: {}
f:prioritizerPolicy:
.: {}
f:mode: {}
f:spreadPolicy: {}
manager: manager
operation: Update
- time: "2024-06-23T21:11:26Z"
+ time: "2024-06-23T21:17:57Z"
name: placement
namespace: deployment-rbd
- resourceVersion: "3604"
+ resourceVersion: "4719"
uid: 07f54cce-1bcf-4a40-adfc-36d84d894b86
spec:
clusterSets:
- default
numberOfClusters: 1
+ predicates:
+ - requiredClusterSelector:
+ claimSelector: {}
+ labelSelector:
+ matchExpressions:
+ - key: name
+ operator: In
+ values:
+ - dr1
prioritizerPolicy:
mode: Additive
spreadPolicy: {}
New note when modifying placement predicates:
2024-06-23T21:17:57.237Z INFO controllers.DRPlacementControl util/placement.go:47 NOTE: modifying placement predicates to select current cluster {"DRPC": {"name":"deployment-rbd-drpc","namespace":"deployment-rbd"}, "rid": "e75aec3d-963a-4996-8717-bb804cb1c433", "namespace": "deployment-rbd", "placement": "placement", "cluster": "dr1"}
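The predicate change shown in the diff can be illustrated with a minimal Python sketch. This is not ramen's Go code in util/placement.go; the helper name `select_cluster` is hypothetical, and the placement is modeled as a plain dict, assuming only the `spec.predicates` layout visible in the diff.

```python
import copy


def select_cluster(placement: dict, cluster: str) -> dict:
    """Return a copy of placement whose predicates select only `cluster`.

    Hypothetical helper for illustration; it mirrors the spec.predicates
    layout seen in the diff, not ramen's actual implementation.
    """
    updated = copy.deepcopy(placement)
    updated.setdefault("spec", {})["predicates"] = [
        {
            "requiredClusterSelector": {
                "claimSelector": {},
                "labelSelector": {
                    "matchExpressions": [
                        {"key": "name", "operator": "In", "values": [cluster]},
                    ],
                },
            },
        },
    ]
    return updated


# Placement spec as created by the sample application (from the diff).
placement = {"spec": {"clusterSets": ["default"], "numberOfClusters": 1}}
updated = select_cluster(placement, "dr1")
```

The original dict is left untouched, so the before/after states can be compared the same way the gathered YAML files are compared with diff.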
@nirs please update envtests for the changes as appropriate.
Testing relocate when predicates differ:
$ diff -ur 07-relocate/hub/namespaces/deployment-rbd/cluster.open-cluster-management.io/placements/placement.yaml 08-disable-dr/hub/namespaces/deployment-rbd/cluster.open-cluster-management.io/placements/placement.yaml
--- 07-relocate/hub/namespaces/deployment-rbd/cluster.open-cluster-management.io/placements/placement.yaml 2024-06-24 23:42:37.108256298 +0300
+++ 08-disable-dr/hub/namespaces/deployment-rbd/cluster.open-cluster-management.io/placements/placement.yaml 2024-06-24 23:43:15.955457463 +0300
@@ -8,9 +8,7 @@
kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"cluster.open-cluster-management.io/v1beta1","kind":"Placement","metadata":{"annotations":{},"labels":{"app":"deployment-rbd"},"name":"placement","namespace":"deployment-rbd"},"spec":{"clusterSets":["default"],"numberOfClusters":1}}
creationTimestamp: "2024-06-24T20:24:35Z"
- finalizers:
- - drpc.ramendr.openshift.io/finalizer
- generation: 3
+ generation: 4
labels:
app: deployment-rbd
managedFields:
@@ -59,9 +57,6 @@
f:annotations:
f:drplacementcontrol.ramendr.openshift.io/drpc-name: {}
f:drplacementcontrol.ramendr.openshift.io/drpc-namespace: {}
- f:finalizers:
- .: {}
- v:"drpc.ramendr.openshift.io/finalizer": {}
f:spec:
f:predicates: {}
f:prioritizerPolicy:
@@ -70,10 +65,10 @@
f:spreadPolicy: {}
manager: manager
operation: Update
- time: "2024-06-24T20:35:35Z"
+ time: "2024-06-24T20:43:05Z"
name: placement
namespace: deployment-rbd
- resourceVersion: "29797"
+ resourceVersion: "31125"
uid: 30758fc5-384b-4bb9-a335-af0743242d04
spec:
clusterSets:
@@ -87,7 +82,7 @@
- key: name
operator: In
values:
- - dr1
+ - dr2
prioritizerPolicy:
mode: Additive
spreadPolicy: {}
Only in 07-relocate/hub/namespaces/deployment-rbd: ramendr.openshift.io
New logs:
2024-06-24T20:43:05.127Z INFO controllers.DRPlacementControl util/placement.go:51 NOTE: modifying placement predicates to select current cluster {"DRPC": {"name":"deployment-rbd-drpc","namespace":"deployment-rbd"}, "rid": "8a75b90c-0d40-4107-bcb2-54778a4ce1f7", "namespace": "deployment-rbd", "placement": "placement", "cluster": "dr2"}
Application running on dr2, placement pointing to dr1:
$ watch -n 1 -x kubectl get placementdecisions placement-decision-1 -o jsonpath='{.status.decisions}{"\n"}' -n deployment-rbd --context hub
$ kubectl delete drpc deployment-rbd-drpc -n deployment-rbd --context hub
drplacementcontrol.ramendr.openshift.io "deployment-rbd-drpc" deleted
2024-06-25T09:06:34.417Z INFO controllers.DRPlacementControl controllers/drplacementcontrol_controller.go:2078 Found ClusterDecision {"ClsDedicision": []}
2024-06-25T09:06:34.417Z INFO controllers.DRPlacementControl controllers/drplacementcontrol_controller.go:1953 Using DRPC preferredCluster, Relocate progression detected as switching to preferred cluster {"DRPC": {"name":"deployment-rbd-drpc","namespace":"deployment-rbd"}, "rid": "d3b0facf-aee0-42dc-a635-1835cbce2861"}
Placement was not modified since the current value matches the cluster name.
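The skip described here amounts to comparing the cluster already selected by the placement predicates with the cluster the placement should point to. A rough Python sketch, assuming the predicate layout from the earlier diffs; `selected_clusters` and `needs_update` are hypothetical names, not ramen functions.

```python
def selected_clusters(placement: dict) -> list:
    """Collect the values of `name In [...]` expressions in spec.predicates."""
    clusters = []
    for predicate in placement.get("spec", {}).get("predicates", []):
        selector = predicate.get("requiredClusterSelector", {})
        expressions = selector.get("labelSelector", {}).get("matchExpressions", [])
        for expr in expressions:
            if expr.get("key") == "name" and expr.get("operator") == "In":
                clusters.extend(expr.get("values", []))
    return clusters


def needs_update(placement: dict, cluster: str) -> bool:
    """Return True unless the predicates already select exactly `cluster`."""
    return selected_clusters(placement) != [cluster]


# Placement still pointing to dr1, as in this test.
pointing_to_dr1 = {
    "spec": {
        "predicates": [
            {
                "requiredClusterSelector": {
                    "claimSelector": {},
                    "labelSelector": {
                        "matchExpressions": [
                            {"key": "name", "operator": "In", "values": ["dr1"]},
                        ],
                    },
                },
            },
        ],
    },
}
```

With this placement, `needs_update(pointing_to_dr1, "dr1")` is false, which corresponds to the case above where the placement is left as is.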
The final result is that the application is not running on any cluster. Not sure if this is the wanted result, but I don't see how we can avoid it. If we delay deletion of the DRPC until the relocate completes, it can get stuck forever, leaving no way to delete the DRPC.
To test manual recovery, I removed the scheduling-disable annotation from the placement, to see if the app would be created on cluster dr1 with the right data.
The app was started on cluster dr1 with a new PVC:
$ kubectl exec pod/busybox-6bbf88b9f8-fkz8j -n deployment-rbd --context dr1 -- cat /var/log/ramen.log
Tue Jun 25 11:01:07 UTC 2024 START
Tue Jun 25 11:01:17 UTC 2024 UPDATE
Tue Jun 25 11:01:27 UTC 2024 UPDATE
Tue Jun 25 11:01:37 UTC 2024 UPDATE
After changing the placement to cluster dr2, the app was started on cluster dr2, but also using a new PVC:
$ kubectl exec pod/busybox-6bbf88b9f8-kcjn9 -n deployment-rbd --context dr2 -- cat /var/log/ramen.log
Tue Jun 25 11:07:22 UTC 2024 START
Tue Jun 25 11:07:32 UTC 2024 UPDATE
Tue Jun 25 11:07:42 UTC 2024 UPDATE
Tue Jun 25 11:07:52 UTC 2024 UPDATE
So when disabling DR in the middle of a relocate we can lose the application data. I opened #1473 to track this issue.
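One way to read the two logs above: a log stored on the replicated PVC would continue from before the DRPC was deleted, so a log whose first entry is a START written after the deletion suggests a fresh PVC. A small Python sketch of that check, assuming the `date`-style line format shown in the output; `first_entry` and `looks_like_new_pvc` are hypothetical helpers, and the deletion time is taken from the controller log timestamp above (assumed to be UTC).

```python
from datetime import datetime, timezone

# Format of lines such as "Tue Jun 25 11:01:07 UTC 2024 START".
LOG_FORMAT = "%a %b %d %H:%M:%S UTC %Y"


def first_entry(log_text: str):
    """Return (timestamp, event) for the first non-empty log line."""
    for line in log_text.splitlines():
        line = line.strip()
        if line:
            stamp, event = line.rsplit(" ", 1)
            when = datetime.strptime(stamp, LOG_FORMAT).replace(tzinfo=timezone.utc)
            return when, event
    raise ValueError("empty log")


def looks_like_new_pvc(log_text: str, drpc_deleted: datetime) -> bool:
    """True if the log begins with a START written after the DRPC deletion."""
    when, event = first_entry(log_text)
    return event == "START" and when > drpc_deleted


dr1_log = """\
Tue Jun 25 11:01:07 UTC 2024 START
Tue Jun 25 11:01:17 UTC 2024 UPDATE
Tue Jun 25 11:01:27 UTC 2024 UPDATE
"""
drpc_deleted = datetime(2024, 6, 25, 9, 6, 34, tzinfo=timezone.utc)
print(looks_like_new_pvc(dr1_log, drpc_deleted))
```

This is only a heuristic for the test application's log file; it says nothing about the state of the original PVC or its replicated image.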
We discussed this in the team meeting, and I think both ways are valid. We can keep disabling DR a single step (for integration with the UI) by never removing the annotation (or schedulerName for PlacementRule).
To disable DR you just delete the DRPC. You don't need to change the Placement[Rule] since OCM scheduling is always disabled.
If a user wants to enable OCM scheduling, they do not care about the data, since OCM does not support stateful applications (e.g., moving the PVs to another cluster when changing the placement). So changing the placement is the user's responsibility, and is not needed in the common case when we disable DR.
I'll add another PR implementing the simpler approach.
Posted simpler alternative based on @BenamarMk suggestion: #1474
Replaced by #1474
Instead of manual steps, ramen modifies the Placement[Rule] to make it safe after disabling DR.
Changes:
Tested:
simplify-disable-dr.tar.gz
Not tested:
Related ocm-ramen-samples changes:
Fixes #1441