RamenDR / ramen

Use conditions instead of status.phase in basic-test #1016

Closed nirs closed 1 year ago

nirs commented 1 year ago

The scripts wait on DRPC status.phase before and after each operation. This makes the code less flexible, since we have to handle three phases (Deployed, FailedOver, and Relocated) that all mean the same thing: the system is in a stable state.

See also https://maelvls.dev/kubernetes-conditions/#are-conditions-still-used

Related-to: #978
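
To illustrate the idea, here is a minimal Python sketch of a stable-state check based on conditions rather than on the phase. It is not the basic-test code: the helper names are hypothetical, the DRPC is modeled as a plain dict (for example, parsed from `kubectl get drpc -o json`), and the `Available` condition type is an assumption; only `PeerReady` is mentioned in this thread.

```python
# Illustrative sketch only: these names are hypothetical, not ramen's test code.
# "Available" is an assumed condition type; only PeerReady appears in this thread.
STABLE_CONDITIONS = ("Available", "PeerReady")


def condition_status(drpc, cond_type):
    """Return the status string of the named condition, or None if absent."""
    for cond in drpc.get("status", {}).get("conditions", []):
        if cond.get("type") == cond_type:
            return cond.get("status")
    return None


def is_stable(drpc, cond_types=STABLE_CONDITIONS):
    """True when every listed condition reports status "True"."""
    return all(condition_status(drpc, t) == "True" for t in cond_types)


def make_drpc(phase, available, peer_ready):
    """Build a minimal DRPC-like dict for the demo below."""
    return {
        "status": {
            "phase": phase,
            "conditions": [
                {"type": "Available", "status": available},
                {"type": "PeerReady", "status": peer_ready},
            ],
        }
    }


# One check covers all three stable phases; the phase value is never inspected.
for phase in ("Deployed", "FailedOver", "Relocated"):
    assert is_stable(make_drpc(phase, "True", "True"))
```

With a check like this, the caller no longer needs to know which of the three phases to expect after an operation.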

Shwetha-Acharya commented 1 year ago

I will take this up

nirs commented 1 year ago

I ran some experiments while debugging failover issues. We cannot use conditions yet to wait for relocate: PeerReady is still True right after relocate starts, and switches to False only later, when the progression reaches EnsuringVolumesAreSecondary.

Here is an example flow:

$ kubectl get drpc -n busybox-sample --context hub -o wide -w
NAME           AGE     PREFERREDCLUSTER   FAILOVERCLUSTER   DESIREDSTATE   CURRENTSTATE   PROGRESSION                   START TIME             DURATION          PEER READY
busybox-drpc   6m50s   dr1                dr2               Relocate       FailedOver     Completed                     2023-09-04T19:07:09Z   3m11.173740287s   True
busybox-drpc   6m50s   dr1                dr2               Relocate       Initiating     PreparingFinalSync            2023-09-04T19:12:10Z                     True
busybox-drpc   7m      dr1                dr2               Relocate       Relocating     RunningFinalSync              2023-09-04T19:12:10Z                     True
busybox-drpc   7m30s   dr1                dr2               Relocate       Relocating     EnsuringVolumesAreSecondary   2023-09-04T19:12:10Z                     False
busybox-drpc   8m      dr1                dr2               Relocate       Relocating     WaitingForResourceRestore     2023-09-04T19:12:10Z                     False
busybox-drpc   8m      dr1                dr2               Relocate       Relocating     WaitingForResourceRestore     2023-09-04T19:12:10Z                     False
busybox-drpc   8m30s   dr1                dr2               Relocate       Relocating     WaitingForResourceRestore     2023-09-04T19:12:10Z                     False
busybox-drpc   8m30s   dr1                dr2               Relocate       Relocated      UpdatedPlacement              2023-09-04T19:12:10Z                     False
busybox-drpc   8m30s   dr1                dr2               Relocate       Relocated      Cleaning Up                   2023-09-04T19:12:10Z                     False
busybox-drpc   9m      dr1                dr2               Relocate       Relocated      Completed                     2023-09-04T19:12:10Z   2m10.257905102s   True
busybox-drpc   9m30s   dr1                dr2               Relocate       Relocated      Completed                     2023-09-04T19:12:10Z   2m10.257905102s   True
busybox-drpc   10m     dr1                dr2               Relocate       Relocated      Completed                     2023-09-04T19:12:10Z   2m10.257905102s   True
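
The problem in the flow above can be replayed with a small Python sketch. The samples are transcribed from the watch output (repeated rows dropped, starting at the first row after relocate was requested); `first_match` is a hypothetical helper, not part of the test scripts.

```python
# (CURRENTSTATE, PROGRESSION, PEER READY) transcribed from the watch output above,
# starting at the first row after relocate was requested; repeated rows dropped.
SAMPLES = [
    ("Initiating", "PreparingFinalSync", True),
    ("Relocating", "RunningFinalSync", True),
    ("Relocating", "EnsuringVolumesAreSecondary", False),
    ("Relocating", "WaitingForResourceRestore", False),
    ("Relocated", "UpdatedPlacement", False),
    ("Relocated", "Cleaning Up", False),
    ("Relocated", "Completed", True),
]


def first_match(samples, predicate):
    """Return the first sample that satisfies predicate, or None."""
    for sample in samples:
        if predicate(*sample):
            return sample
    return None


# A waiter that looks only at PeerReady returns on the very first sample,
# long before the relocate has actually completed.
naive = first_match(SAMPLES, lambda state, progression, peer_ready: peer_ready)
print(naive)  # ('Initiating', 'PreparingFinalSync', True)
```

A wait based only on PeerReady would therefore return while the final sync is still being prepared.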

@Shwetha-Acharya did you have time to work on this? If not, I have a working draft that uses what is possible with the current code.

Shwetha-Acharya commented 1 year ago

Hi @nirs, though I went through the problem, I have not had time to work on it. Please upload your work!

nirs commented 1 year ago

@Shwetha-Acharya OK, #1056 updates the basic test to use conditions, with an additional check for the Relocated phase, until ramen is fixed to clear the conditions when starting relocate or to provide another condition.
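
A rough Python sketch of that kind of workaround follows. It is not the code from #1056: `relocate_done`, `wait_until`, and `snapshot` are hypothetical names, the DRPC is modeled as a plain dict, and it assumes the CURRENTSTATE column shown earlier corresponds to `status.phase`.

```python
import time


def relocate_done(drpc):
    """PeerReady must be "True" and the phase must already be Relocated."""
    status = drpc.get("status", {})
    conds = {c["type"]: c["status"] for c in status.get("conditions", [])}
    return status.get("phase") == "Relocated" and conds.get("PeerReady") == "True"


def wait_until(fetch, predicate, timeout=600.0, interval=1.0):
    """Poll fetch() until predicate(obj) holds; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout
    while True:
        obj = fetch()
        if predicate(obj):
            return obj
        if time.monotonic() >= deadline:
            raise TimeoutError("timed out waiting for relocate to complete")
        time.sleep(interval)


def snapshot(phase, peer_ready):
    """Build a minimal DRPC-like dict (hypothetical, for the demo only)."""
    return {
        "status": {
            "phase": phase,
            "conditions": [{"type": "PeerReady", "status": peer_ready}],
        }
    }


# Fake fetcher replaying a shortened version of the flow shown earlier.
snapshots = iter([
    snapshot("Initiating", "True"),   # PeerReady is still True here
    snapshot("Relocating", "False"),
    snapshot("Relocated", "False"),
    snapshot("Relocated", "True"),    # the only snapshot that passes
])
result = wait_until(lambda: next(snapshots), relocate_done, interval=0)
print(result["status"]["phase"])  # Relocated
```

The phase check is what keeps the wait from returning on the early snapshot where PeerReady has not been cleared yet; it could be dropped once the conditions alone are reliable.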