Closed by nirs 1 year ago.
I will take this up
I did some experiments while debugging failover issues. We cannot use conditions yet to wait for relocate, since PeerReady is still True right after relocate starts, and switches to False only later, in the EnsuringVolumesAreSecondary
progression.
Here is an example flow:
$ kubectl get drpc -n busybox-sample --context hub -o wide -w
NAME AGE PREFERREDCLUSTER FAILOVERCLUSTER DESIREDSTATE CURRENTSTATE PROGRESSION START TIME DURATION PEER READY
busybox-drpc 6m50s dr1 dr2 Relocate FailedOver Completed 2023-09-04T19:07:09Z 3m11.173740287s True
busybox-drpc 6m50s dr1 dr2 Relocate Initiating PreparingFinalSync 2023-09-04T19:12:10Z True
busybox-drpc 7m dr1 dr2 Relocate Relocating RunningFinalSync 2023-09-04T19:12:10Z True
busybox-drpc 7m30s dr1 dr2 Relocate Relocating EnsuringVolumesAreSecondary 2023-09-04T19:12:10Z False
busybox-drpc 8m dr1 dr2 Relocate Relocating WaitingForResourceRestore 2023-09-04T19:12:10Z False
busybox-drpc 8m dr1 dr2 Relocate Relocating WaitingForResourceRestore 2023-09-04T19:12:10Z False
busybox-drpc 8m30s dr1 dr2 Relocate Relocating WaitingForResourceRestore 2023-09-04T19:12:10Z False
busybox-drpc 8m30s dr1 dr2 Relocate Relocated UpdatedPlacement 2023-09-04T19:12:10Z False
busybox-drpc 8m30s dr1 dr2 Relocate Relocated Cleaning Up 2023-09-04T19:12:10Z False
busybox-drpc 9m dr1 dr2 Relocate Relocated Completed 2023-09-04T19:12:10Z 2m10.257905102s True
busybox-drpc 9m30s dr1 dr2 Relocate Relocated Completed 2023-09-04T19:12:10Z 2m10.257905102s True
busybox-drpc 10m dr1 dr2 Relocate Relocated Completed 2023-09-04T19:12:10Z 2m10.257905102s True
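To make the problem concrete, here is a small Python sketch that replays the rows of the watch output above (only CURRENTSTATE, PROGRESSION and PEER READY) and compares two stop conditions: waiting for PeerReady alone, and waiting for PeerReady together with the Relocated state. The function names are hypothetical and only illustrate the idea; they are not taken from existing test code.

```python
# Rows copied from the `kubectl get drpc -o wide -w` output above:
# (current_state, progression, peer_ready)
WATCH_ROWS = [
    ("FailedOver", "Completed", True),
    ("Initiating", "PreparingFinalSync", True),
    ("Relocating", "RunningFinalSync", True),
    ("Relocating", "EnsuringVolumesAreSecondary", False),
    ("Relocating", "WaitingForResourceRestore", False),
    ("Relocating", "WaitingForResourceRestore", False),
    ("Relocating", "WaitingForResourceRestore", False),
    ("Relocated", "UpdatedPlacement", False),
    ("Relocated", "Cleaning Up", False),
    ("Relocated", "Completed", True),
    ("Relocated", "Completed", True),
    ("Relocated", "Completed", True),
]


def peer_ready_only(state, progression, peer_ready):
    """Naive check: treat relocate as done as soon as PeerReady is True."""
    return peer_ready


def relocated_and_peer_ready(state, progression, peer_ready):
    """Require PeerReady and the Relocated state together."""
    return state == "Relocated" and peer_ready


def first_match(rows, done):
    """Return the index of the first watch row where done() is true, or None."""
    for i, row in enumerate(rows):
        if done(*row):
            return i
    return None


naive = first_match(WATCH_ROWS, peer_ready_only)
combined = first_match(WATCH_ROWS, relocated_and_peer_ready)
print(f"PeerReady only:        row {naive} {WATCH_ROWS[naive]}")
print(f"Relocated + PeerReady: row {combined} {WATCH_ROWS[combined]}")
```

With this data the PeerReady-only check is already satisfied by the very first row (still FailedOver, before relocate has made any progress), while the combined check first matches at the Relocated/Completed row.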
@Shwetha-Acharya did you have time to work on this? If not, I have a working draft that uses what is available in the current code.
Hi @nirs, I looked into the problem but have not had time to work on it. Please upload your work!
@Shwetha-Acharya OK, #1056 updates the basic test to use conditions, with an additional check for the Relocated
phase until Ramen is fixed to clear the conditions when starting relocate,
or to provide another condition.
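The approach described for #1056 (PeerReady plus an extra check for the Relocated phase) could look roughly like the sketch below. It is not the code in #1056. It assumes the DRPC is read with `kubectl get drpc NAME -o json` and that its conditions follow the usual Kubernetes shape (`status.conditions[]` entries with `type` and a string `status`); `get_drpc`, `condition_status`, `relocate_completed`, `wait_for` and the timeout defaults are hypothetical helpers.

```python
import json
import subprocess
import time


def get_drpc(name, namespace, context):
    """Fetch a DRPC as a dict using `kubectl get ... --output json`."""
    result = subprocess.run(
        ["kubectl", "get", "drpc", name,
         "--namespace", namespace, "--context", context, "--output", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def condition_status(drpc, cond_type):
    """Return the status string of condition cond_type, or None if missing.

    Assumes Kubernetes-style conditions under status.conditions.
    """
    for cond in drpc.get("status", {}).get("conditions", []):
        if cond.get("type") == cond_type:
            return cond.get("status")
    return None


def relocate_completed(drpc):
    """PeerReady alone is not enough, since it is still True right after
    relocate starts, so also require status.phase == "Relocated"."""
    return (
        drpc.get("status", {}).get("phase") == "Relocated"
        and condition_status(drpc, "PeerReady") == "True"
    )


def wait_for(get, done, timeout=600, interval=5):
    """Poll get() until done(obj) is true; raise TimeoutError after timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        obj = get()
        if done(obj):
            return obj
        if time.monotonic() >= deadline:
            raise TimeoutError(f"timed out after {timeout} seconds")
        time.sleep(interval)
```

Usage, with the names from the watch output above: `wait_for(lambda: get_drpc("busybox-drpc", "busybox-sample", "hub"), relocate_completed)`. Once Ramen clears the conditions when relocate starts (or provides another condition), the phase check in `relocate_completed` could be dropped.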
The scripts use DRPC status.phase to wait before or after an operation. This makes the code less flexible, since we have to handle three phases (Deployed, FailedOver, and Relocated) that all mean the same thing: the system is in a stable state.
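A small sketch of why phase-based waiting is inflexible: the caller must know which final phase each operation ends in, even though all three phases mean "stable". The operation names and helpers below are hypothetical, for illustration only.

```python
# Hypothetical mapping a phase-based wait needs: each operation ends in a
# different phase, so the caller must pick the right one per operation.
EXPECTED_PHASE = {
    "deploy": "Deployed",
    "failover": "FailedOver",
    "relocate": "Relocated",
}

# All three final phases describe the same thing: the system is stable.
STABLE_PHASES = frozenset(EXPECTED_PHASE.values())


def done_by_phase(operation, phase):
    """Phase-based check: needs the operation to know which phase to expect."""
    return phase == EXPECTED_PHASE[operation]


def is_stable(phase):
    """Operation-agnostic check: any of the three final phases means stable."""
    return phase in STABLE_PHASES
```

An operation-agnostic check like `is_stable()` cannot simply replace the mapping: right after a relocate is requested the phase is still FailedOver (first row of the watch output above), so it would report success before the operation has started. This is the gap that conditions are meant to close.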
See also https://maelvls.dev/kubernetes-conditions/#are-conditions-still-used
Related-to: #978