Steps to Reproduce

Deploy and configure Metro DR clusters (OCP for ACM Hub, 2 OCF/ODF and a shared external Ceph).
Deploy busybox-sample[1] as DR app
Enable fencing of the active cluster via DRCluster CR
Perform failover of the app to the 2nd cluster via DRPlacementControl CR

[1] https://github.com/RamenDR/ocm-ramen-samples/tree/main/busybox-odr-metro

Actual results

After enabling fencing, I checked what the app on the primary cluster was doing. It is still running, but the volume has moved to a read-only state and the app is no longer writing data, as expected:

$ oc rsh -n busybox-sample busybox /bin/busybox sh
/ # ls -l /mnt/test/
total 492
drwx------    2 root     root         16384 Aug  4 12:03 lost+found
-rw-r--r--    1 root     root        484932 Aug  4 16:58 outfile
/ # date
Thu Aug  4 17:03:30 UTC 2022
/ # tail /mnt/test/outfile 
Thu Aug 4 16:57:57 UTC 2022
Thu Aug 4 16:57:58 UTC 2022
Thu Aug 4 16:57:59 UTC 2022
Thu Aug 4 16:58:00 UTC 2022
Thu Aug 4 16:58:01 UTC 2022
Thu Aug 4 16:58:02 UTC 2022
Thu Aug 4 16:58:03 UTC 2022
Thu Aug 4 16:58:04 UTC 2022
Thu Aug 4 16:58:05 UTC 2022
Thu Aug 4 16:58:06 UTC 2022
/ # touch /mnt/test/qe
touch: /mnt/test/qe: Read-only file system

After the failover, I checked the app pod again, this time on the 2nd cluster, and the app is running fine again:

$ oc rsh -n busybox-sample busybox /bin/busybox sh 
/ # tail /mnt/test/outfile 
Thu Aug 4 17:16:20 UTC 2022
Thu Aug 4 17:16:21 UTC 2022
Thu Aug 4 17:16:22 UTC 2022
Thu Aug 4 17:16:23 UTC 2022
Thu Aug 4 17:16:24 UTC 2022
Thu Aug 4 17:16:25 UTC 2022
Thu Aug 4 17:16:26 UTC 2022
Thu Aug 4 17:16:27 UTC 2022
Thu Aug 4 17:16:28 UTC 2022
Thu Aug 4 17:16:29 UTC 2022

But there is a gap in the data file: the last line written before the failover is missing (there is no 'Thu Aug 4 16:58:06 UTC 2022', which was the last line in the file on the primary cluster):

Thu Aug 4 16:58:02 UTC 2022
Thu Aug 4 16:58:03 UTC 2022
Thu Aug 4 16:58:04 UTC 2022
Thu Aug 4 16:58:05 UTC 2022
Thu Aug 4 17:13:32 UTC 2022
Thu Aug 4 17:13:33 UTC 2022
Thu Aug 4 17:13:34 UTC 2022
Thu Aug 4 17:13:35 UTC 2022
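
To make this check repeatable, here is a minimal sketch (not part of the sample app) that scans the timestamps in the outfile and reports any gap. It assumes the app writes one `date` line per second, as the output above suggests; the function name `find_gaps` and the one-second threshold are my own choices for illustration.

```python
from datetime import datetime, timedelta

# Format of the lines written by the busybox sample, e.g.
# "Thu Aug 4 16:58:05 UTC 2022" (assumes an English/C locale).
FMT = "%a %b %d %H:%M:%S %Z %Y"

def find_gaps(lines, max_step=timedelta(seconds=1)):
    """Return (previous, next) timestamp pairs that are further apart than max_step."""
    stamps = [datetime.strptime(l.strip(), FMT) for l in lines if l.strip()]
    return [(a, b) for a, b in zip(stamps, stamps[1:]) if b - a > max_step]

# Excerpt of the outfile as seen on the 2nd cluster after failover.
sample = [
    "Thu Aug 4 16:58:04 UTC 2022",
    "Thu Aug 4 16:58:05 UTC 2022",
    "Thu Aug 4 17:13:32 UTC 2022",
    "Thu Aug 4 17:13:33 UTC 2022",
]

gaps = find_gaps(sample)
for before, after in gaps:
    print(f"gap of {after - before} between {before} and {after}")

# The last line seen on the primary cluster before failover.
last_on_primary = "Thu Aug 4 16:58:06 UTC 2022"
print("last primary line present:", last_on_primary in sample)
```

A gap alone is expected here, because the app was not writing while the volume was read-only. The part that matters for this report is the second check: whether the last line seen on the primary cluster is present on the secondary.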

Expected results

All data written by the app when running on the primary cluster is available on the secondary location.

nirs commented 1 year ago

Hey @mbukatov this is expected. Failing over will drop data written since the last replication. If you want to move the application to another cluster without losing any data, you need to use the "Relocate" action instead of the "Failover" action.

nirs commented 1 year ago

Closing since the behavior is expected. Feel free to reopen if needed.

RamenDR / ocm-ramen-samples

Possible small data loss during failover of DR app #19
