Open: BenamarMk opened this pull request 1 month ago.
@ShyamsundarR Unit tests and e2e tests are complete. Ready for review, and for merge by Monday if possible.
@BenamarMk Let's discuss the changes. Overall, we should rely on the S3 version of the VRG when the MCV version is not found, but I am not sure we can make this the default in all or most cases.
One thing to note: we rely on S3 only when there is no primary.
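A minimal Go sketch of the fallback rule discussed above: start from the VRGs returned via MCV (ManagedClusterView), and consult S3 only when none of them is primary. The `VRG` type, the `s3Loader` callback, `errNotFound`, and `vrgsWithS3Fallback` are hypothetical names for illustration, not Ramen's actual types or functions.

```go
package main

import (
	"errors"
	"fmt"
)

// VRG is a hypothetical, trimmed-down stand-in for a VolumeReplicationGroup;
// only the fields needed for this sketch are included.
type VRG struct {
	Name  string
	State string // "Primary" or "Secondary" (assumed values)
}

// errNotFound is a hypothetical sentinel meaning S3 holds no VRG.
var errNotFound = errors.New("vrg not found in S3")

// s3Loader is a hypothetical callback that fetches the last-known primary VRG
// and the name of the cluster it belongs to from the S3 store.
type s3Loader func() (cluster string, vrg *VRG, err error)

// vrgsWithS3Fallback returns the VRGs found via MCV, keyed by managed-cluster
// name. S3 is consulted only when none of the MCV results is a primary.
func vrgsWithS3Fallback(fromMCV map[string]*VRG, load s3Loader) (map[string]*VRG, error) {
	vrgs := make(map[string]*VRG, len(fromMCV)+1)
	for cluster, vrg := range fromMCV {
		vrgs[cluster] = vrg
	}

	for _, vrg := range vrgs {
		if vrg.State == "Primary" {
			return vrgs, nil // a primary is visible via MCV; S3 is not needed
		}
	}

	cluster, vrg, err := load()
	if errors.Is(err, errNotFound) {
		return vrgs, nil // nothing in S3 either; return what MCV reported
	}
	if err != nil {
		return nil, err
	}
	// Sketch choice: never replace an entry that MCV already reported.
	if _, exists := vrgs[cluster]; !exists && vrg != nil {
		vrgs[cluster] = vrg
	}
	return vrgs, nil
}

func main() {
	// The primary cluster is unreachable, so MCV only returns the secondary.
	fromMCV := map[string]*VRG{"east": {Name: "app-vrg", State: "Secondary"}}
	load := func() (string, *VRG, error) {
		return "west", &VRG{Name: "app-vrg", State: "Primary"}, nil
	}
	vrgs, err := vrgsWithS3Fallback(fromMCV, load)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(vrgs), vrgs["west"].State) // prints "2 Primary"
}
```

Not overwriting an entry that MCV already returned is a choice made for this sketch; the discussion does not say how such a collision would be handled.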
This PR fixes a day-one issue in the post-hub-recovery process. The issue was uncovered by a recent change that added a new validation check in Ramen, which validates the failover cluster before the failover procedure is initiated.
Before this validation was introduced, sync and failover worked correctly after hub recovery despite this underlying bug: the `ReplicationDestination` already existed on the failover cluster, so the missing `RDSpec` had no effect.
Note that this issue occurs only when the primary cluster is inaccessible. In that situation Ramen cannot retrieve the VRG from the primary cluster, so when Ramen regenerates the `ManifestWork` for the VRG, the `RDSpecs` are left out. These `RDSpecs` are created for each `ProtectedPVC` object in the primary VRG, so without the primary VRG there is nothing to build them from.
The solution in this PR adds a check for an existing `ManifestWork` before creating one, to avoid accidentally overwriting a valid `VRG.spec`: if a `ManifestWork` already exists on the destination cluster, its creation is skipped. Additionally, a new utility function returns a map of the VRGs retrieved using MCV; if no primary VRG is found, the function retrieves it from the S3 store and adds it to the map. This function replaces all uses of `d.vrgs` throughout the codebase, so that `DRPC` maintains an accurate view of all VRGs in the managed clusters.
https://bugzilla.redhat.com/show_bug.cgi?id=2284021
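The create-if-absent guard described in the PR can be sketched in Go as follows. This is an illustration only: `ManifestWork` here is a hypothetical two-field struct, and the map-backed `store` stands in for the cluster API. Real controller code would more likely issue a Get through a Kubernetes client and treat a NotFound error (for example via `apierrors.IsNotFound` from `k8s.io/apimachinery`) as the signal to create.

```go
package main

import (
	"errors"
	"fmt"
)

// ManifestWork is a hypothetical stand-in holding only a name and the VRG
// spec it carries (rendered here as a string for brevity).
type ManifestWork struct {
	Name    string
	VRGSpec string
}

var errNotFound = errors.New("manifestwork not found")

// store is a hypothetical per-cluster ManifestWork store: cluster -> name -> work.
type store map[string]map[string]*ManifestWork

func (s store) get(cluster, name string) (*ManifestWork, error) {
	if mw, ok := s[cluster][name]; ok { // indexing a nil inner map is safe
		return mw, nil
	}
	return nil, errNotFound
}

func (s store) create(cluster string, mw *ManifestWork) {
	if s[cluster] == nil {
		s[cluster] = map[string]*ManifestWork{}
	}
	s[cluster][mw.Name] = mw
}

// createVRGManifestWorkIfAbsent creates the ManifestWork only when none exists
// for the destination cluster, so an existing, valid VRG spec (for example one
// that still carries RDSpecs) is never overwritten by a regenerated one.
// It reports whether a new ManifestWork was created.
func createVRGManifestWorkIfAbsent(s store, cluster string, desired *ManifestWork) (bool, error) {
	_, err := s.get(cluster, desired.Name)
	switch {
	case err == nil:
		return false, nil // already present: skip creation
	case errors.Is(err, errNotFound):
		s.create(cluster, desired)
		return true, nil
	default:
		return false, err
	}
}

func main() {
	s := store{}
	s.create("west", &ManifestWork{Name: "app-vrg-mw", VRGSpec: "with RDSpecs"})

	// After hub recovery the regenerated spec lacks RDSpecs (primary unreachable).
	regenerated := &ManifestWork{Name: "app-vrg-mw", VRGSpec: "without RDSpecs"}
	created, err := createVRGManifestWorkIfAbsent(s, "west", regenerated)
	if err != nil {
		panic(err)
	}
	fmt.Println(created, s["west"]["app-vrg-mw"].VRGSpec) // prints "false with RDSpecs"
}
```

A read-then-create check like this can race with another writer between the two calls; treating an "already exists" error from the create call as a skip would close that gap, though the PR text does not say which approach it takes.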
TODO: