BenamarMk opened 3 years ago
Hello, I just came across this :). From what I observed during my tests, the contents of both S3 buckets must be readable before failing over. In a scenario where the "from" cluster is no longer available, the failover cannot succeed, because we need to read the S3 bucket content in the "from" cluster. As a workaround, I used buckets provisioned outside the two clusters; that way I managed to fail over even when the first cluster was unavailable. But perhaps replacing S3 with replicated block storage would work, provided we no longer need to read the "from" cluster's bucket content during failover. That is my understanding; please correct me if I'm wrong. I would also like to know your final decision on this, since the issue is a year old.
During a failover, we only need to read from one S3 store in order to restore.
On Thu, Jul 21, 2022 at 8:08 AM, VphDreamer wrote the comment quoted above.
Today, the Ramen DRManager backs up PV metadata to an S3 object store once the PV is bound to a PVC that is under DR protection. That PV metadata is then restored in the peer cluster on failover.
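The backup/restore flow above can be sketched as follows. This is a minimal illustration, not Ramen's actual code: a plain dict stands in for the S3 object store, and the function and key names are assumptions for the example.

```python
import json

# A dict standing in for the S3 object store. In practice this would be
# a bucket reachable from both clusters.
s3_store = {}

def backup_pv_metadata(pv: dict) -> None:
    """Upload the PV manifest once its PVC is under DR protection."""
    key = f"pv-metadata/{pv['metadata']['name']}"
    s3_store[key] = json.dumps(pv)

def restore_pv_metadata(pv_name: str) -> dict:
    """On failover, read the PV manifest back so the peer cluster can
    apply the PV resource before the application is started there."""
    return json.loads(s3_store[f"pv-metadata/{pv_name}"])

# Example: protect a PV on the "from" cluster, then restore it on the peer.
pv = {
    "apiVersion": "v1",
    "kind": "PersistentVolume",
    "metadata": {"name": "pv-app-1"},
    "spec": {"capacity": {"storage": "10Gi"}},
}
backup_pv_metadata(pv)
restored = restore_pv_metadata("pv-app-1")
assert restored == pv
```

Note that the restore path only touches the store itself, which is why the store must remain reachable when the "from" cluster is down.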
The S3 object store dependency may add cost and complexity. It is possible, however, to avoid the S3 object store entirely by leveraging the same underlying storage used to provision PVs for applications. In other words, change Ramen DRManager into a stateful operator: Ramen DRManager would enable DR protection for its own PVC the same way it enables DR protection for application PVCs.
How would this be done? It still needs brainstorming sessions, but a potential solution would be for the user (or the Ramen DROrchestrator) to create a replicated static PV for each Ramen DRManager instance. For example, assume we have DRClusterPeers{us-east, us-west}. The user creates PV1 read-write in us-east (primary) and read-only in us-west (secondary), and PV2 read-write in us-west (primary) and read-only in us-east (secondary). During failover, the Ramen DRManager in the failover cluster (coordinated by the Ramen DROrchestrator) reads the PV metadata from its claimed volume and applies the K8s PV resources before the Ramen DROrchestrator allows the application to fail over.
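As a rough illustration of the replicated static PV pair described above, the manifests could look something like the following. This is a hypothetical sketch: the names, sizes, and CSI driver are assumptions for the example, not Ramen's actual resources, and the read-only secondary attachment would be handled by the replication layer rather than by these manifests alone.

```yaml
# Hypothetical static PV for the us-east DRManager instance
# (primary, read-write; its replica in us-west is read-only).
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ramen-drmanager-pv1-us-east
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: replicated.block.example.com   # assumed replicated block driver
    volumeHandle: drmanager-vol-1
---
# Hypothetical peer PV for the us-west DRManager instance
# (primary in us-west, read-only secondary in us-east).
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ramen-drmanager-pv2-us-west
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: replicated.block.example.com
    volumeHandle: drmanager-vol-2
```

Each DRManager instance would claim its own primary PV, so on failover the surviving cluster can read the PV metadata from the read-only replica it already holds.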