We are looking for steps to upgrade from a single-AZ to a multi-AZ deployment on AWS. The options we have considered and their challenges are listed below.
Option 1: Deploy a new StatefulSet that spans multiple AZs, and decommission the Bookies in the single-AZ StatefulSet one by one.
Assumption: AutoRecovery re-replicates the data, so decommissioning Bookies one by one causes no data loss.
Questions:
The cluster currently has 3 EKS worker nodes. Given the anti-affinity rule set on the Bookies (only one Bookie may run per node), we assume the cluster needs to scale out by 3 more nodes to host the new StatefulSet. A sketch of the rule we mean follows.
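For reference, a minimal sketch of the kind of anti-affinity rule we mean, assuming the Bookie pods carry an `app: pulsar-bookkeeper` label (all names and the image here are illustrative, not our actual manifests):

```bash
# Sketch only: labels, names, and image are illustrative.
# The "one Bookie per node" rule is required pod anti-affinity keyed on the
# node hostname, so the scheduler never places two Bookie pods on one node.
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: pulsar-bookkeeper
spec:
  serviceName: pulsar-bookkeeper
  replicas: 3
  selector:
    matchLabels:
      app: pulsar-bookkeeper
  template:
    metadata:
      labels:
        app: pulsar-bookkeeper
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: pulsar-bookkeeper
            topologyKey: kubernetes.io/hostname   # one Bookie per worker node
      containers:
      - name: bookie
        image: apachepulsar/pulsar:latest
        command: ["sh", "-c", "bin/pulsar bookie"]
EOF
```

With a required rule like this in place, 3 extra nodes are indeed the minimum needed to schedule a second 3-replica Bookie StatefulSet.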
How do we ensure the worker nodes running the older Bookie StatefulSet are not moved to a different zone? Wouldn't the Auto Scaling groups created for EKS distribute nodes evenly across zones? If the Auto Scaling group's node count is set to 3 × the number of AZs (for example, 9 in a three-AZ region), the initial nodes of the cluster may well stay in the same AZ, but that seems accidental rather than guaranteed. A sketch of the pinning approach we are considering is below.
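A sketch of the pinning we had in mind, assuming eksctl-managed node groups (cluster, node-group, and zone names are illustrative, and the exact flags may differ by eksctl version): one node group per AZ, so each underlying Auto Scaling group can only launch nodes in its own zone, plus a node selector on the old StatefulSet using the well-known zone label.

```bash
# Sketch only: cluster, node-group, and zone names are illustrative.
# One node group per AZ: the backing Auto Scaling group is then confined
# to that zone and cannot replace a node in a different AZ.
eksctl create nodegroup --cluster pulsar-cluster \
  --name bookie-us-east-1a --node-zones us-east-1a --nodes 3

# Pin the old StatefulSet's pods to the original zone via the well-known
# topology label, so any rescheduled pod lands back in the same AZ.
kubectl patch statefulset pulsar-bookkeeper --type merge -p '
spec:
  template:
    spec:
      nodeSelector:
        topology.kubernetes.io/zone: us-east-1a
'
```

Note that patching the pod template triggers a rolling restart of the StatefulSet, so this would need to happen before the migration starts.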
How do we ensure re-replication is complete before shutting down the second and third Bookies? A sketch of the sequencing we had in mind follows.
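A sketch of the sequencing we were considering, using the BookKeeper shell that ships in the Pulsar image (the bookie ID is illustrative):

```bash
# Sketch only: the bookie ID is illustrative.
# After stopping the bookie process, decommissionbookie triggers an audit
# and blocks until every ledger fragment that lived on that bookie has
# been re-replicated by AutoRecovery onto the remaining bookies.
bin/bookkeeper shell decommissionbookie \
  -bookieid bookie-0.pulsar-bookkeeper:3181

# Independent check: an empty result means no ledgers are under-replicated,
# so it should be safe to move on to the next bookie.
bin/bookkeeper shell listunderreplicated
```

Our understanding is that decommissionbookie returning, plus an empty listunderreplicated, is the completeness signal we are asking about; confirmation would be appreciated.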
Option 2: Migrate from single-AZ to multi-AZ with downtime, using EBS snapshots.
Can the Pulsar Deployments/StatefulSets be scaled down, and EBS snapshots used to back up the data and restore it onto EBS volumes in a different AZ? A sketch of the flow we are imagining is below.
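A sketch of the flow with the AWS CLI (volume/snapshot IDs, names, and zones are illustrative); the key point is that EBS snapshots are regional, so a volume restored from one can be created in any AZ of the region:

```bash
# Sketch only: IDs, names, and zones are illustrative.
# 1. Stop writes by scaling the Bookie StatefulSet to zero.
kubectl scale statefulset pulsar-bookkeeper --replicas=0

# 2. Snapshot each Bookie's journal/ledger volume. EBS snapshots are
#    regional, not zonal, so they can be restored in any AZ of the region.
aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 \
  --description "bookie-0 ledgers, pre-migration"

# 3. Restore each snapshot as a new volume in the target AZ.
aws ec2 create-volume --snapshot-id snap-0123456789abcdef0 \
  --availability-zone us-east-1b --volume-type gp3
```

The restored volumes would then need matching PersistentVolumes/PVCs in the new AZs before the StatefulSet is scaled back up.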