apache / pulsar

Apache Pulsar - distributed pub-sub messaging system
https://pulsar.apache.org/
Apache License 2.0

Request for steps to upgrade Pulsar from Single AZ to Multi-AZ deployment on AWS #11717

Open manjupriya-ar opened 3 years ago

manjupriya-ar commented 3 years ago

Describe the bug

We are looking for steps to move a Pulsar deployment on AWS from a single AZ to multiple AZs. The options we have considered, and the open questions for each, are listed below.

Option 1: Deploy a new StatefulSet that spans multiple AZs, then decommission the bookies in the single-AZ StatefulSet one by one.

Assumption: Autorecovery re-replicates the data, so decommissioning the bookies one at a time causes no data loss.

Questions:

  1. The cluster has 3 EKS worker nodes. Since an anti-affinity rule is set on the bookies (only one bookie can run per node), we assume the cluster needs to scale out by 3 more nodes.
  2. How can we ensure that the worker nodes running the older bookie StatefulSet are not moved to a different zone? Wouldn't the Auto Scaling groups created on EKS distribute nodes equally across zones? If the number of worker nodes in the Auto Scaling group is set to 3 × the number of AZs (for example, 9 in a 3-AZ region), the initial nodes of the cluster would probably stay in the same AZ.
  3. How can we ensure that replication is complete before shutting down the second and third bookies?
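To make question 3 concrete, here is a minimal sketch of the check we have in mind: poll for under-replicated ledgers and only move on to the next bookie once none are reported. The function name `wait_until_replicated` and the `list_underreplicated` callable are hypothetical. The callable could, for example, wrap the output of `bin/bookkeeper shell listunderreplicated`, but that wiring is an assumption and is not shown. The sketch also assumes the auditor has already noticed the removed bookie, since an empty list before that point would prove nothing.

```python
import time
from typing import Callable, Sequence


def wait_until_replicated(
    list_underreplicated: Callable[[], Sequence[str]],
    poll_interval_s: float = 30.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until no ledger is reported as under-replicated.

    `list_underreplicated` is a hypothetical hook that returns the IDs of
    ledgers still waiting for re-replication. Returns True if the list
    became empty before the timeout, otherwise False.
    """
    waited = 0.0
    while True:
        if not list_underreplicated():
            return True  # nothing left to re-replicate
        if waited >= timeout_s:
            return False  # give up; do not decommission the next bookie
        sleep(poll_interval_s)
        waited += poll_interval_s
```

The next bookie would be decommissioned only when this returns True. A False result means replication is still pending and the rollout should pause.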

Option 2: Migrate from a single AZ to multiple AZs with downtime, using EBS snapshots.

Can the Pulsar deployment/StatefulSet be scaled down, and EBS snapshots then be used to back up the data and restore it to EBS volumes in a different AZ?
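For reference, the EBS part of Option 2 could look roughly like the sketch below. It assumes `ec2` behaves like a boto3 EC2 client and uses only `create_snapshot`, `create_volume`, and the `snapshot_completed` and `volume_available` waiters. The helper name `copy_volume_to_az` is hypothetical. Rebinding the Kubernetes PersistentVolumes to the new volumes, and whether the bookies accept the restored data, are outside this sketch.

```python
def copy_volume_to_az(ec2, volume_id: str, target_az: str) -> str:
    """Snapshot `volume_id` and create a new volume from it in `target_az`.

    `ec2` is assumed to behave like a boto3 EC2 client. The source volume
    should no longer be written to (StatefulSet scaled down) when this
    runs. Returns the ID of the new volume.
    """
    snapshot = ec2.create_snapshot(
        VolumeId=volume_id,
        Description=f"AZ migration of {volume_id}",
    )
    snapshot_id = snapshot["SnapshotId"]

    # Wait for the snapshot to complete before restoring from it.
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])

    # Snapshots are regional, so the new volume can be placed in another AZ.
    volume = ec2.create_volume(SnapshotId=snapshot_id, AvailabilityZone=target_az)
    new_volume_id = volume["VolumeId"]

    # Wait until the new volume is ready to be attached.
    ec2.get_waiter("volume_available").wait(VolumeIds=[new_volume_id])
    return new_volume_id
```

With real credentials this would be called as `copy_volume_to_az(boto3.client("ec2"), <volume id>, <target AZ>)`, once per bookie volume.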

codelipenghui commented 2 years ago

The issue had no activity for 30 days, mark with Stale label.