Open jseldess opened 4 years ago
@taroface and @johnrk for triage and prioritization.
Also cc @joshimhoff: How do we handle CC multi-region cluster upgrades?
@bdarnell, any opinions?
I would also tend to go region-by-region, although I don't think it makes a lot of difference and I'm mainly basing this on the fact that orchestration tooling is more likely to facilitate region-by-region upgrades instead of spreading it out more evenly.
For option 1, for example, @dbist mentioned that a customer he's working with has no application traffic in a region while upgrading that region.
If you can drain all traffic from a region while upgrading, that's probably a good idea. But if you're using geo-partitioning, that won't really be an option since that region will still need to serve traffic for ranges that are pinned there.
That's a good point. This particular customer does not use geo-partitioning because some of their clusters are on the core version.
On CC, we also go region by region: one node at a time, starting with region 1, then on to region 2, etc.
We don't drain traffic from a region while upgrading it on CC. We upgrade one node at a time; other nodes in the region keep serving.
Relates to #5780.
linville (mdlinville) commented: It sounds like the recommendation here is not to drain traffic in the region, but to go per-region and within a region, go per-node. If it’s working for CC, it seems like a safe recommendation. Is that correct? If so, I can get this recommendation into the docs.
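For illustration only, the ordering discussed above (region by region, and node by node within each region) could be sketched roughly as follows. This is a minimal sketch, not CC's actual tooling: the function name `rolling_upgrade_order`, the region names, and the node names are all hypothetical, and the per-node upgrade step is left as a placeholder.

```python
from typing import Dict, Iterator, List, Tuple


def rolling_upgrade_order(
    nodes_by_region: Dict[str, List[str]],
) -> Iterator[Tuple[str, str]]:
    """Yield (region, node) pairs one at a time, finishing each region
    before moving on to the next one."""
    for region, nodes in nodes_by_region.items():
        for node in nodes:
            yield region, node


# Hypothetical topology: two regions with three nodes each.
cluster = {
    "region-1": ["n1", "n2", "n3"],
    "region-2": ["n4", "n5", "n6"],
}

for region, node in rolling_upgrade_order(cluster):
    # Placeholder for whatever upgrades a single node. Only one node is
    # touched per iteration, so the other nodes in the region keep serving.
    print(f"upgrade {node} in {region}")
```

Because only one node is handled at a time, this ordering does not require draining application traffic from the region being upgraded.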
linville (mdlinville) commented: Bram Gruneir, coming back to this to see whether the situation is still the same as in the description, whether it has been validated, etc. Any pointers?
Jesse Seldess commented:
In Slack, @holtrdan asked for the best practice when upgrading a multi-region cluster. Which is best?
@BramGruneir suggested option 1. We should validate and add a note to our upgrade docs, e.g., here.
We should also define how application traffic is a part of this. For option 1, for example, @dbist mentioned that a customer he's working with has no application traffic in a region while upgrading that region.
Jira Issue: DOC-591
Epic DOC-11047