cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

roachprod: support Geo Zones for Azure. #124612

Open DarrylWong opened 6 months ago

DarrylWong commented 6 months ago

Currently, roachprod does not support creating geo-distributed clusters on Azure. There is existing infra to declare zones, but it runs into issues when creating zones that have overlapping address spaces, e.g. eastus and westus, which we commonly do in multiregion tests.

Trying to do so returns the following error:

Code="VnetAddressSpaceOverlapsWithAlreadyPeeredVnet" Message="Cannot create or update peering /subscriptions/73075e4f-150d-4f4f-956b-053507b61fa7/resourceGroups/roachprod-vnets-eastus/providers/Microsoft.Network/virtualNetworks/roachprod-vnets-eastus/virtualNetworkPeerings/roachprod-vnets-eastus-roachprod-vnets-westus3. Virtual networks /subscriptions/73075e4f-150d-4f4f-956b-053507b61fa7/resourceGroups/roachprod-vnets-westus3/providers/Microsoft.Network/virtualNetworks/roachprod-vnets-westus3 and /subscriptions/73075e4f-150d-4f4f-956b-053507b61fa7/resourceGroups/roachprod-vnets-eastus/providers/Microsoft.Network/virtualNetworks/roachprod-vnets-eastus cannot be peered because address space of the first virtual network overlaps with address space of virtual network /subscriptions/73075e4f-150d-4f4f-956b-053507b61fa7/resourceGroups/roachprod-vnets-westus2/providers/Microsoft.Network/virtualNetworks/roachprod-vnets-westus2 already peered with the second virtual network. Overlapping address prefixes: 10.2.0.0/16. For more information please refer to https://aka.ms/VnetPeeringInfo" Details=[])

The issue should be investigated and fixed so we can support Geo Zones in Azure and enable multiregion tests.

Jira issue: CRDB-38977

blathers-crl[bot] commented 6 months ago

cc @cockroachdb/test-eng

DarrylWong commented 2 months ago

I looked into this some more and it looks like how we create vnets is to blame. The code correctly identifies that we cannot have overlapping address spaces or else peering won't work. But when choosing an address space, it only accounts for virtual networks that are actively being created in the current call to createVM.

This means the following sequence will fail:

  1. roachprod create cluster --clouds=azure --azure-locations=eastus,westus
     a. This works: eastus will have address prefix 1, westus will have address prefix 2.
  2. roachprod create cluster --clouds=azure --azure-locations=westus,canadacentral
     a. This fails: westus was already created with prefix 2, and canadacentral also gets prefix 2 because it's the second location. Peering fails.
  3. roachprod create cluster --clouds=azure --azure-locations=eastus,westus2
     a. This fails: eastus has prefix 1 from before and westus2 gets prefix 2, but peering fails because eastus is already peered with westus, which also has prefix 2.
  4. roachprod create cluster --clouds=azure --azure-locations=eastus,westus2
     a. This fails again, for a different reason: the code decides whether to create peerings based on whether new VPCs were created. Since none were, it won't even attempt peering, and it then fails when setting up SSH because it can't connect to the other regions.
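
To make the collision in steps 2 and 3 concrete, here is a minimal Go sketch. It is not the roachprod code: the function names, the map-based bookkeeping, and the limit of 255 are all hypothetical, and the mapping of prefix N to 10.N.0.0/16 is only inferred from the error message above. `naivePrefixes` mimics assigning prefixes by position in the current call, while `nextFreePrefixes` shows one possible alternative that also consults the prefixes of vnets that already exist.

```go
package main

import "fmt"

// maxPrefix is a hypothetical upper bound, assuming prefix N maps to 10.N.0.0/16.
const maxPrefix = 255

// naivePrefixes mimics the behavior described above: a new location gets a
// prefix based only on its position in the current call (1, 2, ...).
// Locations that already have a vnet keep their existing prefix.
func naivePrefixes(locations []string, existing map[string]int) map[string]int {
	out := make(map[string]int)
	for i, loc := range locations {
		if p, ok := existing[loc]; ok {
			out[loc] = p
			continue
		}
		out[loc] = i + 1
	}
	return out
}

// nextFreePrefixes assigns each new location the lowest prefix that is not
// used by any existing vnet or by an earlier location in this call.
func nextFreePrefixes(locations []string, existing map[string]int) (map[string]int, error) {
	used := make(map[int]bool)
	for _, p := range existing {
		used[p] = true
	}
	out := make(map[string]int)
	next := 1
	for _, loc := range locations {
		if p, ok := existing[loc]; ok {
			out[loc] = p
			continue
		}
		for used[next] {
			next++
		}
		if next > maxPrefix {
			return nil, fmt.Errorf("no free address prefix left for %s", loc)
		}
		used[next] = true
		out[loc] = next
	}
	return out, nil
}

func main() {
	// State after step 1: eastus and westus already exist.
	existing := map[string]int{"eastus": 1, "westus": 2}
	step2 := []string{"westus", "canadacentral"}

	naive := naivePrefixes(step2, existing)
	fmt.Println("naive canadacentral prefix:", naive["canadacentral"])

	fixed, err := nextFreePrefixes(step2, existing)
	if err != nil {
		panic(err)
	}
	fmt.Println("next-free canadacentral prefix:", fixed["canadacentral"])
}
```

The sketch only covers prefix selection. It does not address step 4, where peering is skipped because no new VPCs were created.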

Another issue is that the CIDR ranges we assign are too large, at 65k VMs per VPC. This only gives us 9 zones to work with before things overlap. I think a true fix would involve using terraform to predefine the networks, similar to how we have it set up for AWS. Was hoping this would be a quick win 😢 so probably not going to pick that up at this moment. I do have a potential band-aid fix in mind that I might give a shot.

srosenberg commented 2 months ago

> I think a true fix would involve using terraform to predefine the networks, similar to how we have it set up for AWS.

Yep. That sounds like the best approach.

> Was hoping this would be a quick win 😢 so probably not going to pick that up at this moment.

I'm reminded of the phrase, "Life is like a box of chocolates..." :)