Open DarrylWong opened 6 months ago
cc @cockroachdb/test-eng
I looked into this some more and it looks like how we create vnets is to blame. The code correctly identifies that we cannot have overlapping address spaces or else peering won't work. But when choosing an address space, it only accounts for virtual networks that are actively being created in the current call to `createVM`.
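To make the failure mode concrete, here is a minimal sketch of choosing an address space against every vnet that already exists, not just the ones created in the current call. This is not the roachprod code: `pick_free_prefix`, the `10.0.0.0/8` pool, the `/16` size, and the mapping of "prefix N" to `10.N.0.0/16` are all assumptions made for illustration. How the existing vnets get listed from Azure is left out.

```python
import ipaddress


def pick_free_prefix(existing, pool="10.0.0.0/8", new_prefix=16):
    """Return the first subnet of `pool` that overlaps none of `existing`.

    `existing` must contain the CIDRs of all vnets that already exist,
    including ones created by earlier runs (hypothetical input).
    """
    taken = [ipaddress.ip_network(cidr) for cidr in existing]
    for candidate in ipaddress.ip_network(pool).subnets(new_prefix=new_prefix):
        if not any(candidate.overlaps(t) for t in taken):
            return candidate
    raise RuntimeError("no free address space left in pool")


# Assumed state left behind by an earlier run: eastus and westus already exist.
existing = {"eastus": "10.1.0.0/16", "westus": "10.2.0.0/16"}

# A new location gets a range that does not collide with either of them.
first = pick_free_prefix(existing.values())
existing["canadacentral"] = str(first)
second = pick_free_prefix(existing.values())
```

Passing a larger `new_prefix` (a smaller range per vnet) yields more non-overlapping ranges from the same pool.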
This means the following sequence will fail:
1. `roachprod create cluster --clouds=azure --azure-locations=eastus,westus`
   a. This works: `eastus` will have address prefix 1, `westus` will have address prefix 2.
2. `roachprod create cluster --clouds=azure --azure-locations=westus,canadacentral`
   a. This fails: `westus` was already created with prefix 2, and `canadacentral` also gets prefix 2 because it's the second location. Peering fails.
3. `roachprod create cluster --clouds=azure --azure-locations=eastus,westus2`
   a. This fails: `eastus` has prefix 1 from before, `westus2` gets prefix 2, but peering fails because `eastus` is already peered with `westus`, which also has prefix 2.
4. `roachprod create cluster --clouds=azure --azure-locations=eastus,westus2`
   a. This fails again for a different reason: the code determines whether we need to create peerings based on whether new VPCs were created. Since none were, it won't even attempt to, and it fails when setting up SSH because it can't connect to the other regions.

Another issue is that we are assigning CIDR ranges that are too large, at 65k VMs per VPC. This only gives us 9 zones to work with before things overlap.

I think a true fix would involve using terraform to predefine the networks, similar to how we have it set up for AWS. Was hoping this would be a quick win 😢 so probably not going to pick that up at this moment. I do have a potential bandaid fix in mind that I might give a shot.
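For the fourth case in the sequence, one option is to derive the peerings to create from which peerings are missing, not from whether a vnet was just created. The sketch below is only an illustration of that idea: `missing_peerings` is a hypothetical helper, and it treats a peering as an unordered pair of locations, which is a simplification because Azure configures a peering link on each vnet separately.

```python
from itertools import combinations


def missing_peerings(locations, existing_peerings):
    """Return the location pairs that still need a peering.

    `locations` are the regions requested for the cluster and
    `existing_peerings` are pairs that are already peered (hypothetical input).
    """
    wanted = {frozenset(p) for p in combinations(sorted(set(locations)), 2)}
    have = {frozenset(p) for p in existing_peerings}
    return sorted(tuple(sorted(p)) for p in wanted - have)


# Both vnets already exist, but they were never peered with each other.
todo = missing_peerings(["eastus", "westus2"], [("eastus", "westus")])

# Once the peering exists, a rerun has nothing left to do.
done = missing_peerings(["eastus", "westus2"], [("westus2", "eastus")])
```

With this shape, rerunning the same create command retries the peering even though no new vnets are created.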
> I think a true fix would involve using terraform to predefine the networks, similar to how we have it set up for AWS.

Yep. That sounds like the best approach.

> Was hoping this would be a quick win 😢 so probably not going to pick that up at this moment.

I'm reminded of the phrase, "Life is like a box of chocolates..." :)
Currently Azure does not support creating geo-distributed clusters. There is existing infra to declare zones, but it runs into issues when creating zones that have overlapping address spaces, i.e. `eastus` and `westus`, which we commonly do in multiregion tests.
Trying to do so returns the following error:
The issue should be investigated and fixed so we can support Geo Zones in Azure and enable multiregion tests.
Jira issue: CRDB-38977