Azure / azure-sdk-for-go

This repository is for active development of the Azure SDK for Go. For consumers of the SDK we recommend visiting our public developer docs at:
https://docs.microsoft.com/azure/developer/go/
MIT License

Azure Compute API is not properly validating which VM types can be deployed in an Availability Zone #5968

Closed justaugustus closed 4 years ago

justaugustus commented 4 years ago

Bug Report

I'm going to preface this with, "I have no clue where the best place to file this is, so apologies if it's in the wrong tracker".

What happened?

This was originally filed in https://github.com/kubernetes-sigs/cluster-api-provider-azure/issues/294. Cluster API Azure, as of filing this, does the following:

The code for that is here:

The reconciler finds an AZ to use and when the request is submitted, we get the following:

I1007 22:52:02.452034       1 azuremachine_reconciler.go:216] Selecting first available AZ as no availability zone was set or user-provided availability zone is not supported for VM size Standard_B2ms in location southcentralus
I1007 22:52:02.452089       1 azuremachine_reconciler.go:220] Selected availability zone 2 for cluster-1007-b-controlplane-0
E1007 22:52:03.042859       1 controller.go:218] controller-runtime/controller "msg"="Reconciler error" "error"="failed to create AzureMachine VM: failed to create vm cluster-1007-b-controlplane-0 : failed to create or get machine: cannot create vm: compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code=\"ResourceTypeNotSupportAvailabilityZones\" Message=\"The resource type 'virtualMachines' does not support availability zones at location 'southcentralus' and api-version '2019-03-01'.\""  "controller"="azuremachine" "request"={"Namespace":"default","Name":"cluster-1007-b-controlplane-0"}

What did you expect or want to happen?

We should be able to successfully create virtual machines with or without AZs. Whatever validation logic runs on the Azure side should correctly validate a virtual machine template and allow creation in the specified AZ.

How can we reproduce it?

To dig a little deeper, I wanted to see if the VM type wasn't actually supported in southcentralus:

$ az vm list-skus -l southcentralus --zone true --size Standard_B2ms -o table
ResourceType     Locations       Name           Zones    Capabilities                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Restrictions
---------------  --------------  -------------  -------  ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  --------------
virtualMachines  southcentralus  Standard_B2ms  1,2      ['MaxResourceVolumeMB=16384', 'OSVhdSizeMB=1047552', 'vCPUs=2', 'HyperVGenerations=V1,V2', 'MemoryGB=8', 'MaxDataDiskCount=4', 'LowPriorityCapable=False', 'PremiumIO=True', 'vCPUsAvailable=2', 'vCPUsPerCore=1', 'CombinedTempDiskAndCachedIOPS=2400', 'CombinedTempDiskAndCachedReadBytesPerSecond=23592960', 'CombinedTempDiskAndCachedWriteBytesPerSecond=23592960', 'UncachedDiskIOPS=1920', 'UncachedDiskBytesPerSecond=23592960', 'EphemeralOSDiskSupported=False', 'AcceleratedNetworkingEnabled=False', 'RdmaEnabled=False', 'MaxNetworkInterfaces=3']  None

So it seems both AZ 1 and 2 are available in southcentralus.

Now with a location I normally test in, eastus:

$ az vm list-skus -l eastus --zone true --size Standard_B2ms -o table
ResourceType     Locations    Name           Zones    Capabilities                                                                                                                                                                                                                                                                                                                                                                                                                                                                 Restrictions
---------------  -----------  -------------  -------  ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  --------------
virtualMachines  eastus       Standard_B2ms  1,2,3    ['MaxResourceVolumeMB=16384', 'OSVhdSizeMB=1047552', 'vCPUs=2', 'HyperVGenerations=V1,V2', 'MemoryGB=8', 'MaxDataDiskCount=4', 'LowPriorityCapable=False', 'PremiumIO=True', 'vCPUsAvailable=2', 'vCPUsPerCore=1', 'CombinedTempDiskAndCachedIOPS=2400', 'CombinedTempDiskAndCachedReadBytesPerSecond=23592960', 'CombinedTempDiskAndCachedWriteBytesPerSecond=23592960', 'UncachedDiskIOPS=1920', 'UncachedDiskBytesPerSecond=23592960', 'EphemeralOSDiskSupported=False']  None

Again, multiple zones are available for selection.

Now if I try this from the Azure portal, I get the following:

{"code":"InvalidTemplateDeployment","message":"The template deployment failed with error: 'The resource with id: '/subscriptions/<subscription-id>/resourceGroups/capz-augustus/providers/Microsoft.Compute/virtualMachines/DFSADFASDF' failed validation with message: 'The resource type 'virtualMachines' does not support availability zones at location 'southcentralus' and api-version '2019-08-01'.'.'."}

Screenshot from 2019-10-07 19-57-27

If I try to deploy a machine from the portal in southcentralus without AZs, template validation is successful:

Screenshot from 2019-10-07 21-35-29

I've also validated within Cluster API Azure that if I set the machine templates to disable AZs, I'm able to successfully create a VM.

Anything we should know about your environment.

This doesn't appear to be an environment-specific issue. It's presenting across multiple API versions and deployment methods.

The bug was initially discovered by @cecilerobertmichon and I believe we're also using different subscriptions.

cc: @juan-lee @devigned @ritazh @craiglpeters

devigned commented 4 years ago

I just tried to build a VM in southcentralus, and the UI popped up a link to the docs for regions and services with zone support. southcentralus wasn't listed among the regions. I'm thinking the API is sending back lies.

@jhendrixMSFT, I think this is a compute team issue.

@justaugustus, great detail in the issue. Thank you for opening this. I'm sure it's confused many folks.

ArcturusZhang commented 4 years ago

Actually, there is a similar issue here, if I have not misunderstood your point.

ghost commented 4 years ago

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @mjconnection, @Drewm3

Drewm3 commented 4 years ago

I will check with the team to determine why the AZs are showing up in the SKUs API even though AZs are not currently enabled in South Central US.

justaugustus commented 4 years ago

@Drewm3 -- sounds good! Just as a note, it could very well be multiple locations that are returning incorrect results. I just stopped w/ southcentralus once I realized it was a Compute API thing and not a downstream Cluster API Azure thing.

Drewm3 commented 4 years ago

After some additional research, I have found that for a VM to be available for zonal deployment, it must be available in the VM SKUs API and the resource must have a zone mapping in the Get Providers API (https://docs.microsoft.com/en-us/rest/api/resources/providers/get).

In the case of SouthCentralUS, VMs are not available in any zones, which is why you are seeing the failure. The behavior that you are seeing in SouthCentralUS is the expected behavior when AZs are being rolled out for a region but are not yet publicly available.
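As a rough illustration of the rule above, the sketch below treats a zone as usable only when it appears in both the SKU listing and the provider zone mapping. The `usableZones` helper is hypothetical and not part of any Azure SDK; the zone values are copied from the `az vm list-skus` output earlier in this thread and from the Get Providers example in this comment. Reading the rule as a plain set intersection is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// usableZones is a hypothetical helper: it returns the zones that appear in
// both the SKU listing and the provider zone mapping, sorted for stable output.
func usableZones(skuZones, providerZones []string) []string {
	mapped := make(map[string]bool, len(providerZones))
	for _, z := range providerZones {
		mapped[z] = true
	}
	var out []string
	for _, z := range skuZones {
		if mapped[z] {
			out = append(out, z)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// eastus: SKU API lists 1,2,3 and the provider maps 1,3,2.
	fmt.Println("eastus:", usableZones([]string{"1", "2", "3"}, []string{"1", "3", "2"}))
	// southcentralus: SKU API lists 1,2 but the provider maps no zones.
	fmt.Println("southcentralus:", usableZones([]string{"1", "2"}, []string{}))
}
```

With these inputs, eastus keeps all three zones and southcentralus ends up with none, which matches the failure reported above.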

My recommendation is to leverage the Provider API to determine if AZs can be used instead of falling back to hard-coding regions which support AZs.

Please let me know if there are any other questions. Otherwise I will resolve this issue.

Note, here is example output from the Get Providers API:

{
  "id": "/subscriptions/<snip>/providers/Microsoft.Compute",
  "namespace": "Microsoft.Compute",
   . . .
    {
      "resourceType": "virtualMachines",
      "locations": [
        "East US",
         <snip>
        "South Central US",
        <snip>
      ],
      <snip>
      "zoneMappings": [
        {
          <snip>
          "location": "East US",
          "zones": [
            "1",
            "3",
            "2"
          ]
        },
        <snip>
        {
          "location": "South Central US",
          "zones": []
        }
      ],
      "capabilities": "CrossResourceGroupResourceMove, CrossSubscriptionResourceMove, SystemAssignedResourceIdentity, SupportsTags, SupportsLocation"
    },
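A minimal sketch of that lookup, assuming the response has the shape shown in the example output above. The type names and the `zonesFor` helper are hypothetical and are not part of the Azure SDK for Go; the `resourceTypes` key is assumed to be the array elided by `. . .` in the example, and the sample JSON is a trimmed copy of it.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// The types below are hypothetical; they mirror only the fields visible in
// the example Get Providers output.
type zoneMapping struct {
	Location string   `json:"location"`
	Zones    []string `json:"zones"`
}

type resourceType struct {
	ResourceType string        `json:"resourceType"`
	ZoneMappings []zoneMapping `json:"zoneMappings"`
}

type provider struct {
	Namespace     string         `json:"namespace"`
	ResourceTypes []resourceType `json:"resourceTypes"`
}

// sample is a trimmed copy of the example response.
const sample = `{
  "namespace": "Microsoft.Compute",
  "resourceTypes": [
    {
      "resourceType": "virtualMachines",
      "zoneMappings": [
        {"location": "East US", "zones": ["1", "3", "2"]},
        {"location": "South Central US", "zones": []}
      ]
    }
  ]
}`

// parseProvider decodes a Get Providers response body.
func parseProvider(data []byte) (provider, error) {
	var p provider
	err := json.Unmarshal(data, &p)
	return p, err
}

// zonesFor returns the zones mapped for a resource type in a location,
// or nil when there is no mapping for that pair.
func zonesFor(p provider, resType, location string) []string {
	for _, rt := range p.ResourceTypes {
		if !strings.EqualFold(rt.ResourceType, resType) {
			continue
		}
		for _, zm := range rt.ZoneMappings {
			if strings.EqualFold(zm.Location, location) {
				return zm.Zones
			}
		}
	}
	return nil
}

func main() {
	p, err := parseProvider([]byte(sample))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	for _, loc := range []string{"East US", "South Central US"} {
		zones := zonesFor(p, "virtualMachines", loc)
		fmt.Printf("%s: zonal=%t zones=%v\n", loc, len(zones) > 0, zones)
	}
}
```

An empty or missing zone list is treated as "no zonal deployment" here, which is how the South Central US entry in the example reads.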

justaugustus commented 4 years ago

@Drewm3 -- I've been staring at this for a bit and I'm not seeing zoneMappings exposed in the resources API (for Azure SDK for Go):

https://github.com/Azure/azure-sdk-for-go/blob/8c372e6e1243e0427dffb2d750099ee3ed161a44/services/resources/mgmt/2019-05-01/resources/models.go#L1775-L1789

Here's some sample code:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"os"

	"github.com/Azure/azure-sdk-for-go/services/resources/mgmt/2019-05-01/resources"
	"github.com/Azure/go-autorest/autorest"
	"github.com/Azure/go-autorest/autorest/azure/auth"
	"github.com/pkg/errors"
)

//var _ azure.Service = (*Service)(nil)

// Service provides operations on Providers
type Service struct {
	Client resources.ProvidersClient
	//Scope *scope.ClusterScope
}

func main() {
	authorizer, err := auth.NewAuthorizerFromEnvironment()
	if err != nil {
		// Stop on error instead of discarding the wrapped error.
		log.Fatal(errors.Wrap(err, "failed to create azure session"))
	}
	subscriptionID := os.Getenv("AZURE_SUBSCRIPTION_ID")
	if subscriptionID == "" {
		log.Fatal("error creating azure services. Environment variable AZURE_SUBSCRIPTION_ID is not set")
	}

	client := getProvidersClient(subscriptionID, authorizer)
	provider, err := client.Get(context.Background(), "Microsoft.Compute", "")
	if err != nil {
		log.Fatal(errors.Wrap(err, "provider error"))
	}

	if provider.ResourceTypes != nil {
		res := *provider.ResourceTypes

		// Find the virtualMachines resource type.
		var vm resources.ProviderResourceType
		for _, rtype := range res {
			if *rtype.ResourceType == "virtualMachines" {
				vm = rtype
				break
			}
		}
		if vm.ResourceType == nil {
			log.Fatal("virtualMachines resource type not found")
		}

		// Print every field the SDK model exposes; zoneMappings is not among them.
		fmt.Printf("%+v\n", *vm.ResourceType)
		fmt.Printf("%+v\n", *vm.Locations)
		fmt.Printf("%+v\n", *vm.APIVersions)
		fmt.Printf("%+v\n", *vm.Capabilities)
		fmt.Printf("%+v\n", vm.Properties)
		if vm.Aliases != nil {
			aliases := *vm.Aliases
			for _, alias := range aliases {
				fmt.Printf("%+v\n", *alias.Name)
			}
		}
	}
}

// getProvidersClient creates a new Providers client from subscriptionid.
func getProvidersClient(subscriptionID string, authorizer autorest.Authorizer) resources.ProvidersClient {
	providersClient := resources.NewProvidersClient(subscriptionID)
	providersClient.Authorizer = authorizer
	providersClient.AddToUserAgent("cluster-api-azure-services")
	return providersClient
}
```

...and the output:

virtualMachines
[East US East US 2 West US Central US North Central US South Central US North Europe West Europe East Asia Southeast Asia Japan East Japan West Australia East Australia Southeast Australia Central Brazil South South India Central India West India Canada Central Canada East West US 2 West Central US UK South UK West Korea Central Korea South France Central South Africa North UAE North]
[2019-07-01 2019-03-01 2018-10-01 2018-06-01 2018-04-01 2017-12-01 2017-03-30 2016-08-30 2016-04-30-preview 2016-03-30 2015-06-15 2015-05-01-preview]
CrossResourceGroupResourceMove, CrossSubscriptionResourceMove, SystemAssignedResourceIdentity, SupportsTags, SupportsLocation
map[]

Any insights?

devigned commented 4 years ago

Looks like the zoneMappings key is not available in the latest specification, and thus is not generated in each of the languages: https://github.com/Azure/azure-rest-api-specs/blob/a5edf3b757d2c06091a5389b053a8cfa562525a0/specification/resources/resource-manager/Microsoft.Resources/stable/2019-08-01/resources.json#L4246-L4285

@Drewm3, I can see it clearly enough via the REST response from Providers GET. It's just not defined in the spec...

Drewm3 commented 4 years ago

Thanks for the update. I will work with the owners of this API to ensure zoneMappings are added to the spec so they show up in the SDKs.

justaugustus commented 4 years ago

Thanks for working on this, Drew!

Drewm3 commented 4 years ago

This issue has been resolved at the resource manager layer by suppressing zonal details in regions that are not yet ready for zones. For example, South Central US no longer shows any sizes available for deployment to Availability Zones.

ghost commented 4 years ago

Thanks for working with Microsoft on GitHub! Tell us how you feel about your experience using the reactions on this comment.