Azure / azure-rest-api-specs

The source for REST API specifications for Microsoft Azure.
MIT License

`[Network/VPNGateway]` deployment operation failed due to an intermittent error #21164

Open wuxu92 opened 1 year ago

wuxu92 commented 1 year ago

Brief

VPN gateway deployment keeps failing with unhelpful information

Description

When creating a virtual network gateway of type VPN with a route-based policy, the create operation returns a server-side error without any debugging information that would help investigate it. I have tried different SKU types and modified the customRoutes prefix value; neither makes any difference, and every attempt failed.

related API:

https://github.com/Azure/azure-rest-api-specs/blob/ae227e20bcb22d83634452026f701d75bec2619e/specification/network/resource-manager/Microsoft.Network/stable/2021-08-01/virtualNetworkGateway.json#L1937

example

gateway request payload

{
    "location": "eastus",
    "properties": {
        "activeActive": false,
        "customRoutes": {
            "addressPrefixes": [
                "101.168.0.6/32"
            ]
        },
        "enableBgp": false,
        "enablePrivateIpAddress": true,
        "gatewayType": "Vpn",
        "ipConfigurations": [
            {
                "name": "vnetGatewayConfig",
                "properties": {
                    "privateIPAllocationMethod": "Dynamic",
                    "publicIPAddress": {
                        "id": "/subscriptions/xxx-xxxx/resourceGroups/xxx-rg/providers/Microsoft.Network/publicIPAddresses/vpnpubip001"
                    },
                    "subnet": {
                        "id": "/subscriptions/xxx-xxxx/resourceGroups/xxx-rg/providers/Microsoft.Network/virtualNetworks/vpngw001/subnets/GatewaySubnet"
                    }
                }
            }
        ],
        "sku": {
            "name": "VpnGw3AZ",
            "tier": "VpnGw3AZ"
        },
        "vpnType": "RouteBased"
    }
}

response error message:

{
    "error": {
        "code": "VmssGatewayDeploymentFailed",
        "details": [],
        "message": "The gateway deployment operation failed due to an intermittent error. Please try again."
    },
    "status": "Failed"
}
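
For anyone scripting against this API, here is a minimal sketch of how the failure body above could be summarized on the client side. It only uses the fields visible in the response (`status`, `error.code`, `error.message`, `error.details`); the function name `summarize_failure` is hypothetical and not part of any Azure SDK.

```python
import json
from typing import Optional

# Response body copied from the report above.
RESPONSE = """
{
    "error": {
        "code": "VmssGatewayDeploymentFailed",
        "details": [],
        "message": "The gateway deployment operation failed due to an intermittent error. Please try again."
    },
    "status": "Failed"
}
"""


def summarize_failure(body: dict) -> Optional[str]:
    """Return a one-line summary if the body reports a failed operation, else None."""
    if body.get("status") != "Failed":
        return None
    error = body.get("error") or {}
    code = error.get("code", "<no code>")
    message = error.get("message", "<no message>")
    # An empty "details" list is exactly the problem reported here:
    # there is nothing further to investigate.
    detail_count = len(error.get("details") or [])
    return f"{code}: {message} ({detail_count} detail entries)"


print(summarize_failure(json.loads(RESPONSE)))
```

The detail count is included because the empty `details` array is the core of this report: the body carries a code and a generic message but nothing actionable.
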
ghost commented 1 year ago

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @vpngwsuppgithub.

Issue Details
Author: wuxu92
Assignees: -
Labels: `Network - VPN Gateway`, `Service Attention`, `needs-triage`
Milestone: -
mayank-reynencourt commented 1 year ago

Hi,

I'm also trying to create a VNG and am facing the same issue. Any update on this?

el-memer commented 1 year ago

Hi, I'm also facing the same issue here (region West Europe). I've opened a support ticket and will post updates here if I get any.

birdnathan commented 1 year ago

Same here - West Europe. Please fix!

FletchAD commented 1 year ago

Same here - UKSouth and UKWest, tried with VpnGw1 & VpnGw1AZ. Interestingly, the Basic SKU seems to work OK, but that's not what I need.

slaffka-vlasov commented 1 year ago

The same happens in uswest3 with VpnGw1 & VpnGw2. Basic doesn't work either.

FrankMormino commented 1 year ago

Same thing here, in West Europe as well.

FrankMormino commented 1 year ago

> Hi, I'm also facing same issue here (Region West Europe), I've opened a support ticket and will send updates here if I have some.

Same here - trying the same in West Europe. Any luck on the support ticket side?

ghost commented 1 year ago

Same issue here. It happens in West Europe when trying to deploy a VpnGw2AZ. I tried deploying via both Terraform and the Portal.

BevanSin commented 1 year ago

Note there is an issue currently with Azure Key Vault that may be causing this problem - if you look in the Service Health page there is an alert there that is affecting downstream Azure services like VPN Gateway.

birdnathan commented 1 year ago

From Azure support: “Hi Nathan

Thank you for contacting Microsoft Azure Networking support.

Please know that we are currently investigating an ongoing global issue regarding Virtual Network Gateway (VPN Gateway) deployments and we will get back to you with updates once this has been mitigated.”

It seems the Key Vault issue is taking down network services. The status website needs to reflect this and not just list Key Vault as impacted.

haciz commented 1 year ago

Hello,

Joining the list to report the same issue. Sending an empty PUT request does not help either:

Set-AzVirtualNetworkGateway: Long running operation failed with status 'Failed'. Additional Info:'The gateway deployment operation failed due to an intermittent error. Please try again.' StatusCode: 200 ReasonPhrase: OK Status: Failed ErrorCode: VmssGatewayDeploymentFailed ErrorMessage: The gateway deployment operation failed due to an intermittent error. Please try again.

imkevinjones commented 1 year ago

Same issue here... This has been happening all day.

shanyuen commented 1 year ago

Yesterday I bought a subscription and created a fresh virtual network in Southeast Asia, but I still get the error.

Status: Conflict {"code":"DeploymentFailed","message":"At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/DeployOperations for usage details.","details":[{"code":"VmssGatewayDeploymentFailed","message":"The gateway deployment operation failed due to an intermittent error. Please try again."}]} SKU: VpnGw1 Correlation id: 2f5880ba-a8b2-4184-801e-a5b8cea6f3b4
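
The deployment-level error above wraps the gateway error inside `details`, so the top-level code (`DeploymentFailed`) hides the one that matters. As a rough sketch, assuming only the `code` / `message` / `details` shape seen in this payload, the innermost errors can be pulled out recursively. The helper name `leaf_errors` is made up for illustration.

```python
import json
from typing import Iterator, Tuple

# Error payload copied from the comment above.
PAYLOAD = json.loads(
    '{"code":"DeploymentFailed","message":"At least one resource deployment '
    'operation failed. Please list deployment operations for details. Please see '
    'https://aka.ms/DeployOperations for usage details.","details":[{"code":'
    '"VmssGatewayDeploymentFailed","message":"The gateway deployment operation '
    'failed due to an intermittent error. Please try again."}]}'
)


def leaf_errors(error: dict) -> Iterator[Tuple[str, str]]:
    """Yield (code, message) for every error that has no nested details."""
    details = error.get("details") or []
    if not details:
        yield error.get("code", ""), error.get("message", "")
        return
    for child in details:
        # Recurse: wrappers such as DeploymentFailed only point at their children.
        yield from leaf_errors(child)


for code, message in leaf_errors(PAYLOAD):
    print(f"{code}: {message}")
```

Here the only leaf is `VmssGatewayDeploymentFailed`, which again carries no detail beyond the generic retry message.
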

FletchAD commented 1 year ago

After this confirmation on 3rd Nov @ 05:52 UTC https://status.azure.com/en-us/status/history/ I was able to provision VpnGw1 in UKSouth this morning.

ghost commented 1 year ago

Seems to be mitigated. Can also deploy the VGW now.

el-memer commented 1 year ago

Got an answer from the support team.

I apologize for the inconvenience that was caused but as we checked, there is a service disruption in all regions, which impacts the deployments of the VPN Gateway resources. We hope that it might be resolved by the end of the day. Our internal team are actively investigating and working on a solution, however, currently we do not have a specific resolution date or time.

EDIT: My deployment succeeded this morning in Region West Europe for a VpnGw1 Route based !

I'll let you know if I have some updates from support, but indeed this seems to be mitigated.

haciz commented 1 year ago

Indeed, it works for ExpressRoute gateways in West Europe as well.

slaffka-vlasov commented 1 year ago

Confirmed: creation of VpnGw1 works in West US 3.

laglergruener commented 7 months ago

Hi, we currently have the same issue as described above, in West Europe for AZ gateways. Is the problem still present? Thanks, Hannes

rik-v commented 6 months ago

This issue seems to be back. We're currently unable to deploy any VPN gateway to any subscription or any tenant. We tried 3 different tenants, 4 different subscriptions, and 2 SKUs (VpnGw1 and VpnGw2), all in West Europe. We first noticed this on January 9, 2024, but it may well have started before that date.

No mentions of this on the service health pages. So, curious as to what's causing it this time.

soufianerabi commented 6 months ago

Yeah, for the past three days, we've been facing the same issue in our organization. The deployment has failed multiple times, even after many retries. Unfortunately, Microsoft support wasn't helpful. We're still trying to deploy this resource, but it continues to fail.

tanarchytan commented 6 months ago

Also a problem on my side. I have been trying since 9 January. Region West Europe with SKU VpnGw2.

{ "code": "DeploymentFailed", "target": "/subscriptions/[...]/resourceGroups/rg-net-hub/providers/Microsoft.Resources/deployments/Microsoft.Template-20240115105923", "message": "At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-deployment-operations for usage details.", "details": [ { "code": "ResourceDeploymentFailure", "target": "/subscriptions/[...]/resourceGroups/rg-net-hub/providers/Microsoft.Network/virtualNetworkGateways/vpng-[...]-prod", "message": "The resource write operation failed to complete successfully, because it reached terminal provisioning state 'Failed'." } ] }

rik-v commented 6 months ago

@soufianerabi @tanarchytan, I tried deploying in North Europe last Friday (Jan. 12th, 2024), and that completed successfully, which makes me believe Microsoft simply has some serious resource issues in the West Europe region.

It's not the first time (or the first sign) of this happening. A few months back we were unable to roll out other resources (VMs and VMSSes for AKS) because the resource was apparently unavailable in a specific zone in West Europe. We ended up deploying to "just" 2 zones instead of 3 (still OK, but we wanted 3 :)). To be clear: we haven't checked whether those resources have become available again in all zones since then, so perhaps things have improved (although I strongly doubt that, considering we can't deploy a VPN gateway to West Europe).

It would be helpful if Microsoft simply clarified the current state of West Europe. If there's a capacity issue, that's annoying, but I'd rather have them tell us than have to find out this way...

TGosselink commented 6 months ago

Same here, VPN deploy in West Europe: The gateway deployment operation failed due to an intermittent error. Please try again. (Code: VmssGatewayDeploymentFailed)

lgriffithsdoherty commented 6 months ago

Me too

foxmeyson commented 6 months ago

Microsoft is not good, there are problems all the time. That's horrible

rbnmk commented 6 months ago

Currently facing same issue in West Europe for VpnGw1 and VpnGw1AZ

mtc3net commented 6 months ago

Me too

lgriffithsdoherty commented 6 months ago

[Image: Screenshot 2024-01-16 143451]

Microsoft have called me to say that they are not sure what the issue is and that it could be months before it is resolved. The attached screenshot states capacity constraints, but there is still no resolution in sight, just a warning saying that capacity constraints won't allow you to build gateways in West Europe for the foreseeable future.

rik-v commented 6 months ago

@lgriffithsdoherty, thank you for sharing this! :) This is very unfortunate news, especially for new customers :( Their proposed (temporary) solution, although understandable, is unlikely to help much, as you can only deploy a gateway to a VNet in the same region. That means you'll have to move your virtual network too, or start managing two. This might also incur outbound network traffic costs (because resources might be in different regions).

I hope Microsoft makes quick work of expanding their capacity in West Europe :/

atovivan commented 6 months ago

This is a serious blocking issue for us as well, and it may postpone our go-live date. I tried to move only the VNet and VPN gateway to a different region, but without success; we may need to move all network-related resources, or the entire infrastructure, to a different region. For now we are using a bastion VM, but that is not a cheap solution, as only two people can work simultaneously on one VM.

Looks like Microsoft doesn't care much about this problem.

rik-v commented 6 months ago

@atovivan Instead of moving the VNet, you might want to try creating a second VNet for just the VPN, then peer that VNet with the 'original' VNet, that should work? (otherwise, perhaps a custom solution with a VM + custom VPN software (like wireguard) could work for your needs? However crappy that is, it might be 'a way out')

efcorpa commented 6 months ago

I cannot believe this is happening, or that Microsoft says it could be months before it is resolved. Can anybody confirm it works with a given SKU? Is there any workaround other than creating a VNet in another region and peering it with the one in West Europe?

Another question: is this an intermittent error, so that if I'm lucky I will get my VPN deployed, or does it fail every time?

redoz commented 6 months ago

@efcorpa FWIW I tried deploying every SKU multiple times yesterday and it failed every single time.
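
Since the error text itself says "Please try again", a bounded retry with backoff is one way to check whether the failure really is intermittent without retrying by hand. The following is only a sketch: `deploy` is a hypothetical callable standing in for whatever issues the create request and returns the final operation body, and the attempt count and delays are arbitrary assumptions, not values recommended by Microsoft.

```python
import time
from typing import Callable

# Shape of the failure body reported earlier in this thread.
FAILED = {
    "status": "Failed",
    "error": {"code": "VmssGatewayDeploymentFailed", "details": []},
}


def deploy_with_retries(
    deploy: Callable[[], dict],
    max_attempts: int = 4,
    base_delay_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call deploy() until it stops failing with VmssGatewayDeploymentFailed.

    Returns the last operation body, whether it succeeded or not.
    """
    result: dict = {}
    for attempt in range(1, max_attempts + 1):
        result = deploy()
        code = (result.get("error") or {}).get("code")
        retryable = (
            result.get("status") == "Failed"
            and code == "VmssGatewayDeploymentFailed"
        )
        if not retryable:
            return result  # success, or a different error that a retry won't fix
        if attempt < max_attempts:
            # Exponential backoff: base, 2x base, 4x base, ...
            sleep(base_delay_s * 2 ** (attempt - 1))
    return result


# Demo with a fake deploy() that fails twice and then succeeds; delays are
# recorded instead of slept so the example finishes immediately.
outcomes = iter([FAILED, FAILED, {"status": "Succeeded"}])
delays = []
final = deploy_with_retries(lambda: next(outcomes), sleep=delays.append)
print(final["status"], delays)
```

As the reports here show, a retry loop does not help when the underlying cause is a regional outage or capacity constraint, so the attempt cap matters more than the backoff schedule.
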

jordyvpaassen commented 6 months ago

@efcorpa I have the same issue with several customers. The only option is to create a resource group with a VPN gateway and a VNet in another region, and then peer your current VNet with the one deployed in the other region. If you have questions, let me know.
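
A rough sketch of the two peering request bodies this workaround involves, one per direction. The property names follow the `Microsoft.Network/virtualNetworks/virtualNetworkPeerings` resource as I understand it, so please verify them against the current spec before relying on this. The resource IDs and the function name `peering_bodies` are hypothetical placeholders.

```python
import json


def peering_bodies(workload_vnet_id: str, gateway_vnet_id: str) -> dict:
    """Build both peering bodies: workload VNet <-> VNet that hosts the gateway."""
    workload_to_gateway = {
        "properties": {
            "remoteVirtualNetwork": {"id": gateway_vnet_id},
            "allowVirtualNetworkAccess": True,
            "allowForwardedTraffic": True,
            "allowGatewayTransit": False,
            # The workload VNet uses the gateway living in the peered VNet.
            "useRemoteGateways": True,
        }
    }
    gateway_to_workload = {
        "properties": {
            "remoteVirtualNetwork": {"id": workload_vnet_id},
            "allowVirtualNetworkAccess": True,
            "allowForwardedTraffic": True,
            # The gateway VNet shares its gateway with the peered VNet.
            "allowGatewayTransit": True,
            "useRemoteGateways": False,
        }
    }
    return {
        "workload_to_gateway": workload_to_gateway,
        "gateway_to_workload": gateway_to_workload,
    }


# Placeholder IDs in the same redacted style used earlier in this thread.
bodies = peering_bodies(
    "/subscriptions/xxx-xxxx/resourceGroups/xxx-rg/providers/Microsoft.Network/virtualNetworks/workload-vnet",
    "/subscriptions/xxx-xxxx/resourceGroups/xxx-rg/providers/Microsoft.Network/virtualNetworks/gateway-vnet",
)
print(json.dumps(bodies, indent=2))
```

Note the cost caveat raised above still applies: traffic across a cross-region peering may be billed, and the workload side can only use the remote gateway once that gateway actually exists.
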

mestief commented 6 months ago

The problem unfortunately still exists 👎 Guess I'll have to use North Europe for now.

Agger1995 commented 5 months ago

I have been retrying every day for a few weeks at this point, and only just today was I able to successfully deploy a VPN Gateway to westeurope, with the SKU VpnGw1. Something has been fixed, but I can't find any official information on the incident anywhere.

redoz commented 5 months ago

> I have been retrying every day for a few weeks at this point, and only just today was I able to successfully deploy a VPN Gateway to westeurope, with the SKU VpnGw1. Something has been fixed, but I can't find any official information on the incident anywhere.

Probably just enough people gave up on westeurope and picked a different region. We moved everything to Sweden Central; not being able to reliably deploy things is not a risk we're willing to take.

MarkTallentire commented 4 months ago

Over a year later and this issue still persists. At a minimum, if this is a capacity issue, it would be nice if the error message reflected that.

wuxu92 commented 4 months ago

Hi @MarkTallentire, are you still experiencing this issue in the West Europe region? The capacity issue should have been resolved already.

MarkTallentire commented 4 months ago

Hi @wuxu92

Yes, I still seem to be getting this in West Europe and North Europe.

I was able to successfully create one in UK South, but our clients require us to have all our infrastructure in Europe.

On Tue, 19 Mar 2024 at 04:43, Xu Wu wrote:

> Hi @MarkTallentire, are you still experiencing this issue in West Europe region? the capacity issue should have been resolved already.
