Azure / Enterprise-Scale

The Azure Landing Zones (Enterprise-Scale) architecture provides prescriptive guidance coupled with Azure best practices, and it follows design principles across the critical design areas for organizations to define their Azure architecture
https://aka.ms/alz
MIT License

Sometimes Policy Assignments Fail During Deployment As Part Of Portal Experience Deployment #902

Open jtracey93 opened 2 years ago

jtracey93 commented 2 years ago

Describe the bug

Occasionally, and seemingly at random, some policy assignments fail to deploy as part of the portal deployment/accelerator experience with the error below:

{
   "status": "Failed",
   "error": {
       "code": "DeploymentFailed",
       "message": "At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/DeployOperations for usage details.",
       "details": [
           {
               "code": "BadRequest",
               "message": "{\r\n  \"error\": {\r\n    \"code\": \"InvalidCreatePolicyAssignmentRequest\",\r\n    \"message\": \"The policy definition specified in policy assignment 'Deny-DataB-Sku' is out of scope. Policy definitions should be specified only at or above the policy assignment scope. If the management groups hierarchy changed recently or if assigning a management group policy to new subscription, please allow up to 30 minutes for the hierarchy changes to apply and try again.\"\r\n  }\r\n}"
           }
       ]
   }
}
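The inner `message` field in the payload above is itself a JSON document encoded as a string, so the real error code (`InvalidCreatePolicyAssignmentRequest`) is one level deeper than `BadRequest`. A minimal sketch of pulling it out, assuming the payload has already been loaded into a Python dict; the helper name `inner_error_codes` and the shortened sample payload are illustrative and not part of the ALZ tooling:

```python
import json


def inner_error_codes(deployment_result: dict) -> list:
    """Collect error codes nested inside the string-encoded 'message' fields."""
    codes = []
    for detail in deployment_result.get("error", {}).get("details", []):
        try:
            inner = json.loads(detail.get("message", ""))
        except json.JSONDecodeError:
            continue  # plain-text message, nothing nested to decode
        if not isinstance(inner, dict):
            continue
        error = inner.get("error")
        if isinstance(error, dict) and error.get("code"):
            codes.append(error["code"])
    return codes


# Shortened, illustrative version of the payload shown above.
sample = {
    "status": "Failed",
    "error": {
        "code": "DeploymentFailed",
        "details": [
            {
                "code": "BadRequest",
                "message": json.dumps(
                    {
                        "error": {
                            "code": "InvalidCreatePolicyAssignmentRequest",
                            "message": "The policy definition specified in "
                            "policy assignment 'Deny-DataB-Sku' is out of scope.",
                        }
                    }
                ),
            }
        ],
    },
}

print(inner_error_codes(sample))  # ['InvalidCreatePolicyAssignmentRequest']
```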

Workaround

Re-running the portal deployment/accelerator experience with exactly the same input parameters, as per our guidance in Known Issues, normally resolves the issue on the second attempt.

This happens as the underlying platform replication has caught up and the node processing the deployment request is able to find
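For scripted deployments, the same "re-run with identical inputs" workaround could be sketched as a small retry loop. This is only an illustration under stated assumptions: `deploy` is a hypothetical callable that submits the deployment with the same parameters and returns the result payload as a dict, and the 60-second wait is an arbitrary placeholder, not a value recommended in this issue.

```python
import json
import time
from typing import Callable

# Substring of the replication-related error message quoted in this issue.
TRANSIENT_MARKER = "is out of scope"


def deploy_with_retry(
    deploy: Callable[[], dict],
    attempts: int = 2,
    delay_seconds: float = 60.0,  # placeholder wait, an assumption
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Re-run the same deployment while it fails with the 'out of scope' error."""
    result: dict = {}
    for attempt in range(1, attempts + 1):
        result = deploy()
        if result.get("status") != "Failed":
            return result
        if TRANSIENT_MARKER not in json.dumps(result):
            return result  # a different failure; re-running is unlikely to help
        if attempt < attempts:
            sleep(delay_seconds)
    return result


# Usage with a stand-in deployment that fails once, then succeeds.
calls = {"n": 0}


def fake_deploy() -> dict:
    calls["n"] += 1
    if calls["n"] == 1:
        return {
            "status": "Failed",
            "error": {"message": "The policy definition is out of scope."},
        }
    return {"status": "Succeeded"}


final = deploy_with_retry(fake_deploy, sleep=lambda _seconds: None)
print(final["status"], calls["n"])  # Succeeded 2
```

Failures that do not contain the marker text are returned immediately, so unrelated errors are not masked by the retry.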

Repro Steps

Hard to replicate, but it occurs more often in regions that have pairs.

Related Issues

Update 11/04/2022 (11th April)

We have increased the portal deployment replication delay called preparingToLaunch from 20 to 30 deployments in an effort to improve the success rate, whilst we work with engineering teams on the root cause of this issue.

mikewo-dc commented 2 years ago

Had this error happen on a deployment circa 22nd June with the deployment delay set to 30. It was just one single policy assignment, "Deny-Subnet-Without-Nsg", on the "aaa-landingzones" management group scope. The deployment Status reported "Conflict" in the portal (conflicting with what?). I remediated manually. Happy to provide some more info if I have any.

jtracey93 commented 2 years ago

Hey @mikewo-dc, thanks for letting us know. If you can provide the correlation ID, I can look into this further.

Thanks

Jack.

mikewo-dc commented 2 years ago

I think this is what you need? Let me know "trackingId": "61aec342-0327-43bb-95db-6a637484fd49",

jtracey93 commented 2 years ago

Hey @mikewo-dc,

I couldn't find that ID in our logs. You can get the correlation ID from https://docs.microsoft.com/en-us/azure/azure-resource-manager/templates/deployment-history?tabs=azure-portal#management-group-deployments

mikewo-dc commented 2 years ago

@jtracey93 hope this is what you need, "correlationId": "c35fac21-69e6-4cf3-b861-da1aea74a1da", that's for the entire deployment? My engagement is finished and access removed so I can't look at the Management Groups at the moment, but could ask if needed.

crossitwe11 commented 2 years ago

Hi, the AdventureWorks deployment keeps failing for me. My correlation id is 6b27f884-c0c2-4b39-907c-b66c240759fc. The policies would not deploy like above, so I ran it again. It got past policies and now the private dns entries are failing to deploy.

{
   "status": "Failed",
   "error": {
       "code": "InvalidDeployment",
       "message": "The 'location' property is not allowed for 'alz-PrivDNSLite-southcentralus-3e0046' at resource group scope. Please see https://aka.ms/deploy-to-subscription for usage details."
   }
}

jtracey93 commented 2 years ago

I have seen that you raised a separate issue for this (#1041), which we will investigate this morning.

H-Nawaz commented 11 months ago

Hi JT, I'm having issues with policy assignments using Terraform and am getting the same error as above.

Error: creating Scoped Policy Assignment (Scope: "/providers/Microsoft.Management/managementGroups/Legacy_MG"
│ Policy Assignment Name: "deny_resource_types"): unexpected status 400 with error: InvalidCreatePolicyAssignmentRequest: The policy definition specified in policy assignment 'deny_resource_types' is out of scope. Policy definitions should be specified only at or above the policy assignment scope. If the management groups hierarchy changed recently or if assigning a management group policy to new subscription, please allow up to 30 minutes for the hierarchy changes to apply and try again.

This only appears to happen when applying to a child management group. If I assign to the org root, this works fine.

jtracey93 commented 11 months ago

As discussed offline this was due to an incorrect MG ID being provided in the TF code, not related to this issue 👍
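For cases like this one, where the "out of scope" error comes from a wrong management group ID rather than replication lag, the rule in the error text ("policy definitions should be specified only at or above the policy assignment scope") can be checked locally before deploying. A rough sketch, assuming the hierarchy is available as a child-to-parent mapping; the group names below are hypothetical and do not reflect the hierarchy discussed in this thread:

```python
from typing import Dict, Optional


def is_at_or_above(
    definition_mg: str,
    assignment_mg: str,
    parents: Dict[str, Optional[str]],
) -> bool:
    """Return True if definition_mg is the assignment scope or one of its ancestors."""
    current: Optional[str] = assignment_mg
    seen = set()
    while current is not None and current not in seen:
        if current == definition_mg:
            return True
        seen.add(current)  # guards against accidental cycles in the mapping
        current = parents.get(current)
    return False


# Hypothetical hierarchy: child management group -> parent management group.
parents = {"root": None, "platform": "root", "legacy": "root"}

print(is_at_or_above("root", "legacy", parents))      # True
print(is_at_or_above("platform", "legacy", parents))  # False
```

A `False` result here points to a misconfigured ID or scope, which waiting for replication will not fix.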