Azure / Enterprise-Scale

The Azure Landing Zones (Enterprise-Scale) architecture provides prescriptive guidance coupled with Azure best practices, and it follows design principles across the critical design areas for organizations to define their Azure architecture
https://aka.ms/alz
MIT License

Diag settings fail to deploy #1712

Open soderholmd opened 1 month ago

soderholmd commented 1 month ago

I used the ALZ deployment wizard in a newly provisioned tenant. Ten of the diag settings deployments failed with this error:

{
    "status": "Failed",
    "error": {
        "code": "DeploymentFailed",
        "message": "At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-deployment-operations for usage details.",
        "details": [
            {
                "code": "Conflict",
                "message": "{\r\n  \"code\": \"InvalidAuthenticationToken\",\r\n  \"message\": \"\"\r\n}"
            }
        ]
    }
}
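In the payload above, the useful part is hidden: the `message` of the `Conflict` detail is itself a JSON-encoded string whose inner `code` is `InvalidAuthenticationToken`. The following is a minimal sketch, assuming the payload shape shown in this report, that decodes such nested messages so the inner code is visible. `inner_error_codes` is a hypothetical helper, not part of ALZ or any Azure SDK.

```python
import json


def inner_error_codes(error: dict) -> list[tuple[str, str]]:
    """Return (outer_code, inner_code) pairs for each entry in error["details"].

    If a detail's "message" parses as a JSON object with a "code" key
    (as in this report), that inner code is used; otherwise the outer
    code is repeated.
    """
    pairs = []
    for detail in error.get("details", []):
        outer = detail.get("code", "")
        inner = outer
        try:
            decoded = json.loads(detail.get("message", ""))
            if isinstance(decoded, dict) and "code" in decoded:
                inner = decoded["code"]
        except json.JSONDecodeError:
            pass  # plain-text message: keep the outer code
        pairs.append((outer, inner))
    return pairs


# The "error" object from the report above (top-level message omitted).
error = {
    "code": "DeploymentFailed",
    "details": [
        {
            "code": "Conflict",
            "message": "{\r\n  \"code\": \"InvalidAuthenticationToken\",\r\n  \"message\": \"\"\r\n}",
        }
    ],
}

print(inner_error_codes(error))  # [('Conflict', 'InvalidAuthenticationToken')]
```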

All other resources deployed successfully, but the overall ALZ deployment job failed with the error below.

Steps to reproduce

  1. Followed the deployment template into a new (Microsoft-managed) tenant with VWAN enabled (UK South and UK West regions). Most other settings were left as default.

Screenshots

{
  "code": "DeploymentFailed",
  "target": "/providers/Microsoft.Resources/deployments/NoMarketplace-20240723105008",
  "message": "At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-deployment-operations for usage details.",
  "details": [
    {
      "code": "ResourceDeploymentFailure",
      "target": "/providers/Microsoft.Management/managementGroups/ds-landingzones/providers/Microsoft.Resources/deployments/ds-landingzonesalz-DiagSettingsMGs-uksouth-f56c0e93-74e8-5129-9c",
      "message": "The resource write operation failed to complete successfully, because it reached terminal provisioning state 'Failed'."
    },
    {
      "code": "ResourceDeploymentFailure",
      "target": "/providers/Microsoft.Management/managementGroups/ds-corp/providers/Microsoft.Resources/deployments/ds-corpalz-DiagSettingsMGs-uksouth-f56c0e93-74e8-5129-9c6e-b2b99",
      "message": "The resource write operation failed to complete successfully, because it reached terminal provisioning state 'Failed'."
    },
    {
      "code": "ResourceDeploymentFailure",
      "target": "/providers/Microsoft.Management/managementGroups/ds-management/providers/Microsoft.Resources/deployments/ds-managementalz-DiagSettingsMGs-uksouth-f56c0e93-74e8-5129-9c6e",
      "message": "The resource write operation failed to complete successfully, because it reached terminal provisioning state 'Failed'."
    },
    {
      "code": "ResourceDeploymentFailure",
      "target": "/providers/Microsoft.Management/managementGroups/ds-sandboxes/providers/Microsoft.Resources/deployments/ds-sandboxesalz-DiagSettingsMGs-uksouth-f56c0e93-74e8-5129-9c6e-",
      "message": "The resource write operation failed to complete successfully, because it reached terminal provisioning state 'Failed'."
    },
    {
      "code": "ResourceDeploymentFailure",
      "target": "/providers/Microsoft.Management/managementGroups/ds-online/providers/Microsoft.Resources/deployments/ds-onlinealz-DiagSettingsMGs-uksouth-f56c0e93-74e8-5129-9c6e-b2b",
      "message": "The resource write operation failed to complete successfully, because it reached terminal provisioning state 'Failed'."
    },
    {
      "code": "ResourceDeploymentFailure",
      "target": "/providers/Microsoft.Management/managementGroups/ds-platform/providers/Microsoft.Resources/deployments/ds-platformalz-DiagSettingsMGs-uksouth-f56c0e93-74e8-5129-9c6e-b",
      "message": "The resource write operation failed to complete successfully, because it reached terminal provisioning state 'Failed'."
    },
    {
      "code": "ResourceDeploymentFailure",
      "target": "/providers/Microsoft.Management/managementGroups/ds-identity/providers/Microsoft.Resources/deployments/ds-identityalz-DiagSettingsMGs-uksouth-f56c0e93-74e8-5129-9c6e-b",
      "message": "The resource write operation failed to complete successfully, because it reached terminal provisioning state 'Failed'."
    },
    {
      "code": "ResourceDeploymentFailure",
      "target": "/providers/Microsoft.Management/managementGroups/ds/providers/Microsoft.Resources/deployments/dsalz-DiagSettingsMGs-uksouth-f56c0e93-74e8-5129-9c6e-b2b995d723",
      "message": "The resource write operation failed to complete successfully, because it reached terminal provisioning state 'Failed'."
    },
    {
      "code": "ResourceDeploymentFailure",
      "target": "/providers/Microsoft.Management/managementGroups/ds-decommissioned/providers/Microsoft.Resources/deployments/ds-decommissionedalz-DiagSettingsMGs-uksouth-f56c0e93-74e8-5129-",
      "message": "The resource write operation failed to complete successfully, because it reached terminal provisioning state 'Failed'."
    },
    {
      "code": "ResourceDeploymentFailure",
      "target": "/providers/Microsoft.Management/managementGroups/ds-connectivity/providers/Microsoft.Resources/deployments/ds-connectivityalz-DiagSettingsMGs-uksouth-f56c0e93-74e8-5129-9c",
      "message": "The resource write operation failed to complete successfully, because it reached terminal provisioning state 'Failed'."
    }
  ]
}
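Each failing entry identifies its management group (MG) only inside the `target` path. As a hedged sketch, the MG names can be pulled out with a regular expression; `MG_PATTERN` and `failing_management_groups` are illustrative names, not part of ALZ or any Azure SDK. The sample reuses three targets copied verbatim (truncation included) from the output above; the MG segment survives the truncation, so the extraction still works.

```python
import re

# Captures <mg> from paths like
# /providers/Microsoft.Management/managementGroups/<mg>/providers/...
MG_PATTERN = re.compile(r"/providers/Microsoft\.Management/managementGroups/([^/]+)/")


def failing_management_groups(error: dict) -> list[str]:
    """Return the sorted MG names found in the "target" of each detail."""
    names = []
    for detail in error.get("details", []):
        match = MG_PATTERN.search(detail.get("target", ""))
        if match:
            names.append(match.group(1))
    return sorted(names)


error = {
    "code": "DeploymentFailed",
    "details": [
        {"code": "ResourceDeploymentFailure",
         "target": "/providers/Microsoft.Management/managementGroups/ds-landingzones/providers/Microsoft.Resources/deployments/ds-landingzonesalz-DiagSettingsMGs-uksouth-f56c0e93-74e8-5129-9c"},
        {"code": "ResourceDeploymentFailure",
         "target": "/providers/Microsoft.Management/managementGroups/ds-corp/providers/Microsoft.Resources/deployments/ds-corpalz-DiagSettingsMGs-uksouth-f56c0e93-74e8-5129-9c6e-b2b99"},
        {"code": "ResourceDeploymentFailure",
         "target": "/providers/Microsoft.Management/managementGroups/ds/providers/Microsoft.Resources/deployments/dsalz-DiagSettingsMGs-uksouth-f56c0e93-74e8-5129-9c6e-b2b995d723"},
    ],
}

print(failing_management_groups(error))  # ['ds', 'ds-corp', 'ds-landingzones']
```

Run over the full first and second payloads in this thread, a set difference of the two results would show that ds-management, ds-sandboxes, ds-online, ds-identity and ds-connectivity succeeded on the second attempt.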
jtracey93 commented 1 month ago

@soderholmd can you register the microsoft.insights RP on the Management subscription and try again for me?

soderholmd commented 1 month ago

@jtracey93 just checked and it is already registered on all subscriptions apart from Sandbox.

soderholmd commented 1 month ago

@jtracey93 I tried redeploying after confirming the resource providers were registered. Same result for five of the diag settings:

{
  "code": "DeploymentFailed",
  "target": "/providers/Microsoft.Resources/deployments/NoMarketplace-20240723163706",
  "message": "At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-deployment-operations for usage details.",
  "details": [
    {
      "code": "ResourceDeploymentFailure",
      "target": "/providers/Microsoft.Management/managementGroups/ds-corp/providers/Microsoft.Resources/deployments/ds-corpalz-DiagSettingsMGs-uksouth-446adbf1-56d3-5ed9-8a54-4d77d",
      "message": "The resource write operation failed to complete successfully, because it reached terminal provisioning state 'Failed'."
    },
    {
      "code": "ResourceDeploymentFailure",
      "target": "/providers/Microsoft.Management/managementGroups/ds-landingzones/providers/Microsoft.Resources/deployments/ds-landingzonesalz-DiagSettingsMGs-uksouth-446adbf1-56d3-5ed9-8a",
      "message": "The resource write operation failed to complete successfully, because it reached terminal provisioning state 'Failed'."
    },
    {
      "code": "ResourceDeploymentFailure",
      "target": "/providers/Microsoft.Management/managementGroups/ds-platform/providers/Microsoft.Resources/deployments/ds-platformalz-DiagSettingsMGs-uksouth-446adbf1-56d3-5ed9-8a54-4",
      "message": "The resource write operation failed to complete successfully, because it reached terminal provisioning state 'Failed'."
    },
    {
      "code": "ResourceDeploymentFailure",
      "target": "/providers/Microsoft.Management/managementGroups/ds-decommissioned/providers/Microsoft.Resources/deployments/ds-decommissionedalz-DiagSettingsMGs-uksouth-446adbf1-56d3-5ed9-",
      "message": "The resource write operation failed to complete successfully, because it reached terminal provisioning state 'Failed'."
    },
    {
      "code": "ResourceDeploymentFailure",
      "target": "/providers/Microsoft.Management/managementGroups/ds/providers/Microsoft.Resources/deployments/dsalz-DiagSettingsMGs-uksouth-446adbf1-56d3-5ed9-8a54-4d77dd9c17",
      "message": "The resource write operation failed to complete successfully, because it reached terminal provisioning state 'Failed'."
    }
  ]
}

Same 'invalid authentication token' error.

jtracey93 commented 1 month ago

@soderholmd are you able to re-test this today?

Also, can you confirm the other Management Group Diagnostic Settings deployments are completing successfully?

I see in the output above that only these MGs are failing:

soderholmd commented 1 month ago

@jtracey93 it worked on the third attempt! The first time ten failed, the second time five failed, and the third time there were no failures.
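Since the deployment eventually succeeded without any configuration change, the `InvalidAuthenticationToken` conflict behaved like a transient error here, although the root cause was still under investigation. Until it is understood, one possible workaround is to retry automatically with exponential backoff. This is a minimal sketch under stated assumptions: `retry_with_backoff`, `DeploymentError`, `fake_deploy` and the 30-second base delay are all hypothetical and not part of ALZ or any Azure tooling; a real `operation` would be whatever redeploys the diagnostic settings.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 30.0,
    is_transient: Callable[[Exception], bool] = lambda exc: True,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(), retrying transient failures.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts and
    re-raises immediately on a non-transient error or on the last attempt.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as exc:
            if attempt == attempts or not is_transient(exc):
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("not reached: the last attempt returns or re-raises")


class DeploymentError(Exception):
    """Hypothetical error type carrying an ARM error code."""

    def __init__(self, code: str):
        super().__init__(code)
        self.code = code


# Hypothetical stand-in for a redeployment that fails twice, then succeeds,
# like the third attempt succeeding in this thread.
outcomes = iter([DeploymentError("InvalidAuthenticationToken"),
                 DeploymentError("InvalidAuthenticationToken"),
                 "Succeeded"])


def fake_deploy() -> str:
    result = next(outcomes)
    if isinstance(result, Exception):
        raise result
    return result


delays: list[float] = []
status = retry_with_backoff(
    fake_deploy,
    attempts=3,
    base_delay=30.0,
    is_transient=lambda exc: getattr(exc, "code", "") == "InvalidAuthenticationToken",
    sleep=delays.append,  # record delays instead of sleeping
)
print(status, delays)  # Succeeded [30.0, 60.0]
```

Injecting `sleep` keeps the sketch testable without real waits, and the `is_transient` predicate limits retries to the one error code seen here, so unrelated failures still surface immediately.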

jtracey93 commented 1 month ago

Interesting, I think there may be a platform issue. I will investigate further.