Azure / ARO-Landing-Zone-Accelerator

ARO Landing Zone Accelerator Reference Implementation Repo
MIT License

ARO terraform provisioning internal error #27

Open infosatheesh2020 opened 2 years ago

infosatheesh2020 commented 2 years ago

The ARO deployment failed with an internal server error:

╷
│ Error: waiting for creation of Template Deployment "aro" (Resource Group "spoke-aro"): Code="DeploymentFailed" Message="At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/DeployOperations for usage details." Details=[{"code":"Conflict","message":"{\r\n \"status\": \"Failed\",\r\n \"error\": {\r\n \"code\": \"ResourceDeploymentFailure\",\r\n \"message\": \"The resource operation completed with terminal provisioning state 'Failed'.\",\r\n \"details\": [\r\n {\r\n \"code\": \"InternalServerError\",\r\n \"message\": \"Internal server error.\"\r\n }\r\n ]\r\n }\r\n}"}]
│
│   with module.aro.azurerm_resource_group_template_deployment.aro,
│   on modules\aro\aro.tf line 13, in resource "azurerm_resource_group_template_deployment" "aro":
│   13: resource "azurerm_resource_group_template_deployment" "aro" {
│
╵


jgardner04 commented 2 years ago

@infosatheesh2020, is this based on the TF in the main branch?

infosatheesh2020 commented 2 years ago

@jgardner04 Yes, the initial error was based on the TF in the main branch. I have now validated with your latest fixes in the "terraform" branch and get the errors below:

╷
│ Error: Code="VMExtensionProvisioningError" Message="VM has reported a failure when processing extension 'jumpbox'. Error message: \"Failed to download all specified files. Exiting. Error Message: The remote server returned an error: (404) Not Found.\"\r\n\r\nMore information on troubleshooting is available at https://aka.ms/VMExtensionCSEWindowsTroubleshoot "
│
│   with module.vm.azurerm_virtual_machine_extension.jumpbox,
│   on modules\vm\vm.tf line 74, in resource "azurerm_virtual_machine_extension" "jumpbox":
│   74: resource "azurerm_virtual_machine_extension" "jumpbox" {
│
╵
╷
│ Error: creating Monitor Diagnostics Setting "hub-aromKmeQ6" for Resource "/subscriptions/XXXXXXXXXX/resourceGroups/hub-aro/providers/Microsoft.Network/azureFirewalls/azfw": insights.DiagnosticSettingsClient#CreateOrUpdate: Failure responding to request: StatusCode=409 -- Original Error: autorest/azure: Service returned an error. Status=409 Code="Conflict" Message="Data sink '/subscriptions/XXXXXXXXXX/resourceGroups/hub-aro/providers/Microsoft.OperationalInsights/workspaces/hub-aro' is already used in diagnostic setting 'hub-aro' for category 'AzureFirewallApplicationRule'. Data sinks can't be reused in different settings on the same category for the same resource."
│
│   with module.vnet.azurerm_monitor_diagnostic_setting.fw_diag,
│   on modules\vnet\firewall.tf line 389, in resource "azurerm_monitor_diagnostic_setting" "fw_diag":
│   389: resource "azurerm_monitor_diagnostic_setting" "fw_diag" {
│
╵

jgardner04 commented 2 years ago

This happens when you run the deployment again after deleting the environment: the diagnostic settings don't get deleted along with it. When I re-run the script, I have to find the leftover diagnostic settings in my subscription and delete them. I will need to add instructions on removing these and then create a script to automate it.

infosatheesh2020 commented 2 years ago

@jgardner04 - Thanks for the details. My initial environment got stuck with the internal error, so I tried to reprovision it from scratch. Additional instructions would be awesome!

jgardner04 commented 2 years ago

I have updated the deployment script location in the Terraform branch.

infosatheesh2020 commented 2 years ago

@jgardner04 It seems the token in the jumpbox extension script URL is wrong.

https://raw.githubusercontent.com/Azure/ARO-Landing-Zone-Accelerator/terraform/deployment/terraform/modules/vm/start_script.ps1?token=GHSAT0AAAAAABXBWKU65G3ZBU46JIP2TN3QYX5PMDQ returns a 404 error.

Correct URL: https://raw.githubusercontent.com/Azure/ARO-Landing-Zone-Accelerator/terraform/deployment/terraform/modules/vm/start_script.ps1?token=GHSAT0AAAAAABPPDAMN3S5AEFTXPYH7BTAYYX5WUVQ

The second issue, the reused data sink, still exists:

╷
│ Error: creating Monitor Diagnostics Setting "hub-arokjekrN" for Resource "/subscriptions/c0d678c5-3792-47c7-b596-ab9b3bb58362/resourceGroups/hub-aro/providers/Microsoft.Network/azureFirewalls/azfw": insights.DiagnosticSettingsClient#CreateOrUpdate: Failure responding to request: StatusCode=409 -- Original Error: autorest/azure: Service returned an error. Status=409 Code="Conflict" Message="Data sink '/subscriptions/c0d678c5-3792-47c7-b596-ab9b3bb58362/resourceGroups/hub-aro/providers/Microsoft.OperationalInsights/workspaces/hub-aro' is already used in diagnostic setting 'hub-aro' for category 'AzureFirewallApplicationRule'. Data sinks can't be reused in different settings on the same category for the same resource."
│
│   with module.vnet.azurerm_monitor_diagnostic_setting.fw_diag,
│   on modules\vnet\firewall.tf line 389, in resource "azurerm_monitor_diagnostic_setting" "fw_diag":
│   389: resource "azurerm_monitor_diagnostic_setting" "fw_diag" {
│

jgardner04 commented 2 years ago

The 404 issue is because the file is in a private repo. When the repo is made public, we will drop the token portion, and this should not be an issue going forward.
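For reference, once the repository is public the jumpbox extension could point at the script URL with the `?token=` query string simply dropped. This is a minimal sketch of a Windows Custom Script Extension, not the repository's actual `modules\vm\vm.tf`: only the extension name `jumpbox` and the script URL come from this thread, while the VM reference `azurerm_windows_virtual_machine.jumpbox`, the handler version, and the command line are assumptions.

```hcl
# Sketch only: jumpbox Custom Script Extension without a GitHub token.
# ASSUMPTION: the VM resource is named azurerm_windows_virtual_machine.jumpbox;
# adjust to whatever the vm module actually declares.
resource "azurerm_virtual_machine_extension" "jumpbox" {
  name                 = "jumpbox"
  virtual_machine_id   = azurerm_windows_virtual_machine.jumpbox.id
  publisher            = "Microsoft.Compute"
  type                 = "CustomScriptExtension"
  type_handler_version = "1.10"

  settings = jsonencode({
    # Public raw URL: no ?token= suffix, so it will not expire.
    fileUris = [
      "https://raw.githubusercontent.com/Azure/ARO-Landing-Zone-Accelerator/terraform/deployment/terraform/modules/vm/start_script.ps1"
    ]
    # The extension downloads fileUris into its working directory and runs this command there.
    commandToExecute = "powershell -ExecutionPolicy Unrestricted -File start_script.ps1"
  })
}
```

A token-free URL only works while the repository (and the `terraform` branch path) stays public; if the script must remain private, a storage account with a SAS URL would be the usual alternative.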

The second issue in your comment is regarding a diagnostic setting. These are stored outside the Resource Group, so if you have deployed to your subscription before, you will need to remove them before trying to re-deploy. You can find them in the Azure portal under Monitor > Diagnostic settings > (in this case) azfw. You should see the one named in the error message, hub-aro. Try deleting it and then re-deploying.
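As an alternative to deleting the setting by hand (not something this repository does), Terraform 1.5 or later can adopt the orphaned setting with an `import` block in the root module. This is a sketch under assumptions: the address `module.vnet.azurerm_monitor_diagnostic_setting.fw_diag` and the name `hub-aro` come from the 409 error output, and `<subscription-id>` is a placeholder. Because the configuration appears to generate a different, suffixed name (such as `hub-arokjekrN`) and the setting's name forces a new resource, Terraform should plan to replace the imported setting rather than fail with a conflict.

```hcl
# Sketch only (Terraform >= 1.5): adopt the diagnostic setting left over from a
# previous deployment instead of deleting it in the portal.
# <subscription-id> is a placeholder; "hub-aro" is the name from the 409 error.
import {
  to = module.vnet.azurerm_monitor_diagnostic_setting.fw_diag

  # azurerm import ID format: "<target resource ID>|<diagnostic setting name>"
  id = "/subscriptions/<subscription-id>/resourceGroups/hub-aro/providers/Microsoft.Network/azureFirewalls/azfw|hub-aro"
}
```

Review `terraform plan` before applying to confirm the import and the expected replacement; the `import` block can be removed once the setting is in state.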

infosatheesh2020 commented 2 years ago

@jgardner04 During resource cleanup, we also need to delete the resource provider (RP) role assignments at the subscription level, since the leftover assignments cause deployment errors on re-run:

module.supporting.azurerm_private_endpoint.cosmos: Creation complete after 11m50s [id=/subscriptions/c0d678c5-3792-47c7-b596-ab9b3bb58362/resourceGroups/spoke-aro/providers/Microsoft.Network/privateEndpoints/cosmosdbPvtEndpoint]
╷
│ Error: authorization.RoleAssignmentsClient#Create: Failure responding to request: StatusCode=409 -- Original Error: autorest/azure: Service returned an error. Status=409 Code="RoleAssignmentExists" Message="The role assignment already exists."
│
│   with module.aro.azurerm_role_assignment.resource_provider_assignment[1],
│   on modules\aro\aro.tf line 6, in resource "azurerm_role_assignment" "resource_provider_assignment":
│   6: resource "azurerm_role_assignment" "resource_provider_assignment" {
│
╵
╷
│ Error: authorization.RoleAssignmentsClient#Create: Failure responding to request: StatusCode=409 -- Original Error: autorest/azure: Service returned an error. Status=409 Code="RoleAssignmentExists" Message="The role assignment already exists."
│
│   with module.aro.azurerm_role_assignment.resource_provider_assignment[0],
│   on modules\aro\aro.tf line 6, in resource "azurerm_role_assignment" "resource_provider_assignment":
│   6: resource "azurerm_role_assignment" "resource_provider_assignment" {
│
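Besides deleting the assignments during cleanup, the same `import` technique (Terraform 1.5 or later) could bring the existing assignments into state so a re-run stops failing with `RoleAssignmentExists`. This is a hedged sketch: the two resource addresses come from the error output, but the scope (taken as the subscription, per the comment above) and the assignment GUIDs are placeholders to look up first, for example with `az role assignment list`. If the assignments actually live on a narrower scope such as a virtual network, the ID prefix changes accordingly.

```hcl
# Sketch only (Terraform >= 1.5): adopt the existing resource provider role
# assignments instead of deleting them before each re-run.
# <subscription-id>, <assignment-guid-0> and <assignment-guid-1> are placeholders.
import {
  to = module.aro.azurerm_role_assignment.resource_provider_assignment[0]
  # Import ID format: "<scope>/providers/Microsoft.Authorization/roleAssignments/<guid>"
  id = "/subscriptions/<subscription-id>/providers/Microsoft.Authorization/roleAssignments/<assignment-guid-0>"
}

import {
  to = module.aro.azurerm_role_assignment.resource_provider_assignment[1]
  id = "/subscriptions/<subscription-id>/providers/Microsoft.Authorization/roleAssignments/<assignment-guid-1>"
}
```

Make sure each GUID is matched to the right index: if the imported assignment's principal or role differs from what `[0]` or `[1]` configures, the plan will show a replacement rather than a no-op.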