Azure / deployment-stacks

Contains Deployment Stacks CLI scripts and releases
MIT License

Deployment Stack in DeploymentStackInNonTerminalState "Deleting Resources" #162

Closed: h00jraq closed this issue 5 months ago.

h00jraq commented 5 months ago

Describe the bug

My deployment did not show any state for quite a while, and I could not delete it either (the delete option was greyed out). The IaC pipeline just kept running, so after 35 minutes I had to cancel it. When I tried to run it again, I got the error below:

message: ERROR: (DeploymentStackInNonTerminalState) The deployment stack resource '/subscriptions/1939665b-33e2-4667-a535-882c27133abd/resourceGroups/rg-dev-xrm-nonprod-int-1/providers/Microsoft.Resources/deploymentStacks/pdoDev' could not be updated as it is currently in a non-terminal state 'DeletingResources'.
Code: DeploymentStackInNonTerminalState
Message: The deployment stack resource '/subscriptions/1939665b-33e2-4667-a535-882c27133abd/resourceGroups/rg-dev-xrm-nonprod-int-1/providers/Microsoft.Resources/deploymentStacks/pdoDev' could not be updated as it is currently in a non-terminal state 'DeletingResources'.

I had to wait about 10-15 minutes after cancelling the pipeline before the deployment stack reported that it had failed to delete some resources.
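One way to avoid re-running the pipeline while the stack is still in a non-terminal state such as 'DeletingResources' is to poll the stack's provisioning state first and only continue once it has settled. The Python sketch below assumes the state string comes from the stack object returned by `az stack group show` and that `succeeded`, `failed` and `canceled` are the only terminal states. `wait_for_terminal_state` and the injected `get_state` callable are hypothetical helpers, not part of the Azure CLI.

```python
import time
from typing import Callable

# Assumption: these are the terminal provisioning states of a deployment stack.
# Anything else (e.g. "deletingResources", "deploying") is treated as in progress.
TERMINAL_STATES = {"succeeded", "failed", "canceled"}


def is_terminal(state: str) -> bool:
    """Return True if the stack has finished its current operation."""
    return state.strip().lower() in TERMINAL_STATES


def wait_for_terminal_state(
    get_state: Callable[[], str],
    timeout_s: float = 1800.0,
    poll_interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_state() until it reports a terminal state or the timeout expires.

    get_state is a hypothetical callable; in a pipeline it could wrap
    `az stack group show ... --query provisioningState -o tsv`.
    """
    waited = 0.0
    while True:
        state = get_state()
        if is_terminal(state):
            return state
        if waited >= timeout_s:
            raise TimeoutError(
                f"stack still in non-terminal state {state!r} after {waited:.0f}s"
            )
        sleep(poll_interval_s)
        waited += poll_interval_s
```

If this runs in the pipeline before `az stack group create`, the run waits (up to `timeout_s`) instead of failing immediately with DeploymentStackInNonTerminalState. The `sleep` parameter is injectable only so that the loop can be exercised without real delays.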

[screenshot]

Now, when I try to re-run the IaC pipeline, I see the screen below:

[screenshot]

The Private Endpoint was created, but I can't see its Network Interface, and the pipeline has again been running for about 15 minutes... Now, when I try to re-run the Bicep deployment, it gets stuck on Private Endpoint creation:

[screenshot]

I think I know why it failed, but to be honest, I'm not sure how to resolve it.

I believe the problem is that the Private DNS Zone is located in a different RG and a different subscription, as can be seen in the privateEndpoint.bicep module code. Specifically:

resource privateDnsZone 'Microsoft.Network/privateDnsZones@2020-06-01' existing = {
  scope: resourceGroup(peDnsZoneSubscrId, peDnsZoneResourceGroupName)
  name: privateDnsZoneName
}

In addition, there was a resource lock on the Resource Group where vnetpe is located.

I commented out the code below in my main.bicep file, and my first question was: will Bicep try to remove vnetpe and subnetpe if they were not part of the deployment stack? My assumption was that it would not try to delete them. I believe Bicep failed to remove the A records from the private DNS zones that live in a different RG/subscription, but that layout is the client's choice, not mine, so I'm trying to find out how I can

Here is the command I'm using. Before running the deployment, I remove the lock on the relevant RG and re-apply it afterwards. In this particular case, I forgot to remove the lock from one of the RGs, so the stack failed to delete "something", but I can't find out exactly what...

az stack group create \
  --name ${{ parameters.stackName }} \
  --subscription ${{ variables.SUBSCRIPTION_ID }} \
  --resource-group ${{ variables.RG }} \
  --template-file ${{ variables.BICEP_PATH }}/${{ variables.BICEP_TEMPLATE_FILE }} \
  --parameters ${{ variables.BICEP_PATH }}/${{ variables.BICEP_PARAM_FILE }} \
  --deny-settings-mode 'none' \
  --delete-resources

Code commented out in main.bicep:

resource vnetpe 'Microsoft.Network/virtualNetworks@2023-05-01' existing = {
  scope: resourceGroup(subscrId, vnetRg)
  name: vnetPE
}

resource subnetpe 'Microsoft.Network/virtualNetworks/subnets@2023-05-01' existing = {
  name: subnetPE
  parent: vnetpe
}

module privateEndpointModuleWebApp '../../modules/privateEndpoint.bicep' = {
  name: 'pe-${peWebApp}'
  params: {
    appId: webApp.id
    location: location
    peAppName: peWebApp
    peDnsZoneResourceGroupName: peDnsZoneResourceGroupName
    peDnsZoneSubscrId: peDnsZoneSubscrId
    peSubnetId: subnetpe.id
    plServiceConnectionGroupId: 'sites'
    privateDnsZoneConfigsName: 'refMngmtApiWebAppConfig'
    privateEndpointDnsGroupName: 'webappPrivateEndpointDnsGroup'
    privateDnsZoneName: 'privatelink.azurewebsites.net'
  }
}
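The lock workflow described above (remove the RG lock, run `az stack group create`, re-apply the lock) is easy to get wrong when several resource groups are involved, which is what happened here. Below is a minimal Python sketch of a safer ordering. It assumes hypothetical `remove_lock`, `apply_lock` and `deploy` callables, which in practice might wrap `az lock delete`, `az lock create` and `az stack group create`. Every listed RG is unlocked before the deployment starts, and the locks are re-applied in a `finally` block even if the deployment fails.

```python
from typing import Callable, Iterable, List


def deploy_with_locks_lifted(
    resource_groups: Iterable[str],
    remove_lock: Callable[[str], None],
    apply_lock: Callable[[str], None],
    deploy: Callable[[], None],
) -> None:
    """Lift the lock on every RG the stack may touch, deploy, then restore locks.

    remove_lock, apply_lock and deploy are hypothetical callables supplied by
    the caller (e.g. wrappers around the az lock / az stack commands).
    """
    unlocked: List[str] = []
    try:
        for rg in resource_groups:
            remove_lock(rg)
            unlocked.append(rg)
        deploy()
    finally:
        # Re-apply locks only on RGs that were actually unlocked,
        # even if an unlock step or deploy() raised.
        for rg in unlocked:
            apply_lock(rg)
```

Passing the full list of RGs the stack can touch (including the one holding vnetpe) is what prevents the "forgot one RG" case. Keeping that list in one place is a design choice of this sketch, not something the CLI enforces.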

Here is the privateEndpoint.bicep module code:

param appId string
param location string
param peAppName string
param peSubnetId string
param peDnsZoneSubscrId string
param peDnsZoneResourceGroupName string
param privateEndpointDnsGroupName string
param privateDnsZoneConfigsName string
param plServiceConnectionGroupId string
param privateDnsZoneName string

resource privateEndpoint 'Microsoft.Network/privateEndpoints@2021-05-01' = {
  name: peAppName
  location: location
  properties: {
    subnet: {
      id: peSubnetId
    }
    privateLinkServiceConnections: [
      {
        name: peAppName
        properties: {
          privateLinkServiceId: appId
          groupIds: [
            plServiceConnectionGroupId
          ]
          privateLinkServiceConnectionState: {
            status: 'Approved'
            description: 'Approved by pipeline'
          }
        }
      }
    ]
  }
}

resource privateDnsZone 'Microsoft.Network/privateDnsZones@2020-06-01' existing = {
  scope: resourceGroup(peDnsZoneSubscrId, peDnsZoneResourceGroupName)
  name: privateDnsZoneName
}

resource privateEndpointDnsGroup 'Microsoft.Network/privateEndpoints/privateDnsZoneGroups@2023-04-01' = {
  name: privateEndpointDnsGroupName
  parent: privateEndpoint
  properties: {
    privateDnsZoneConfigs: [
      {
        name: privateDnsZoneConfigsName
        properties: {
          privateDnsZoneId: privateDnsZone.id
        }
      }
    ]
  }
}
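Because `privateDnsZone` is declared with `scope: resourceGroup(peDnsZoneSubscrId, peDnsZoneResourceGroupName)`, its `id` points into a different subscription and resource group from the stack's own. The Python sketch below flags resource IDs that fall outside a given stack scope. It assumes the standard ARM resource ID layout (`/subscriptions/{sub}/resourceGroups/{rg}/providers/...`). `parse_scope` and `outside_stack_scope` are hypothetical helpers, and any IDs you feed it are your own.

```python
from typing import Tuple


def parse_scope(resource_id: str) -> Tuple[str, str]:
    """Return (subscription_id, resource_group), lower-cased, from an ARM resource ID."""
    parts = resource_id.strip("/").split("/")
    # ARM segment names are case-insensitive, so look keys up in lower case.
    keys = [p.lower() for p in parts]
    try:
        sub = parts[keys.index("subscriptions") + 1]
        rg = parts[keys.index("resourcegroups") + 1]
    except (ValueError, IndexError):
        raise ValueError(f"not a resource-group-scoped ID: {resource_id}") from None
    return sub.lower(), rg.lower()


def outside_stack_scope(resource_id: str, stack_sub: str, stack_rg: str) -> bool:
    """True if the resource lives in a different subscription or RG than the stack."""
    return parse_scope(resource_id) != (stack_sub.lower(), stack_rg.lower())
```

This only tells you where a resource lives. It does not by itself tell you whether the stack manages that resource.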

To Reproduce

Steps to reproduce the behavior:

  1. Quite difficult to reproduce, I'm afraid...

Expected behavior

Being able to see which resources failed to delete.
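One possible way to get at this today is to read the stack object itself. The sketch below assumes that the JSON printed by `az stack group show --name <stack> --resource-group <rg> -o json` exposes a `failedResources` array whose entries carry an `id` and an `error` object with `code` and `message`. Treat those field names as assumptions to verify against your CLI version. `summarize_failed_resources` is a hypothetical helper.

```python
import json
from typing import Any, Dict, List


def summarize_failed_resources(stack_json: str) -> List[str]:
    """Return one line per resource the stack reports as failed.

    Assumes `failedResources` entries look like
    {"id": "...", "error": {"code": "...", "message": "..."}}.
    """
    stack: Dict[str, Any] = json.loads(stack_json)
    lines: List[str] = []
    for item in stack.get("failedResources") or []:
        error = item.get("error") or {}
        code = error.get("code", "<no code>")
        message = error.get("message", "")
        lines.append(f"{item.get('id', '<unknown id>')}: {code} {message}".rstrip())
    return lines
```

An empty result means the assumed field was absent or empty. It does not prove that nothing failed.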

Repro Environment

Host OS:
Powershell Version:

Server Debugging Information

Correlation ID: 2a2d910b-7846-4fbe-960f-b813a517ce65
Tenant ID: 7cc60758-5c35-453b-b9c2-099367865b7d
Timestamp of issue (please include time zone): CET 4/17/2024, 4:41:56 PM
Data Center (eg, West Central US, West Europe): UK-South


benemanu commented 2 weeks ago

Hey @h00jraq, I have the same issue. Using the 'detachAll' value for the --action-on-unmanage flag seemed to resolve it last time, but this time it didn't work. Do you have any other fix for this?