Azure / deployment-stacks

Contains Deployment Stacks CLI scripts and releases
MIT License

How should a stack manage non-deletable resources, such as automation account jobs? #188

Status: Open. MgrothEcovadis opened this issue 2 months ago.

MgrothEcovadis commented 2 months ago

A deployment stack cannot be removed if it includes Automation account jobs. Removal fails with:

Remove-AzResourceGroupDeploymentStack: Long running operation failed with status 'failed'

error: The resource could not be deleted for an unknown reason.

An error occurred while deleting resources. These resources are still present in the stack but can be deleted manually.

To Reproduce

Steps to reproduce the behavior:

  1. Deploy a deployment stack whose Bicep template includes a Microsoft.Automation/automationAccounts/jobs resource.
  2. Remove the deployment stack with -ActionOnUnmanage set to 'deleteResources' or 'deleteAll'.

Expected behavior

Automation account jobs should either be excluded from the deployment stack or be removable by it.

Repro Environment

Az PowerShell module version: 12.2.0
PowerShell version: 7.4.5
Automation job resource type (Bicep): Microsoft.Automation/automationAccounts/jobs@2023-11-01

dantedallag commented 2 months ago

Hey @MgrothEcovadis, thanks for raising this issue! Could you provide the correlation ID of the failed delete so we can investigate what caused this error?

MgrothEcovadis commented 2 months ago

cbc2aed4-c8e6-4ca0-a3ff-b47fad0e44a8

Best Regards

snarkywolverine commented 1 month ago

Hi @MgrothEcovadis -

Thanks for the correlation ID. As we continue to investigate, I'm curious - do you see any value in having a DenyAssignment protecting the Automation Job from a future write?

azcloudfarmer commented 1 month ago

Hello @MgrothEcovadis, friendly reminder on the question above. Thanks!

MgrothEcovadis commented 1 month ago

@snarkywolverine, does this mean that creating a deployment stack with an automation job in the template will cause an error, or will the job just be excluded from the deployment stack's resource list?

snarkywolverine commented 1 month ago

I'm just trying to think through the options, and the ramifications of each.

  1. Maintain the status quo. This alerts users that the automation job was not deleted, and requires running the template with ActionOnUnmanage=detachAll to remove the automation job from the stack and get a successful stack run. It also allows a denyWriteAndDelete assignment, which can prevent users from modifying the automation job outside of the stack. My question above was asking whether this is valuable.
  2. Remove automation jobs from the managed resource list. This would mean a denyWriteAndDelete assignment could not be applied, and any user could overwrite the automation job. On the other hand, once automation jobs are no longer managed by the stack, stack runs would not fail because of them. (This can be viewed as a positive, since we remove a noisy failure, or as a negative, since we're not cleaning up a resource that the stack created.)
  3. Keep the resource in the managed resource list, but detach the automation job (since it can't be deleted) even if the user specifies ActionOnUnmanage=deleteResources. A denyWriteAndDelete assignment can still apply in this case.
  4. Some other scenario I haven't thought of above.

For now, we're going to stick with Option 1. I'm marking this as 'needs upvote' as a way to solicit customer feedback. (I'll also update the title so that other non-deletable resources can be addressed here as well.)
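To make the trade-offs between the three options concrete, here is a minimal Python sketch that simulates what happens to unmanaged resources when a stack is removed under each option. It is not how the Deployment Stacks service is implemented: `Resource`, `Outcome`, `resolve_unmanaged`, and `NON_DELETABLE_TYPES` are hypothetical names, and only the three ActionOnUnmanage values (detachAll, deleteResources, deleteAll) come from the stacks API. Resource-group handling (the difference between deleteResources and deleteAll), deny assignments, and parent/child deletion order are not modeled, so deleteAll behaves like deleteResources here.

```python
from dataclasses import dataclass, field
from enum import Enum


class ActionOnUnmanage(Enum):
    DETACH_ALL = "detachAll"
    DELETE_RESOURCES = "deleteResources"
    DELETE_ALL = "deleteAll"


# Hypothetical: resource types this sketch assumes reject DELETE requests.
NON_DELETABLE_TYPES = {"Microsoft.Automation/automationAccounts/jobs"}


@dataclass
class Resource:
    resource_id: str
    resource_type: str


@dataclass
class Outcome:
    deleted: list = field(default_factory=list)
    detached: list = field(default_factory=list)
    failed: list = field(default_factory=list)     # still in the stack; the run reports a failure
    untracked: list = field(default_factory=list)  # never managed by the stack (option 2)

    @property
    def succeeded(self) -> bool:
        return not self.failed


def resolve_unmanaged(resources, action, option):
    """Simulate removing `resources` from a stack under option 1, 2, or 3."""
    if option not in (1, 2, 3):
        raise ValueError("option must be 1, 2, or 3")
    outcome = Outcome()
    for res in resources:
        non_deletable = res.resource_type in NON_DELETABLE_TYPES
        if option == 2 and non_deletable:
            # Option 2: never added to the managed resource list, so there is
            # no denyWriteAndDelete protection and nothing to clean up.
            outcome.untracked.append(res.resource_id)
        elif action is ActionOnUnmanage.DETACH_ALL:
            outcome.detached.append(res.resource_id)
        elif non_deletable and option == 3:
            # Option 3: fall back to detaching even though deletion was requested.
            outcome.detached.append(res.resource_id)
        elif non_deletable:
            # Option 1 (status quo): the delete fails and the resource stays in the stack.
            outcome.failed.append(res.resource_id)
        else:
            outcome.deleted.append(res.resource_id)
    return outcome


# Illustrative resources (hypothetical IDs).
resources = [
    Resource("acct", "Microsoft.Automation/automationAccounts"),
    Resource("acct/job1", "Microsoft.Automation/automationAccounts/jobs"),
]
for option in (1, 2, 3):
    result = resolve_unmanaged(resources, ActionOnUnmanage.DELETE_RESOURCES, option)
    print(f"Option {option}: succeeded={result.succeeded} {result}")
```

Under deleteResources, only option 1 reports a failed run; options 2 and 3 both succeed, but option 2 does so by never tracking the job, while option 3 keeps it tracked (and therefore still eligible for a deny assignment) and merely detaches it.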

MgrothEcovadis commented 1 month ago

I'm not sure denyWriteAndDelete is worth considering in the case of Automation jobs. They're just executions of an Automation account runbook, and the only available actions are resume, stop, and suspend; once a job has finished, nothing can be done apart from checking its logs. Automation jobs should be excluded from the deployment stack completely, since they are automatically removed after 30 days anyway.