Azure / deployment-stacks

Contains Deployment Stacks CLI scripts and releases
MIT License

Feature Request - Implement Drift Detection #117

Open muhammad-shameem opened 1 year ago

muhammad-shameem commented 1 year ago

It would be really helpful to have drift detection functionality like AWS CloudFormation and Terraform offer. So far, we don't have that in place for Bicep and ARM templates.

REF - https://www.hashicorp.com/blog/terraform-cloud-adds-drift-detection-for-infrastructure-management

azcloudfarmer commented 1 year ago

Hi @muhammad-shameem - in Azure we have deployment what-if: https://learn.microsoft.com/en-us/azure/azure-resource-manager/templates/deploy-what-if?tabs=azure-powershell

Are you asking about drift detection specific to deployment stacks?

muhammad-shameem commented 1 year ago

Hi @apclouds, I know we have what-if to preview changes before a deployment, but if someone changes a deployment stack's resources via the portal, I want to be able to see that drift has happened in the deployment stack, much like in AWS CloudFormation and Terraform.

D-Bissell commented 1 year ago

@azcloudfarmer For me the idea would be:

Of course if you have DenySettingsMode set to DenyWriteAndDelete this shouldn't be necessary, but in the other cases it may be very useful.

JFolberth commented 1 year ago

I just want to call out that what-if has a pretty major bug with nested Bicep templates: https://github.com/Azure/arm-template-whatif/issues/157#issuecomment-1044997859

As such I am not sure if I'd recommend that as a viable alternative at the moment.

D-Bissell commented 1 year ago

As @JFolberth says, -whatif's usefulness is somewhat dubious at the moment thanks to its issues with modules and 'existing' resources.

@azcloudfarmer Additionally (sorry): -whatif and drift detection fill two different roles.

Correct me if I'm wrong, but it looks like stacks don't keep a record or 'state' of resource properties. They just have a record of the resource ID and its management status.

  "resources": [
    {
      "denyStatus": "none",
      "id": "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/demoRg/providers/Microsoft.Network/virtualNetworks/vnetzu6pnx54hqubm",
      "resourceGroup": "demoRg",
      "status": "managed"
    },
    {
      "denyStatus": "none",
      "id": "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/demoRg/providers/Microsoft.Storage/storageAccounts/storezu6pnx54hqubm",
      "resourceGroup": "demoRg",
      "status": "managed"
    }
  ]

As-is, it looks like drift detection would be difficult as there's no 'desired' state recorded to compare to.
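To make the gap concrete, here is a minimal sketch of the comparison that a recorded 'desired' state would enable. It assumes two hypothetical inputs that stacks do not expose today: `desired_state` (the properties declared at deployment time) and `actual_state` (the properties currently live), both keyed by resource ID. The `stack_resources` argument mirrors the JSON shape above. All function and parameter names are illustrative, not part of any Azure API.

```python
from typing import Any


def diff_properties(desired: dict[str, Any], actual: dict[str, Any], prefix: str = "") -> list[str]:
    """Return dotted paths where `actual` differs from `desired`.

    Only keys present in `desired` are checked, so properties that the
    platform populates on its own are not reported as drift.
    """
    drifted: list[str] = []
    for key, want in desired.items():
        path = f"{prefix}{key}"
        have = actual.get(key)
        if isinstance(want, dict) and isinstance(have, dict):
            # Recurse into nested property bags.
            drifted.extend(diff_properties(want, have, prefix=f"{path}."))
        elif want != have:
            drifted.append(path)
    return drifted


def detect_drift(
    stack_resources: list[dict[str, Any]],
    desired_state: dict[str, dict[str, Any]],  # hypothetical: not stored by stacks today
    actual_state: dict[str, dict[str, Any]],   # hypothetical: fetched per resource ID
) -> dict[str, list[str]]:
    """Map each managed resource ID to the property paths that drifted."""
    report: dict[str, list[str]] = {}
    for resource in stack_resources:
        if resource["status"] != "managed":
            continue  # only resources the stack still manages are compared
        resource_id = resource["id"]
        paths = diff_properties(
            desired_state.get(resource_id, {}),
            actual_state.get(resource_id, {}),
        )
        if paths:
            report[resource_id] = paths
    return report
```

Without the `desired_state` half of this comparison, there is nothing to diff against, which is the core of the problem.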

slavizh commented 1 year ago

Maybe some integration could be done using resource configuration changes: https://learn.microsoft.com/en-us/azure/governance/resource-graph/how-to/get-resource-changes?tabs=azure-cli.
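As a rough sketch of that idea, the helper below builds a Resource Graph query that lists recent changes for the resource IDs a stack reports as managed. The `resourcechanges` table and the `properties.*` field names follow the linked documentation as I understand it and should be verified before use. The function name and its parameters are hypothetical.

```python
def build_change_query(resource_ids: list[str], lookback_days: int = 7) -> str:
    """Build a Resource Graph (Kusto) query for recent changes to the given resources.

    Assumption: the `resourcechanges` table exposes `properties.targetResourceId`,
    `properties.changeType`, `properties.changes` and
    `properties.changeAttributes.timestamp`, as in the linked how-to.
    """
    if not resource_ids:
        raise ValueError("at least one resource ID is required")
    if any("'" in rid for rid in resource_ids):
        raise ValueError("resource IDs must not contain single quotes")

    quoted = ", ".join(f"'{rid}'" for rid in resource_ids)
    return "\n".join([
        "resourcechanges",
        "| extend targetResourceId = tostring(properties.targetResourceId),",
        "         changeType = tostring(properties.changeType),",
        "         changedAt = todatetime(properties.changeAttributes.timestamp)",
        f"| where changedAt > ago({lookback_days}d)",
        # in~ is the case-insensitive membership operator in Kusto.
        f"| where targetResourceId in~ ({quoted})",
        "| project changedAt, targetResourceId, changeType, changes = properties.changes",
        "| order by changedAt desc",
    ])
```

The resulting string could be passed to `az graph query -q`. Note that this reports every change, including changes made by a legitimate redeployment of the stack, so it would still need filtering to count as drift detection.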

mattias-fjellstrom commented 1 year ago

I agree that drift-detection is a nice feature to have, but ultimately if you care about drift (and want to avoid it) you should really just lock the resources from being deleted or modified. This is something you can't do with Terraform or AWS CloudFormation, so the ability to completely lock the resources in the stack trumps drift detection in one way.

What would be the benefit of not locking the stack but instead having drift detection and some auto-remediation feature?

ld0614 commented 10 months ago

I see it as useful when certain ARM resources don't support deployment stacks correctly and so have to be excluded from locking (IP Groups is a painful one that I've come across recently).

samhodgkinson commented 5 months ago

The feature is available in Terraform, and now in Pulumi (link).

Being able to understand drift is a very useful feature.

mattias-fjellstrom commented 4 months ago

Regarding Terraform/Pulumi: I suppose the reason they support drift detection is that they have no way of enforcing a lock on the resources? I still think locking the resources in your deployment stacks trumps drift detection. Of course, this requires that all resources are supported and that you have control over your RBAC setup so that there is no unauthorized removal of the locks.

alex-frankel commented 4 months ago

As noted by @JFolberth, we most likely need to resolve the what-if short-circuiting issue before we can make meaningful progress on this one. We also would need to decide if we can depend on what-if results to do the analysis or if we need to store this information somewhere in the Stacks RP as @D-Bissell notes. In theory we would be able to use the what-if API if we could solve the noise issue.
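For illustration only, here is a rough sketch of what depending on what-if results could look like once the noise issue is solved. It assumes the what-if output has been parsed into a list of dicts with `resourceId`, `changeType` and `delta` entries carrying `path` and `propertyChangeType`. Those field names are my reading of the what-if result format and should be verified. The function itself is hypothetical.

```python
from typing import Any, Optional

# What-if change types treated as drift for a resource the stack manages:
# "Modify" means live properties differ from the template, and "Create"
# means the resource is in the template but no longer exists.
DRIFT_CHANGE_TYPES = {"Modify", "Create"}


def drift_from_what_if(
    changes: list[dict[str, Any]],
    managed_ids: list[str],
) -> dict[str, list[str]]:
    """Map drifted managed resource IDs to the property paths that changed."""
    managed = {rid.lower() for rid in managed_ids}  # ARM IDs are case-insensitive
    report: dict[str, list[str]] = {}
    for change in changes:
        resource_id = change["resourceId"]
        if resource_id.lower() not in managed:
            continue
        if change["changeType"] not in DRIFT_CHANGE_TYPES:
            continue
        deltas: Optional[list[dict[str, Any]]] = change.get("delta")
        # Drop property changes that what-if itself marks as having no effect.
        paths = [
            d["path"] for d in deltas or []
            if d.get("propertyChangeType") != "NoEffect"
        ]
        # A "Modify" with only no-effect deltas is treated as noise.
        if change["changeType"] != "Modify" or paths:
            report[resource_id] = paths
    return report
```

This only removes the noise that what-if labels explicitly. It does nothing about false positives such as those from the nested template issue mentioned above.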

@mattias-fjellstrom - you make a good point that if you use Deny Assignments, you can actually stop drift from happening, which is the ideal outcome, but this requires operating in a stricter RBAC environment, which may not be viable or easy to change.

Drift Detection is definitely something we're interested in pursuing, but a few things need to fall in place first to make the experience worthwhile.

samhodgkinson commented 4 months ago

Resource locks and policy controls are great methods to prevent and control Azure control plane actions. Where teams still require "click-ops" to perform BAU tasks and do not have all aspects of the platform maintained and managed with IaC, these tools, intended to prevent configuration drift, also impede support teams' ability to deliver. Sometimes the reason to bypass the controls is to prevent a larger platform incident, or to make a change where manual implementation is the quicker solution. Either way, the intended benefits of implementing these capabilities alone are normally lost, giving way to the need to fix or implement a required change.

Being able to report on the change and then correct or amend the current platform state would make lifecycle management much simpler on platforms where "click-ops" is still a required part of platform management.