Azure / deployment-stacks

Contains Deployment Stacks CLI scripts and releases
MIT License
87 stars 7 forks source link

OS disks on Azure VM not purged (and other implicitly deployed resources) #28

Closed slavizh closed 1 year ago

slavizh commented 2 years ago

Putting this as may be a discussion or reminder to document some important things. It is quite common that the OS disk is defined inside virtualMachine resource and it is not defined separately as resource. In those cases deployment stacks will not purge the OS disk. I think we either should document those cases or mention those limitations/behaviors by giving the Azure VM example which is probably the most common case.

bmoore-msft commented 2 years ago

There are a handful of implicit resources like this (I think you're right that disks is most common)... is your expectation that they should be deleted on purge? or retained because they have data? (VM Team thinks the latter). Would your opinion be different if the resource did not contain data?

slavizh commented 2 years ago

@bmoore-msft for me the logical step would be that a guidance is provided that if you want a resource to be purged/managed by deployment stack it needs to be defined within the template. There is also data on the Data disks but those are purged so I do not see that OS disk should be treated differently than data disks. Having to provide guidance will not have to force deployment stacks team or Compute team does something for this special case but it is rather end users that should improve their code for best practice. For example the template generated via Portal when you create VM does not take care of managing the OS disk as separate resource.

bmoore-msft commented 2 years ago

So in your case the data disks were explicitly defined in the template and the OS disk was not?

slavizh commented 2 years ago

yes. I believe (if I remember correctly) data disks you need to create in advance and reference them in the virtual machine resource. For OS disk you specify which image you want the VM to be deployed with and that creates the OS disk. So in order to manage it you will need to have dependent resource on the VM that is for the os disk defined.

bmoore-msft commented 2 years ago

Would like to open this up to a broader audience - what is the expectation around "implicit" resources. They are not defined in the template but are deployed as a result of deploying some other resource that is in the template. Currently, those resources are not ever managed and will not be purged.

1) Are there scenarios where not managing (i.e. not locking) these resource can be problematic? (I don't think disks are) 2) Would you expect them to be deleted when the stack is deleted?

Discuss.

slavizh commented 2 years ago

My view:

  1. could be problematic as I think end users would expect to see those resources under managed. OS disk is just a small example because it is a single resource but the thing with kuberenetes clusters there are lot of resources created by the main resource. Synapse analytics does the same, private endpoints as well, etc.
  2. I would expect to be deleted as if you destroy the main resource there is not reason those resources to be left out and being charged for them.
maikvandergaag commented 2 years ago

My view:

  1. I think it could be problematic in some scenario's but also think they were placed outside of a deployment stack for a specific reason.
  2. If they are placed outside the stack I presume someone really thought of the specific scenario. That is also why I think they should not be deleted if the stack is deleted.
slavizh commented 2 years ago

@maikvandergaag keep in mind that the experience will be inconsistent. For example Kubernetes will delete its resources, Azure VM will not delete its os disk. This means you will need to know very well the resource in order to know how it will work within the stack.

maikvandergaag commented 2 years ago

@slavizh: Yes I know and I think with for example Kubernetes which created another resource group automatically that it should be included in the stack as you have limited control over the resources that are created.

With for example a VM that is referencing an existing data disk out side of the stack that data disk should be left alone when deleting the stack.

bmoore-msft commented 2 years ago

An interesting part here is that the RPs that created the resource, decide on the cleanup behavior for those resources... while inconsistent makes some sense. Not being able to lock something almost seems more interesting. For VMs/Disks you can managed this by using explicit resources, but there are plenty of other resources (I'm struggling a bit to come up with examples) where you can't turn them into explicit resources to be managed by the stack.

slavizh commented 2 years ago

@bmoore-msft Os disk is easy but some other things like kubernetes, synapse, etc. are more complex as they create more resources. If you choose inconstant I think you will need to document better all these behaviors per RP in order to provide better deployment experience which is my proposal from the start.

slavizh commented 2 years ago

Synapse even goes further as you cannot manage the resources at all as you have some deny assignment (it is not lock). Failed to update diagnostics for 'sqlpool0001'.{"error":{"code":"DenyAssignmentAuthorizationFailed","message":"The client 'redacted' with object id 'redacted' has permission to perform action 'microsoft.insights/diagnosticSettings/write' on scope '/subscriptions/redacted/resourceGroups/lz-synapse-managed-group/providers/Microsoft.Sql/servers/stan000121/databases/sqlpool0001/providers/microsoft.insights/diagnosticSettings/test'; however, the access is denied because of the deny assignment with name 'f6985cca-70c5-4918-bafb-700160ff8d4f' and Id 'f6985cca70c54918bafb700160ff8d4f' at scope '/subscriptions/redacted/resourceGroups/lz-synapse-managed-group'."}}.

They way different teams implement this is so wrong as they create different patterns per RP thus making this anti-management pattern.

slavizh commented 2 years ago

@bmoore-msft Seems VM API has a new option for OS disk - deleteOption. Deletes/retains the OS disk when VM is deleted.

"storageProfile": {
                    "osDisk": {
                        "createOption": "fromImage",
                        "managedDisk": {
                            "storageAccountType": "[parameters('osDiskType')]"
                        },
                        "deleteOption": "[parameters('osDiskDeleteOption')]"
                    },
bmoore-msft commented 1 year ago

@bmoore-msft Seems VM API has a new option for OS disk - deleteOption. Deletes/retains the OS disk when VM is deleted.

"storageProfile": {
                    "osDisk": {
                        "createOption": "fromImage",
                        "managedDisk": {
                            "storageAccountType": "[parameters('osDiskType')]"
                        },
                        "deleteOption": "[parameters('osDiskDeleteOption')]"
                    },

Was looking at this for nic's, publicIp's, et al for VMs... it seems like this is the way to go for now. I don't think it's going to cover all the other implicitly created resources but I'm going to open a "generic" issue to start building that list... will close this one out.