Azure / deployment-stacks

Contains Deployment Stacks CLI scripts and releases
MIT License

Deploying Key Vault secrets as part of a stack: removal does not work in case of purge protection #142

Open bijlm opened 7 months ago

bijlm commented 7 months ago

Describe the bug: I have a Key Vault with secrets in a stack. The Key Vault has softDelete and purgeProtection turned on. When I remove a secret from my stack description, the next deployment takes about 6 minutes (longer) and the secret is reported as could not remove...

Expected behavior: I would have expected the secret to be soft-deleted so that it would remain in the "deleted secrets" section for as long as the retention is configured.

Repro Environment
Host OS: Windows
PowerShell Version: 7.4.0
Data Center: West Europe

dantedallag commented 7 months ago

Hi @bijlm, could you provide a correlation id for this failure and, if possible, a simplified template that produces a repro of this?

bijlm commented 7 months ago

Hi Dante,

I reran our pipeline with one secret fewer than before and got a message like this: [image] When looking into the deployment stack, I see that the error refers to the secret that was left out.

If I then look into the activity log, I see this: [image]

I included the permissions for the secret users (even making the service connection identity a secret officer makes no difference).

Attachment: kv.zip. This zip contains a couple of Bicep files and a simple example for a deployment.

dantedallag commented 7 months ago

Thanks @bijlm for providing the extra context and the files to repro. First though, we need a correlation id for the particular failing request to better understand what exactly is happening.

bijlm commented 7 months ago

@dantedallag This is a specific example of a failing instance. Correlation id: '9f5011e2-7113-4b7c-9e39-b5b4c568f56f'

dantedallag commented 7 months ago

@bijlm Thanks! I'm looking into this.

Could you share what parameters you used to create this stack (maybe share the command you ran either in PS or CLI)? Mostly I am curious about your expected deny settings and if you specified DenySettingsApplyToChildScopes.

bijlm commented 6 months ago

Yup, this is the way I initiate the deploymentStack from the Azure DevOps pipeline (AzurePowerShell@5):

    # Deploy the deploymentStack
    $hashParameters = [ordered]@{
        Name                  = '${{ variables.ResourceGroupName }}_KeyVault'
        ResourceGroupName     = '${{ variables.ResourceGroupName }}'
        TemplateFile          = $templateFile
        TemplateParameterFile = $parameterFileTmp
        DenySettingsMode      = 'DenyWriteAndDelete'
        DeleteAll             = $true
        Force                 = $true
    }
    Write-Output "##[verbose]Parameters for the deploymentStack: $($hashParameters | Format-Table -AutoSize -Wrap | Out-String -Width 80)"
    New-AzResourceGroupDeploymentStack @hashParameters

bijlm commented 6 months ago

@dantedallag did you have any time to look into this? Or any information update?

dantedallag commented 6 months ago

Hey @bijlm, sorry for the delay. I am able to reproduce this issue and am still looking into the root cause. Will hopefully have something for you next week.

azcloudfarmer commented 5 months ago

Hello @bijlm - quick update. We are still working through potential solutions to this issue and will be noting this case as part of our Known Issues in Azure Docs.

dantedallag commented 5 months ago

If you are still looking for a workaround, I would suggest doing a multi-phased deployment to first detach the secrets and then delete the remaining resources you would like to remove from the stack.

snarkywolverine commented 5 months ago

Hi @bijlm - We wanted to give you an update on this issue.

The short version is that, at least for now, we plan to keep the design as-is, for the reasons outlined below. However, we welcome any feedback you (or other customers) may have to make this experience better.

KeyVault secrets are designed to be managed in the data plane; there is no support for deleting secrets via ARM control plane. Given this limitation, we considered having secrets be detached silently -- this would enable the stack PUT to succeed, but may have unexpected behavior for customers who are not familiar with this scenario.

We plan to make some small investments to make the situation better - including a clearer error message - but stacks will continue to fail to help make customers aware of the limitations.

We are also tracking a couple of bugs related to the scenario: (1) If a KeyVault and a secret appear in a template, and both are removed from the stack simultaneously, the stack operation will still fail. (2) If a secret has already been deleted, the stack operation will still fail.

As stated at the beginning though -- the goal is that, if the secret exists and can't be deleted by the stack, the stack will fail. This is true for both stack delete and stack update.

Hope that makes sense.

hallgeir-osterbo-visma commented 5 months ago

> If you are still looking for a workaround, I would suggest doing a multi-phased deployment to first detach the secrets and then delete the remaining resources you would like to remove from the stack.

In practice, how would one go about doing this? Is it even possible to detach individual resources (like secrets)?

snarkywolverine commented 4 months ago

@hallgeir-osterbo-visma To detach secrets in PowerShell, remove them from the template and call the Set-*DeploymentStack cmdlet with none of the -Delete flags (-DeleteAll or -DeleteResources).

From there, if you want to delete other resources, remove those from the template and then call Set- again with -DeleteResources.

The experience should be similar for AzCLI.
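
A minimal sketch of the two-phase approach described above, using the flag names mentioned in this thread. The stack name, resource group, and template file names are hypothetical placeholders. Newer Az.Resources releases may expect an -ActionOnUnmanage parameter (for example DetachAll or DeleteResources) instead of the -Delete* switches, so check which form your installed module accepts.

```powershell
# Hypothetical values for illustration; replace with your own.
$common = @{
    Name              = 'my-rg_KeyVault'
    ResourceGroupName = 'my-rg'
    DenySettingsMode  = 'DenyWriteAndDelete'
    Force             = $true
}

# Phase 1: deploy a template from which only the secrets have been removed.
# No -Delete* switch is passed, so resources that left the template are detached.
Set-AzResourceGroupDeploymentStack @common -TemplateFile './phase1-secrets-removed.bicep'

# Phase 2: deploy the final template, from which the other unwanted resources
# have also been removed. -DeleteResources asks the stack to delete them.
Set-AzResourceGroupDeploymentStack @common -TemplateFile './phase2-final.bicep' -DeleteResources
```

Note that phase 1 must keep every resource you still want deleted later in the template; anything removed during a run without a -Delete* switch is detached rather than deleted.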

hallgeir-osterbo-visma commented 4 months ago

That's not very elegant in a CI/CD scenario, though. It would also detach other resources that we would want to delete. Are there any plans to support detaching resources from the stack directly? Something like... az stack detach --resource-id ... or something crazy like that. Then we could bake this into the CI/CD by having a "migration" script run before the deployment.

snarkywolverine commented 4 months ago

Another option would be to use the following flow:

  1. Deploy the updated template using the chosen delete flags
  2. Wait for failure due to the KV secret that failed to delete
  3. Re-run the stack with the same template in detach mode

Would that be a sufficient workaround?

At this time, we don't have any plans for commands to manage resources outside of template deployments.
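
The three-step flow above could be scripted roughly as follows. This is a sketch under assumptions, not a tested pipeline step: the stack name, resource group, and template path are hypothetical, the -DeleteAll switch follows the usage shown earlier in this thread, and the script assumes that a failed stack operation surfaces as an error that -ErrorAction Stop turns into a catchable exception. It does not inspect the failure, so it cannot tell whether a Key Vault secret was the only cause.

```powershell
# Hypothetical values for illustration; replace with your own.
$stack = @{
    Name              = 'my-rg_KeyVault'
    ResourceGroupName = 'my-rg'
    TemplateFile      = './main.bicep'
    DenySettingsMode  = 'DenyWriteAndDelete'
    Force             = $true
}

try {
    # Step 1: deploy the updated template with the chosen delete flag.
    Set-AzResourceGroupDeploymentStack @stack -DeleteAll -ErrorAction Stop
}
catch {
    # Step 2: the delete-mode run failed (assumed here to be caused by the
    # Key Vault secret that could not be deleted; this is NOT verified).
    Write-Warning "Delete-mode run failed: $($_.Exception.Message)"

    # Step 3: re-run the same template without any -Delete* switch (detach mode).
    Set-AzResourceGroupDeploymentStack @stack -ErrorAction Stop
}
```

Before relying on the fallback, a real pipeline would likely want to examine the error details and only retry in detach mode when the failed resources are Key Vault secrets.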

hallgeir-osterbo-visma commented 4 months ago

> Another option would be to use the following flow:
>
>   1. Deploy the updated template using the chosen delete flags
>   2. Wait for failure due to the KV secret that failed to delete
>   3. Re-run the stack with the same template in detach mode
>
> Would that be a sufficient workaround?
>
> At this time, we don't have any plans for commands to manage resources outside of template deployments.

I'll be honest and say it's not optimal. We have usually run the template in delete mode, then deleted the whole deployment stack. It works, but it's awkward. This approach would kind of be similar. We could probably automate it, but if we detect that deletion fails we ALSO need to check if the deletion that failed was key vault secrets (and only that).

Anyhow, we will survive, but it would be great to have an alternative. Maybe that could be a feature request then, to be able to detach individual resources using Azure CLI or similar?

alex-frankel commented 4 months ago

@hallgeir-osterbo-visma -- I see a couple of issues with the proposal:

For context, this thread is leading us to instead prioritize the longer-term fix which is to introduce an extensibility provider for Key Vault data plane. The conversation for this is only just getting started, so no commitment on starting the work or ETA, but as far as we can tell it is the only way to properly fix this one.

snarkywolverine commented 1 month ago

This scenario should improve with our w23 release (which should roll out over the next couple of weeks). With this change, a secret that has been deleted from KV and is removed from the deployment template will no longer persist in the stack. This does still require some manual fidgeting (namely, manually deleting the secret...) but it should help improve the experience until we can treat secrets as extensible resources.
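
As a rough illustration of the manual step mentioned above, the secret can be deleted through the data plane before the stack is updated. The vault, secret, stack, and file names below are hypothetical, the caller is assumed to have permission to delete secrets in the vault, and the -DeleteAll switch follows the usage shown earlier in this thread. With soft delete enabled, Remove-AzKeyVaultSecret moves the secret to the deleted state rather than purging it.

```powershell
# Hypothetical names for illustration; replace with your own.
# Soft-delete the secret through the Key Vault data plane.
Remove-AzKeyVaultSecret -VaultName 'my-keyvault' -Name 'obsolete-secret' -Force

# Then update the stack with a template that no longer declares the secret.
$stack = @{
    Name              = 'my-rg_KeyVault'
    ResourceGroupName = 'my-rg'
    TemplateFile      = './main.bicep'
    DenySettingsMode  = 'DenyWriteAndDelete'
    Force             = $true
}
Set-AzResourceGroupDeploymentStack @stack -DeleteAll
```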

rikjansen-hu commented 3 weeks ago

> KeyVault secrets are designed to be managed in the data plane; there is no support for deleting secrets via ARM control plane. Given this limitation, we considered having secrets be detached silently -- this would enable the stack PUT to succeed, but may have unexpected behavior for customers who are not familiar with this scenario.

@snarkywolverine :

  1. If secret/key/certificate management should be done through the data plane only, then, by extension, creation of these subresources should also not be possible through Bicep.
  2. Not allowing management of these subresources through ARM will greatly frustrate the CI/CD experience, as very often these subresources are prerequisites for other resources (e.g. CMK-enabled storage relies on CMK keys in a KeyVault).