Open bijlm opened 7 months ago
Hi @bijlm, could you provide a correlation id for this failure and, if possible, a simplified template that produces a repro of this?
Hi Dante,
I reran our pipeline with one secret fewer than before and got a message like this:
When looking into the deployment stack, I see the error refers to the secret that was left out.
If I look into the activity log then, I see this:
I included the permissions for the secret users (even making the service connection identity a secret officer makes no difference).
kv.zip This zip contains a couple of bicep files and a simple example for a deployment.
Thanks @bijlm for providing the extra context and the files to repro. First though, we need a correlation id for the particular failing request to better understand what exactly is happening.
@dantedallag This is a specific example of a failing instance. Correlation id: '9f5011e2-7113-4b7c-9e39-b5b4c568f56f'
@bijlm Thanks! I'm looking into this.
Could you share what parameters you used to create this stack (maybe share the command you ran either in PS or CLI)? Mostly I am curious about your expected deny settings and if you specified DenySettingsApplyToChildScopes.
Yup, this is how I initiate the deploymentStack from the Azure DevOps pipeline (AzurePowerShell@5):
$hashParameters = [ordered]@{
    Name                  = '${{ variables.ResourceGroupName }}_KeyVault'
    ResourceGroupName     = '${{ variables.ResourceGroupName }}'
    TemplateFile          = $templateFile
    TemplateParameterFile = $parameterFileTmp
    DenySettingsMode      = 'DenyWriteAndDelete'
    DeleteAll             = $true
    Force                 = $true
}
Write-Output "##[verbose]Parameters for the deploymentStack: $($hashParameters | Format-Table -AutoSize -Wrap | Out-String -Width 80)"
New-AzResourceGroupDeploymentStack @hashParameters
@dantedallag did you have any time to look into this? Or any information update?
Hey @bijlm, sorry for the delay. I am able to reproduce this issue and am still looking into the root cause. Will hopefully have something for you next week.
Hello @bijlm - quick update. We are still working through potential solutions to this issue and will be noting this case as part of our Known Issues in Azure Docs.
If you are still looking for a workaround, I would suggest doing a multi-phased deployment to first detach the secrets and then delete the remaining resources you would like to remove from the stack.
Hi @bijlm - We wanted to give you an update on this issue.
The short version is that, at least for now, we plan to keep the design as-is, for the reasons outlined below. However, we welcome any feedback you (or other customers) may have to make this experience better.
KeyVault secrets are designed to be managed in the data plane; there is no support for deleting secrets via ARM control plane. Given this limitation, we considered having secrets be detached silently -- this would enable the stack PUT to succeed, but may have unexpected behavior for customers who are not familiar with this scenario.
We plan to make some small investments to make the situation better - including a clearer error message - but stacks will continue to fail to help make customers aware of the limitations.
We are also tracking a couple of bugs related to the scenario: (1) If a KeyVault and a secret appear in a template, and both are removed from the stack simultaneously, the stack operation will still fail. (2) If a secret has already been deleted, the stack operation will still fail.
As stated at the beginning though -- the goal is that, if the secret exists and can't be deleted by the stack, the stack will fail. This is true for both stack delete and stack update.
Hope that makes sense.
If you are still looking for a workaround, I would suggest doing a multi-phased deployment to first detach the secrets and then delete the remaining resources you would like to remove from the stack.
In practice, how would one go about doing this? Is it even possible to detach individual resources (like secrets)?
@hallgeir-osterbo-visma To detach secrets in PowerShell, remove them from the template and call the Set-*DeploymentStack cmdlet with none of the -Delete flags (-DeleteAll or -DeleteResources).
From there, if you want to delete other resources, remove those from the template and then call Set- again with -DeleteResources.
The experience should be similar for AzCLI.
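A minimal sketch of the two-phase approach described above. The stack name, resource group, and template file names are placeholders that do not come from this thread. It also assumes the installed Az.Resources version still exposes the -DeleteResources switch mentioned here; newer releases replace the delete switches with -ActionOnUnmanage, so check Get-Help Set-AzResourceGroupDeploymentStack first.

```powershell
# Placeholder values (assumptions, not taken from the thread).
$common = @{
    Name              = 'myStack'
    ResourceGroupName = 'my-rg'
    DenySettingsMode  = 'DenyWriteAndDelete'
    Force             = $true
}

# Phase 1: deploy a template from which only the secrets have been removed.
# No -Delete* switch is passed, so resources that left the template are
# detached from the stack rather than deleted.
Set-AzResourceGroupDeploymentStack @common -TemplateFile './phase1-secrets-removed.bicep'

# Phase 2: deploy the final template, with the other unwanted resources removed.
# -DeleteResources deletes resources that are no longer in the template.
Set-AzResourceGroupDeploymentStack @common -TemplateFile './phase2-final.bicep' -DeleteResources
```

The detached secrets stay in the vault after phase 1 and have to be cleaned up separately if they are no longer wanted.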
That's not very elegant in a CI/CD scenario, though. It would also detach other resources that we would want to delete. Are there any plans to support detaching resources from the stack directly? Something like... az stack detach --resource-id ... or something crazy like that. Then we could bake this into the CI/CD by having a "migration" script run before the deployment.
Another option would be to use the following flow:
- Deploy the updated template using the chosen delete flags
- Wait for failure due to the KV secret that failed to delete
- Re-run the stack with the same template in detach mode
Would that be a sufficient workaround?
At this time, we don't have any plans for commands to manage resources outside of template deployments.
I'll be honest and say it's not optimal. We have usually run the template in delete mode, then deleted the whole deployment stack. It works, but it's awkward. This approach would kind of be similar. We could probably automate it, but if we detect that deletion fails we ALSO need to check if the deletion that failed was key vault secrets (and only that).
Anyhow, we will survive, but it would be great to have an alternative. Maybe that could be a feature request then, to be able to detach individual resources using Azure CLI or similar?
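For anyone automating the "retry in detach mode" flow, the check mentioned above (did the failure involve Key Vault secrets and nothing else?) could be sketched as a pure function over resource IDs. The function name is hypothetical, and how the failed resource IDs are obtained from the stack is left to the caller, since that depends on the Az.Resources version in use.

```powershell
# Hypothetical helper: returns $true only if every failed resource ID
# points at a Key Vault secret. An empty list returns $false, so an
# unrelated failure is never mistaken for the secret case.
function Test-OnlyKeyVaultSecretFailures {
    param(
        [Parameter(Mandatory)]
        [AllowEmptyCollection()]
        [string[]] $FailedResourceIds
    )

    # ARM ID shape of a secret: .../providers/Microsoft.KeyVault/vaults/<vault>/secrets/<name>
    $secretPattern = '/providers/Microsoft\.KeyVault/vaults/[^/]+/secrets/[^/]+$'

    if ($FailedResourceIds.Count -eq 0) { return $false }

    foreach ($id in $FailedResourceIds) {
        # -notmatch is case-insensitive, which suits ARM resource IDs.
        if ($id -notmatch $secretPattern) { return $false }
    }
    return $true
}

# Usage with made-up IDs:
$secretId = '/subscriptions/0000/resourceGroups/rg/providers/Microsoft.KeyVault/vaults/kv/secrets/s1'
$vaultId  = '/subscriptions/0000/resourceGroups/rg/providers/Microsoft.KeyVault/vaults/kv'
Test-OnlyKeyVaultSecretFailures -FailedResourceIds @($secretId)
Test-OnlyKeyVaultSecretFailures -FailedResourceIds @($secretId, $vaultId)
```

A pipeline would re-run the stack without delete flags only when this returns true, and fail normally otherwise.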
@hallgeir-osterbo-visma -- I see a couple of issues with the proposal:
For context, this thread is leading us to instead prioritize the longer-term fix which is to introduce an extensibility provider for Key Vault data plane. The conversation for this is only just getting started, so no commitment on starting the work or ETA, but as far as we can tell it is the only way to properly fix this one.
This scenario should improve with our w23 release (which should roll out over the next couple of weeks). With this change, a secret that has been deleted from KV and is removed from the deployment template will no longer persist in the stack. This does still require some manual fidgeting (namely, manually deleting the secret...) but it should help improve the experience until we can treat secrets as extensible resources.
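The manual step mentioned above could look roughly like this; the vault and secret names are placeholders, and the identity running it is assumed to have data-plane permission to delete secrets.

```powershell
# Placeholder names (assumptions). Deletes the secret through the Key Vault
# data plane; with soft delete enabled it moves to the vault's deleted secrets.
Remove-AzKeyVaultSecret -VaultName 'my-keyvault' -Name 'my-secret' -Force
```

After that, re-running the stack with the secret removed from the template should no longer fail on that secret, per the change described above.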
KeyVault secrets are designed to be managed in the data plane; there is no support for deleting secrets via ARM control plane. Given this limitation, we considered having secrets be detached silently -- this would enable the stack PUT to succeed, but may have unexpected behavior for customers who are not familiar with this scenario.
@snarkywolverine :
Describe the bug
I have a keyvault with secrets in a stack. The keyvault has softDelete and purgeProtection turned on. When I remove a secret from my stack description, the next deployment takes about 6 minutes (longer) and the secret is reported as could not remove...
Expected behavior
I would have expected the secret to be softDeleted so it would remain in the "deleted secrets" section for as long as the retention is configured.
Repro Environment
Host OS: Windows
PowerShell Version: 7.4.0
Data Center: West Europe