Azure / deployment-stacks

Contains Deployment Stacks CLI scripts and releases
MIT License
87 stars 6 forks source link

Feature request - Allow the user to exclude referenced resources from the stack (ARM reference and bicep existing) #132

Closed GABRIELNGBTUC closed 5 months ago

GABRIELNGBTUC commented 8 months ago

Is your feature request related to a problem? Please describe. When deploying stacks using resource group scoped deployments using shared resources in another resource group, the shared resources are usually used through references to retrieve some values and are never modified directly. Some child resources may be created/modified but the parent resource tends to not change during the deployment.

Deployments stacks still adds these read-only shared resources inside the stack which is a problem because it adds a lot of risks when using DeleteOnDetach. An ACR A that is referenced to create a role assignment for a function could easily be deleted by error if the function now needs to use ACR B instead of ACR A.

Take the following pseudo-template as an example:

module create_func 'modulepath' = {
name: createfunc
}
resource acr_a 'containerregistry' existing = {
name: containera
}
//resource acr_b 'containerregistry' existing = {
//name: containerb
//}
resource role_assignment 'roleassignments' = {
name: createfuncroleassignment
scope: acr_a
properties:
{
roleDefinition: id
principalId: create_func.outputs.principalId
}
}

In this pseudo-template, a function is created and given a role on acr_a. The resource acr_a is never modified, however, it is added to the stack.

Now if I comment acr_a, uncomment the acr_b block and change the role assignment scope, acr_a will be delete when using DeleteOnDetach for no good reason. All you are doing in this template is creating role assignments in your template, deleting the whole acr resource is completely out-of-scope for such use-case.

Describe the solution you'd like Adding a switch to the the stack commands to not include referenced resources in the stack.

Describe alternatives you've considered Adding delete locks not managed by any stack. However, this is extra work to protect against what seems like a huge oversight

alex-frankel commented 8 months ago

@GABRIELNGBTUC - existing resources are not included in the last of managed resources, so we shouldn't be attempting to delete either acr_a or acr_b. Have you experienced a case where we tried to delete an existing resource? If so, that would be a bug we need to get resolved.

GABRIELNGBTUC commented 8 months ago

I just tried the provided pseudo template and it unfortunately didn't reproduce the issue at my surprise.

However, I have a template deployed in a test environment with stacks and it is adding our registry to its managed resources.

The ACR is not modified (only role assignments are created). But when I look at my stack resource, I can see it in the managed resources

You can find the template here https://gist.github.com/GABRIELNGBTUC/2db5602bc930730d716fb58b8a6ab7b7

alex-frankel commented 8 months ago

How old is the stack that is still reproducing the issue with this template? We did have a bug for this (#81), but it has since been fixed.

GABRIELNGBTUC commented 8 months ago

The stack was created the 9th of September and updated yesterday.

I tried to delete and recreate the stack today just in case it would have fixed the issue after reading your comment. But the ACR still ends up being managed by the stack.

GABRIELNGBTUC commented 8 months ago

After more testing, I managed to narrow down the issue to the "symbolicNameCodegen": true in bicepconfi.json.

Using the following module:


param role_name string = 'AcrPull'

@minLength(5)
@maxLength(50)
@description('Name of the container regisry where permissions will be applied to.')
param registry_name string

@description('The Principal ID that will be assigned role')
param principal_id string

@allowed([
  'ServicePrincipal'
  'User'
  'Group'
])
@description('The type of the service principal')
param principal_type string = 'ServicePrincipal'

@description('Dictionary containing the correct scope for each role assignment')
var role_id_mapping = {
  AcrPull: '${subscription().id}/providers/Microsoft.Authorization/roleDefinitions/7f951dda-4ed3-4680-a7ca-43fe172d538d'}

resource registry_resource 'Microsoft.ContainerRegistry/registries@2023-07-01' existing = {
  name: registry_name
}

var role_guid = guid(registry_resource.name, role_name, principal_id)

resource roleassigment_resource 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  name: role_guid
  scope: registry_resource
  properties: {
    principalId: principal_id
    principalType: principal_type
    roleDefinitionId: role_id_mapping[role_name]
    description: 'Bicep: Assign role ${role_name} as type ${principal_type}  to resource ${registry_resource.name} with guid ${role_guid}'
  }
}

I created a bicepconfig.json file at the same level of that module.

When I set "symbolicNameCodegen": false and use that module, the ACR resource is correctly unmanaged.

When I set "symbolicNameCodegen": true, the ACR appears in the deployments representing this module and is added to the stack.

image

azcloudfarmer commented 8 months ago

Adding work item reference: https://msazure.visualstudio.com/One/_workitems/edit/25780623

azcloudfarmer commented 7 months ago

Hi @GABRIELNGBTUC - just wanted to share we are still working on this investigation and discussing approaches for the fix. @snarkywolverine, @dantedallag and I will keep you posted on this thread.

majastrz commented 7 months ago

@GABRIELNGBTUC We have identified the root cause and are working on getting the fix checked in and deployed.

azcloudfarmer commented 7 months ago

Hi @GABRIELNGBTUC - quick update, the fix is still in the process of getting deployed. We will confirm here once fully available.

Also adding work item reference: https://msazure.visualstudio.com/One/_workitems/edit/25780623

RaduG commented 6 months ago

Hi @azcloudfarmer do you have any updates related to this issue?

snarkywolverine commented 6 months ago

I would expect these changes to be rolled out by mid January.

peter-de-wit commented 6 months ago

I also ran into this. I use deployment stacks as a parent (root) deployment with additional child deployment stack features. When refering to resources within the parent stack, and trying to delete the child stack, it is trying to delete resources from the root stack. But, due the 'cannot delete' functionality within the root stack, this deletion is not completed, luckely. But this does bring up the problem that deleting the child stack is only possible by setting deletion mode on 'deattach' .

A more detailed scenario:

main stack: deploys an automation account.

child stack: deploys variables within the automation account.

Deleting the child stack (with deleteresources) results in an attempt to delete the automation account also.

azcloudfarmer commented 5 months ago

Hello @peter-de-wit @RaduG @GABRIELNGBTUC - quick update on this issue. The fix is still rolling out and we expect to see it live now close to end of January.

azcloudfarmer commented 5 months ago

Hello @peter-de-wit @RaduG @GABRIELNGBTUC - quick update on this issue. The fix has been deployed to all regions with the exception of "North Europe" and "South Central US".

GABRIELNGBTUC commented 5 months ago

I reran some stacks using the language version v2 and existing statements. The previously referenced resources have been successfuly removed from the stack as they should.

Thanks for the help.