Azure / missionlz

Azure landing zone for SCCA-compliant organizations.
MIT License

Spike: Investigate using resource locks #305

Closed brooke-hamilton closed 2 years ago

brooke-hamilton commented 3 years ago

Benefit/Result/Outcome
So that resources and/or resource groups that are deployed by MLZ are locked to prevent accidental deletes.

Description
A common recommendation for production cloud deployments is to put resource locks on resource groups and individual resources. The purpose of this work is to determine if that's feasible to include with an MLZ deployment.

Acceptance Criteria

brooke-hamilton commented 3 years ago

Update: added needs triage label because of demand signal to increase the priority of this issue.

lisamurphy-msft commented 2 years ago

Yes, it appears to be feasible to deploy resource locks with MLZ, both in Bicep and Terraform implementations.

lisamurphy-msft commented 2 years ago

Gathering data on a correct path forward for proposed implementation.

lisamurphy-msft commented 2 years ago

Even when an appropriate Azure Role-Based Access Control (RBAC) schema is in place to restrict who can modify or delete resources, an additional tier of protection may still be necessary. Some customers are required to keep data in perpetuity, and the risk of deletion, even if unintentional, can justify a secondary line of defense. This can be achieved with Azure resource locks: management locks can be applied to a subscription, a resource group, or an individual resource.
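To make the three lock scopes concrete, here is a minimal Python sketch of how a lock at a parent scope covers the resources beneath it. It assumes the usual Azure resource ID layout and models inheritance as a simple path-prefix check; the subscription ID, resource group, and storage account names are hypothetical placeholders, not MissionLZ values.

```python
def normalize(resource_id: str) -> str:
    """Lower-case and trim an Azure resource ID so comparisons ignore case."""
    return resource_id.strip().rstrip("/").lower()


def lock_applies(lock_scope: str, resource_id: str) -> bool:
    """Return True if a lock placed at lock_scope covers resource_id.

    Simplified model: a lock covers its own scope and everything whose
    ID sits underneath that scope in the resource hierarchy.
    """
    scope = normalize(lock_scope)
    target = normalize(resource_id)
    return target == scope or target.startswith(scope + "/")


# Hypothetical IDs, for illustration only.
SUB = "/subscriptions/00000000-0000-0000-0000-000000000000"
RG = SUB + "/resourceGroups/example-rg"
STORAGE = RG + "/providers/Microsoft.Storage/storageAccounts/examplelogs"

if __name__ == "__main__":
    for scope in (SUB, RG, STORAGE):
        print(scope, "->", lock_applies(scope, STORAGE))
```

A lock on the subscription or the resource group covers the storage account, while a lock on the storage account does not cover its parent resource group.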

Determining the appropriate level of resource locking is a design decision, but locks can be set up fairly succinctly through Bicep, PowerShell, Azure CLI, or the REST API. In MissionLZ, the resources that should ideally be locked are the Storage Accounts. There may also be a use case for locking the firewall settings to prevent someone from maliciously adding a route that falls outside the system integrator's security guidance. Unfortunately, this ticket is broad in scope, and the cycle of implementing resource locks, manually removing them, and then testing again would cause the team a significant number of problems. I would not be in favor of adding resource locks to the MissionLZ baseline deployment.
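As a rough illustration of the REST API route mentioned above, the following Python sketch only builds (and does not send) the request that would place a CanNotDelete lock on a storage account. The endpoint default is the Azure public cloud, the API version is an assumption that should be checked against the current management locks reference, and the scope and lock name are hypothetical.

```python
import json

# Assumptions: public-cloud ARM endpoint and an API version to verify
# against the current Microsoft.Authorization/locks documentation.
ARM_ENDPOINT = "https://management.azure.com"
API_VERSION = "2016-09-01"

VALID_LEVELS = ("CanNotDelete", "ReadOnly")


def build_lock_request(scope, lock_name, level="CanNotDelete", notes="",
                       endpoint=ARM_ENDPOINT):
    """Describe the PUT request that creates a management lock at scope."""
    if level not in VALID_LEVELS:
        raise ValueError(f"unsupported lock level: {level}")
    url = (
        f"{endpoint}{scope}/providers/Microsoft.Authorization/locks/"
        f"{lock_name}?api-version={API_VERSION}"
    )
    body = {"properties": {"level": level, "notes": notes}}
    return {"method": "PUT", "url": url, "body": json.dumps(body)}


if __name__ == "__main__":
    # Hypothetical storage account scope, for illustration only.
    scope = (
        "/subscriptions/00000000-0000-0000-0000-000000000000"
        "/resourceGroups/example-rg"
        "/providers/Microsoft.Storage/storageAccounts/examplelogs"
    )
    request = build_lock_request(scope, "example-storage-lock",
                                 notes="Prevent accidental deletion")
    print(request["method"], request["url"])
    print(request["body"])
```

Sending the request would additionally need an authenticated HTTP client, which is left out here to keep the sketch free of external dependencies.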

Effectively, as per @shawngib, implementing Azure Lockbox to manage RBAC and access to data/resources is probably the correct call here. This is documented here; a security baseline might need to be reviewed to ensure that we remain in line with security compliance requirements.

Alternatively, if a system integrator wants to implement resource locks on their resources, they are more than able to do so. We need to determine whether we should simply provide a utility within MissionLZ that lets a system integrator implement resource locks if they want or need to and aren't confident in their existing RBAC permissions. Ideally, this can be done in the same way the example use case was provided as guidance on inheriting tags in #440.

sstjean commented 2 years ago

Just wanted to add some additional context based on the original customer request.

When deploying a Tier 3 network, a VNet is created with specific subnets as configured by the deployment/operations team. The Tier 3 subscription can then be "handed over" to the workload owner to deploy the necessary resources, which will attach to the created subnets. The forced-tunneling routes are configured in the RG created by MLZ and should be controlled by the Ops team, not the Tier 3 workload owner. Giving the workload owner OWNER permissions on the Tier 3 sub allows them to change the forced tunneling or CIDR ranges outside of the operations team's control. By putting a READ lock on the networking resources, the Tier 3 owner can attach new resources to the subnet but cannot change the subnet or routing configuration.
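The intent of the READ lock in this scenario can be sketched with a small Python model of the two Azure lock levels: CanNotDelete blocks deletes, ReadOnly blocks both writes and deletes, and the most restrictive lock wins regardless of the caller's role. This is a simplified illustration, not Azure's implementation. It does not model whether attaching a workload to a subnet counts as a write against the locked network resource, which would need to be tested in a real deployment.

```python
# Operations each lock level blocks (simplified model, not Azure's code).
BLOCKED_OPERATIONS = {
    "CanNotDelete": {"delete"},
    "ReadOnly": {"write", "delete"},
}

OPERATIONS = ("read", "write", "delete")


def is_allowed(operation, lock_levels):
    """Return True if no lock in lock_levels blocks the operation.

    lock_levels holds every lock that applies to the resource, including
    those inherited from its resource group or subscription.
    """
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {operation}")
    return all(operation not in BLOCKED_OPERATIONS[level]
               for level in lock_levels)


if __name__ == "__main__":
    # Hypothetical case: a route table carrying a ReadOnly lock.
    for operation in OPERATIONS:
        print(operation, is_allowed(operation, ["ReadOnly"]))
```

Note that this models the lock only. A principal whose role includes permission to manage locks could remove the lock first, so the lock is a guard against unintended changes rather than a replacement for RBAC.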

lisamurphy-msft commented 2 years ago

Interesting use case. I will generate a backlog item to provide a template for applying this to a specific resource as opposed to a resource group. But does a T3 operator need portal access? The account owner could encounter other problems if the T3 operator were to add resources that were not explicitly approved, which could also lead to cost issues. I will go ahead and create a product backlog item for providing a utility to aid in locking the resource group: #484