Feature Request - AzOps IAM security guidance for least privilege access

Azure / AzOps

AzOps is a PowerShell module which deploys (Push) ARM Resource Templates & Bicep files at all Azure scope levels and exports (Pull) ARM resource hierarchy.

https://aka.ms/AzOps

MIT License

Feature Request - AzOps IAM security guidance for least privilege access #747

Open vegazbabz opened 1 year ago

vegazbabz commented 1 year ago

The security concept around AzOps is questionable. It basically breaks with all of Microsoft's recommendations around least privilege. You have a single pipeline with permissions to manage more or less everything. This includes granting permissions at MG level, e.g. on Platform Identity -> bye bye Domain Controller(s). Access to DCs was highly restricted on-prem and, according to your ALZ architecture, should remain so when moved to the cloud, yet AzOps completely circumvents this unless the approver is from the same DC admin team and can read code well enough to understand the request being approved. The same goes for other resources and services. It seems like AzOps only serves an operational purpose without security being thought through. The suspicion is only strengthened by AzOps never mentioning security as a factor.

What are the best practices and Microsoft recommendations around AzOps and security? How can we ensure that the Enterprise Access Model is still valid?

daltondhcp commented 1 year ago

Thank you for the valuable feedback, @vegazbabz. We recognize that there is a gap in our current documentation beyond 'getting it up and running' with platform-wide Owner permissions, and we will make sure to improve this with additional guidance.

Given what the reference architecture looks like and how the platform works regarding RBAC inheritance, managing an Enterprise-Scale platform with pipelines (regardless of whether AzOps or something else is used) comes with some limitations depending on the operating model and the separation of teams/concerns.

Example: All our policyDefinitions, roleDefinitions, and platform-wide policyAssignments are created at the intermediate management group level, which basically requires Microsoft.Authorization/* and Microsoft.Resources/deployments/* permissions at that scope. Since we currently have no way of breaking inheritance at a lower scope, the pipeline/principal will have the same permissions at the Identity management group as well, meaning that this is a challenge we need to deal with using additional gates and controls.
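The inheritance behaviour in this example can be illustrated with a toy model. This is a minimal sketch and not how Azure evaluates access: the management group names, the `PARENT` map, and the `platform-pipeline` principal are hypothetical, and only the two action patterns are taken from the comment above.

```python
# Toy model of RBAC inheritance down a management group hierarchy.
# All group names and the principal name are hypothetical.

# child -> parent (None marks the top of the hierarchy)
PARENT = {
    "tenant-root": None,
    "intermediate-root": "tenant-root",
    "platform": "intermediate-root",
    "identity": "platform",
    "landing-zones": "intermediate-root",
}

# (principal, scope) -> action patterns granted at that scope
ASSIGNMENTS = {
    ("platform-pipeline", "intermediate-root"): {
        "Microsoft.Authorization/*",
        "Microsoft.Resources/deployments/*",
    },
}


def ancestors_and_self(scope):
    """Yield the scope itself, then each parent up to the top."""
    while scope is not None:
        yield scope
        scope = PARENT[scope]


def effective_actions(principal, scope):
    """Union of everything granted at the scope or at any ancestor."""
    granted = set()
    for s in ancestors_and_self(scope):
        granted |= ASSIGNMENTS.get((principal, s), set())
    return granted


if __name__ == "__main__":
    # The assignment made at the intermediate root is also effective here.
    print(sorted(effective_actions("platform-pipeline", "identity")))
    # Nothing is granted above the scope where the assignment was made.
    print(sorted(effective_actions("platform-pipeline", "tenant-root")))
```

In this model there is no way to subtract permissions at a child scope, which mirrors the point that inheritance cannot currently be broken lower in the hierarchy.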

Some high-level recommendations below:

First and foremost, assign the 'least privilege' roleDefinitions for the specific platform IaC use cases. As mentioned in the example above, the least privilege built-in roles required to manage the policy/role platform resources would be User Access Administrator and Automation Runbook Operator. Automation Runbook Operator happens to be the least privilege built-in role that provides Microsoft.Resources/deployments/* permissions. A custom role can of course also be used. This significantly reduces the risk of tampering with Tier 0 (Identity components), as one would first have to elevate privileges before being able to do anything with the resources under that scope.
Secondly, we recommend that customers implement code review best practices using branch protection rules and code owners, or the ADO equivalent. This means we can have dedicated code owners from, for example, the IAM team approving anything touching things under the Identity MG/folders.
Lastly, in some scenarios, customers decide to create additional service principals depending on the target deploymentScope of the pipeline. For example, if the scope is the intermediate root management group, we use an SPN with permissions at that level; for platform-specific operations, an SPN with permissions only at that level; and so on.
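To make the first recommendation more concrete, here is a rough sketch comparing a role limited to the two action patterns mentioned above with a broad one. It approximates wildcard matching with Python's `fnmatch`, which is an assumption and not Azure's actual evaluation logic; the role names `policy-role-pipeline` and `broad-pipeline` are made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical roles; only the two scoped patterns come from the discussion.
ROLES = {
    "policy-role-pipeline": [
        "Microsoft.Authorization/*",
        "Microsoft.Resources/deployments/*",
    ],
    "broad-pipeline": ["*"],
}


def is_allowed(role, operation):
    """True if any action pattern of the role matches the operation.

    Comparison is case-insensitive; fnmatch-style wildcards are only an
    approximation of how RBAC action strings are matched.
    """
    op = operation.lower()
    return any(fnmatchcase(op, pattern.lower()) for pattern in ROLES[role])


if __name__ == "__main__":
    for role in ROLES:
        for op in (
            "Microsoft.Authorization/roleAssignments/write",
            "Microsoft.Compute/virtualMachines/runCommand/action",
        ):
            print(role, op, is_allowed(role, op))
```

In this sketch the scoped role cannot act on virtual machines directly, but it can still write role assignments, so an extra step (and ideally an extra gate) stands between the pipeline and the Tier 0 resources.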

Hope this helps. We will use this issue to track documentation updates to further address these topics.

vegazbabz commented 1 year ago

Thanks for the recommendations, those are appreciated. I am also happy to hear that you will start elaborating further on the security documentation. As per your example, Azure Policy is not an issue. But if the pipeline is granted UAA at an intermediate management group level, I could assign myself the VM Contributor role, which means I can compromise e.g. ADDC. I understand that you can create various gates for approval, but then the approval flow becomes so significant that AzOps, which should make deploying IaC more or less frictionless, turns into a stringent process, removing the deployment agility. Furthermore, the IAM team might not understand IaC, which means there is a risk of the approver (from the correct team) not understanding what he/she approves. Of course you can argue that this comes with experience, but the fact is that many IAM teams are not coders, and many IAM teams will not delegate the important responsibility of approving, e.g., ADDC access to a DevOps person. In addition, if the approval is kept in the same department (e.g. DevOps team A), then this team could take over the ADDC with a couple of people. I know it looks like I am painting a negative picture; I would just like you to cover various security angles and not only focus on operational benefits. I do not think there is a perfect solution for this. As you mentioned, the inheritance of high privileges is the real issue here, but those privileges are still needed in order to do operations.

How do you view an alternative solution by deploying pipelines for specific use-cases? E.g. a pipeline for Identity usage, one for LZ, one for networking, etc. etc. Just to segregate the work, so you do not have "one pipeline to rule them all".

daltondhcp commented 1 year ago

@vegazbabz, thank you for the additional input - and apologies for the wall of text 😄 I am happy to hear that you recognize that this is a potential challenge/risk regardless of how the platform is managed due to how inheritance works, i.e. not a unique problem to AzOps.

Your insider privilege escalation scenario is absolutely a potential risk, as is the risk of someone in the platform team rolling out a deployIfNotExists policy that runs certain commands on targeted virtual machines, or a policy that breaks other parts of the environment. This risk is even higher if things are managed manually through the portal, as we have limited control gates after entitlement has been granted at a high management group level.

This is why from a DevOps process perspective, reviews, approvals, and gates are absolutely essential for infrastructure as code to ensure reliable, secure, and compliant platform environments. When/if there is a lack of trust or knowledge among approvers, it is critical that the issue is addressed and resolved. This could involve providing additional training and resources to ensure that all approvers have the necessary knowledge and expertise to make informed decisions. Alternatively, it may require building trust through open communication and transparency in the approval process.

It is always recommended to do your own threat modeling to ensure you have adequate preventive, detective and corrective security controls for identified threat scenarios.

Examples:

Insider privilege escalation from UAA -> VM Contributor in the Identity Sub/MG:
1. DevOps approval processes (as discussed before).
2. Additional Azure Policies to control roleAssignment behavior at the platform/identity scope, e.g. to prevent assignment to non-service principal identities or something similar (https://github.com/Azure/Community-Policy/tree/master/Policies/Authorization).
3. Additional Azure Policies to deny 'run command' and 'run scripts' operations on the virtual machines, to minimize what can be done from the Azure control plane.
4. Detective controls (monitoring/alerting) for roleAssignments or suspicious CRUD activity on the DCs.

Elevation/escalation of privileges to Azure from Azure AD as a Global Administrator (https://learn.microsoft.com/en-us/azure/role-based-access-control/elevate-access-global-admin):
1. Detective controls with monitoring and potential automatic remediation.
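As a rough illustration of the detective controls listed above, the following sketch filters a list of activity events for role assignment writes on watched scopes. The event shape, the scope IDs, and the caller names are all hypothetical; a real implementation would read from whatever logging and alerting tooling is in place.

```python
# Sketch of a detective control: flag role assignment writes on watched scopes.
# The event shape, scope IDs, and caller names below are hypothetical.

WATCHED_SCOPE_PREFIXES = (
    "/providers/microsoft.management/managementgroups/identity",
    "/subscriptions/00000000-0000-0000-0000-000000000000",
)
WATCHED_OPERATION = "microsoft.authorization/roleassignments/write"


def suspicious_events(events, approved_callers):
    """Return events that write a role assignment on a watched scope
    and were not made by an approved caller (e.g. the pipeline SPN)."""
    hits = []
    for event in events:
        scope = event["scope"].lower()
        operation = event["operation"].lower()
        if (
            operation == WATCHED_OPERATION
            and scope.startswith(WATCHED_SCOPE_PREFIXES)
            and event["caller"] not in approved_callers
        ):
            hits.append(event)
    return hits


EVENTS = [
    {"caller": "pipeline-spn",
     "operation": "Microsoft.Authorization/roleAssignments/write",
     "scope": "/providers/Microsoft.Management/managementGroups/identity"},
    {"caller": "alice@example.com",
     "operation": "Microsoft.Authorization/roleAssignments/write",
     "scope": "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/rg-dc"},
    {"caller": "alice@example.com",
     "operation": "Microsoft.Compute/virtualMachines/read",
     "scope": "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/rg-dc"},
]

if __name__ == "__main__":
    for hit in suspicious_events(EVENTS, approved_callers={"pipeline-spn"}):
        print(hit["caller"], hit["scope"])
```

Treating the pipeline identity as approved is itself a design choice: it keeps noise down, but it also means a malicious change that passes review would not raise this particular alert.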

As mentioned in my previous comment, customers often do not operate one pipeline to rule them all. Instead they have one AzOps for the core platform resources (Policies, Roles, Management Groups), and then additional repos and pipelines as per the operating model, such as networking, identity, etc. However, since everything lives under the same management group structure, the platform pipelines will inevitably have permissions that can impact other platform components (including Identity).

Breaking the Identity management groups out of the main management group hierarchy could potentially address some of these concerns, but would come at the cost of additional management overhead and a less holistic security posture.

vegazbabz commented 1 year ago

I appreciate the thorough answer, so no need to apologize. The example you provided is great. I completely forgot that you can actually block on Microsoft.Authorization/roleAssignments/roleDefinitionId. Item 3 seems like a good idea; however, I have been unsuccessful in creating an actual deny policy for run command. So if you have something useful, please feel free to share 😃

    "if": {
      "allOf": [
        {
          "field": "type",
          "equals": "Microsoft.Compute/virtualMachines/runCommands"
        },
        {
          "anyOf": [
            {
              "field": "Microsoft.Compute/virtualMachines/runCommands/source.commandId",
              "equals": "RunPowerShellScript"
            },
            {
              "field": "Microsoft.Compute/virtualMachines/runCommands/source.script",
              "contains": "Invoke-WebRequest"
            }
          ]
        }
      ]
    },

https://feedback.azure.com/d365community/idea/d0f837fd-90ad-ed11-a81b-6045bd79fc6e

Do you have any recommendations on the maximum number of steps and people that should be involved in the reviews, approvals, and gates? You can keep adding complexity, but the more complexity, the less efficiency, so you want to aim for a good balance.

"However, since everything lives under the same management group structure, the platform pipelines will inevitably have permissions that can impact other platform components (including Identity)." Not sure I understand or agree. Wouldn't the purpose of having multiple pipelines be that each service principal has different permissions, so you can apply granular permissions per pipeline? That way you would need to breach multiple pipelines (and their gates) in order to do severe damage.

We want to stick with ESLZ architecture, so no moving MG outside (we already looked at this design option, and it is bad).

Once again, thanks for the thorough answers!

daltondhcp commented 1 year ago

"Item 3 seems like a good idea; however, I have been unsuccessful in creating an actual deny policy for run command. So if you have something useful, please feel free to share 😃"

Currently, there is a bit of a gap here in terms of the portal experience for runCommands. Azure Policies only work on PUT requests, and the portal runCommand uses an old API-version and POST.

If you want to validate your policy, try using the PUT API or an ARM/Bicep template (https://learn.microsoft.com/en-us/rest/api/compute/virtual-machine-run-commands/create-or-update?tabs=HTTP)

This means you currently have to use both policy and detective controls (alerts) to be fully covered.

Another note on runCommands, based on previous experience: I'd strongly suggest not relying on blocking specific commands, as such blocks are usually very easy to circumvent with aliases or alternate commands. Instead of Invoke-WebRequest, I could use curl, Invoke-RestMethod, or their aliases.
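The circumvention problem is easy to demonstrate with a small sketch. The function below mimics a simple substring deny-list like the `contains` condition in the policy attempt; treating the check as case-insensitive is an assumption, and the sample scripts and URL are invented for illustration.

```python
# Naive deny-list check, similar in spirit to a "contains" policy condition.
# Case-insensitive matching is an assumption; the sample scripts are made up.

BLOCKED_TERMS = ["Invoke-WebRequest"]


def naive_filter_blocks(script):
    """Return True if any blocked term appears anywhere in the script."""
    lowered = script.lower()
    return any(term.lower() in lowered for term in BLOCKED_TERMS)


SCRIPTS = {
    "full name": "Invoke-WebRequest -Uri https://example.com/a.ps1 -OutFile a.ps1",
    "alias": "iwr https://example.com/a.ps1 -OutFile a.ps1",
    "other cmdlet": "Invoke-RestMethod -Uri https://example.com/a.ps1 -OutFile a.ps1",
    "curl": "curl https://example.com/a.ps1 -o a.ps1",
}

if __name__ == "__main__":
    for name, script in SCRIPTS.items():
        verdict = "blocked" if naive_filter_blocks(script) else "allowed"
        print(f"{name}: {verdict}")
```

Only the variant that spells out the full cmdlet name is caught; every other variant passes. Extending the list helps little, because there are always more ways to express the same download.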

"Do you have any recommendations on the maximum number of steps and people that should be involved in the reviews, approvals, and gates? You can keep adding complexity, but the more complexity, the less efficiency, so you want to aim for a good balance."

This all comes down to the criticality and risks involved with the different pipelines and changes. The higher the potential blast radius, the stricter the review process required. For example, for a change at the root/intermediate root, you may want to enforce two review approvals and potentially additional evidence. For changes with a lower blast radius, such as at subscription or resource group level, one approval or automated approval might be sufficient.
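One way to encode this kind of tiering is a small lookup that returns the strictest requirement across everything a change touches. This is only a sketch: the tier names, the exact numbers, and the choice to treat unknown scope levels as the strictest tier are assumptions for illustration, loosely based on the examples above.

```python
# Hypothetical review tiers keyed by the scope level a change touches.
REVIEW_TIERS = {
    "root": {"approvals": 2, "extra_evidence": True},
    "intermediate_root": {"approvals": 2, "extra_evidence": True},
    "subscription": {"approvals": 1, "extra_evidence": False},
    "resource_group": {"approvals": 1, "extra_evidence": False},
}

# Assumption: unknown scope levels fall back to the strictest tier.
STRICTEST = {"approvals": 2, "extra_evidence": True}


def required_review(touched_levels):
    """Return the strictest requirement across every scope level touched."""
    result = {"approvals": 1, "extra_evidence": False}
    for level in touched_levels:
        tier = REVIEW_TIERS.get(level, STRICTEST)
        result["approvals"] = max(result["approvals"], tier["approvals"])
        result["extra_evidence"] = result["extra_evidence"] or tier["extra_evidence"]
    return result


if __name__ == "__main__":
    print(required_review(["resource_group"]))
    print(required_review(["subscription", "intermediate_root"]))
```

Failing closed on unknown levels keeps a new or misspelled scope from silently getting the lightest review.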

"'However, since everything lives under the same management group structure, the platform pipelines will inevitably have permissions that can impact other platform components (including Identity).' Not sure I understand or agree. Wouldn't the purpose of having multiple pipelines be that each service principal has different permissions, so you can apply granular permissions per pipeline? That way you would need to breach multiple pipelines (and their gates) in order to do severe damage."

100%, that would be the intent with multiple pipelines. However, to my earlier point, the pipeline managing policyDefinitions, policyAssignments, roleDefinitions, etc. at the intermediate root management group will, due to inheritance, have permissions across the whole environment.

vegazbabz commented 1 year ago

"Azure Policies only work on PUT requests, and the portal runCommand uses an old API-version and POST."

Alright, thanks for clarifying. Although it is not very good news that the portal is running an old API version(?!).

I tried it via Postman and it worked, with this simple policyRule:

    "policyRule": {
      "if": {
        "field": "type",
        "equals": "Microsoft.Compute/virtualMachines/runCommands"
      },
      "then": {
        "effect": "[parameters('effect')]"
      }
    }

So actually, an attacker would have to use the portal instead of PowerShell or other third-party CLIs, due to the old API version? 😆 That is a bit tragicomic. Once again, thank you for clarifying and helping with guidelines. I do not have anything further to add.

daltondhcp commented 1 year ago

I will notify our internal teams about this deficiency so we can fix the portal experience as well.

jsandquist commented 1 year ago

@daltondhcp @vegazbabz Thank you both for the valuable discussion.

I'm a bit late but regarding one of the high-level recommendations:

"Secondly, we recommend that customers implement code review best practices using branch protection rules and code owners, or the ADO equivalent. This means we can have dedicated code owners from, for example, the IAM team approving anything touching things under the Identity MG/folders."

Isn't it so that one challenge here is that you can effectively place an arbitrary bicep file just about anywhere in the folder structure, thus triggering a code review from a completely different code owner? The WhatIf output - if available and without too much noise - could then be used to possibly trigger other code owners to be automatically added as additional reviewers or automatically reject it.

daltondhcp commented 1 year ago

Not necessarily, if we set up different code owners or reviewer policies for different folders in the structure. The key thing is that one needs to be comfortable with the approval structure as well as the approvers.
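Both positions in this exchange can be sketched together: reviewers chosen by folder, plus extra reviewers derived from what a change would actually affect. Everything here is hypothetical, including the folder layout, the team names, and the idea that affected scopes have already been parsed from WhatIf output. The longest-prefix rule is a simplification and not the exact matching semantics of a CODEOWNERS file.

```python
# Hypothetical folder prefixes and team names; longest matching prefix wins.
PATH_OWNERS = {
    "root/": "platform-team",
    "root/platform/identity/": "iam-team",
    "root/platform/connectivity/": "network-team",
}

# Hypothetical map from an affected scope (assumed to be parsed from
# WhatIf output by some earlier step) to the team that owns it.
SCOPE_OWNERS = {
    "identity": "iam-team",
    "connectivity": "network-team",
}


def owner_for_path(path):
    """Return the owner of the longest matching folder prefix, if any."""
    matches = [prefix for prefix in PATH_OWNERS if path.startswith(prefix)]
    return PATH_OWNERS[max(matches, key=len)] if matches else None


def required_reviewers(changed_paths, affected_scopes):
    """Union of folder-based owners and owners of the affected scopes."""
    reviewers = {owner_for_path(p) for p in changed_paths}
    reviewers |= {SCOPE_OWNERS[s] for s in affected_scopes if s in SCOPE_OWNERS}
    reviewers.discard(None)
    return reviewers


if __name__ == "__main__":
    path = "root/platform/connectivity/dc-access.bicep"
    # Folder-based routing alone only involves the owner of the folder.
    print(sorted(required_reviewers([path], [])))
    # Adding the affected scope pulls in the team that owns the target.
    print(sorted(required_reviewers([path], ["identity"])))
```

The second call shows the effect of the WhatIf idea: a file placed under the connectivity folder that targets Identity resources would still bring the IAM team into the review.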