Azure / Enterprise-Scale

The Azure Landing Zones (Enterprise-Scale) architecture provides prescriptive guidance coupled with Azure best practices, and it follows design principles across the critical design areas for organizations to define their Azure architecture
https://aka.ms/alz
MIT License
1.68k stars 952 forks

VM Compliance: Windows machines should meet requirements of the Azure compute security baseline #1466

Closed jtracey93 closed 10 months ago

jtracey93 commented 10 months ago

Discussed in https://github.com/Azure/Enterprise-Scale/discussions/1465

Originally posted by **integyjc** November 1, 2023

Hi All, I'm posting this here as I'm not sure if it's an issue or something we are doing wrong! We have three separate ALZ environments (different tenants). Following ALZ deployment and creation of a VM within the ALZ, all end up with the following policy as non-compliant:

**Initiative:** [Enforce Azure Compute Security Benchmark compliance auditing](https://portal.azure.com/#view/Microsoft_Azure_Policy/InitiativeDetail.ReactView/id/%2Fproviders%2Fmicrosoft.management%2Fmanagementgroups%2Falz%2Fproviders%2Fmicrosoft.authorization%2Fpolicysetdefinitions%2Fenforce-acsb/scopes~/%5B%22%2Fproviders%2FMicrosoft.Management%2FmanagementGroups%2Falz%22%5D/isDefinitionEdited~/false)

**Policy:** [Windows machines should meet requirements of the Azure compute security baseline](https://portal.azure.com/#view/Microsoft_Azure_Policy/PolicyDetailBlade/definitionId/%2Fproviders%2FMicrosoft.Authorization%2FpolicyDefinitions%2F72650e9f-97bc-4b2a-ab5f-9781a9fcecbc)

It seems this policy is looking for VMs to have the AzureWindowsBaseline Guest Assignment applied to them, which they do not, so the policy reports non-compliant. I can't see anything in the ALZ default policies which is set up to configure this guest assignment. Is this something people are handling separately to ALZ, or are we expecting VMs to get the AzureWindowsBaseline guest assignment from an ALZ or DFC policy? Any help much appreciated!

jtracey93 commented 10 months ago

@springstone can we take a look at this one please?

Springstone commented 10 months ago

Hi @integyjc, that built in policy https://www.azadvertizer.net/azpolicyadvertizer/72650e9f-97bc-4b2a-ab5f-9781a9fcecbc.html provides guidance on what is required for it to work correctly (https://aka.ms/gcpol). Basically, you need to deploy and assign "Azure Automanage Machine Configuration" (which is included in both the default "Azure best practices" configuration profiles) which requires the Automanage (formerly Guest Configuration) agent.

We do not currently deploy and configure Automanage as part of ALZ due to the many combinations of profile configuration (for custom) and determining the scope to apply a profile to (Production/DevTest). For Azure Security Baseline, for example, you may choose to audit only, apply once and monitor, or apply and autocorrect drift.

image

I'll raise this with the ALZ Leads for further discussion and potential inclusion in ALZ.

Let me know if you need any further guidance.

Springstone commented 10 months ago

Just a quick update, I'm digging into this a bit more, as we are deploying the guest configuration agent, but I suspect we're not assigning the Azure Security Baseline configuration profile. Will provide updates here.

integyjc commented 10 months ago

Thanks @Springstone, we got to the same conclusion yesterday, although we are hesitant to deploy the default Automanage configuration profiles because they enable other features, such as backups, that we'd rather control elsewhere.

It would be interesting to hear the outcome of your discussions and research, and what you determine is the best method of simply applying the baselines via Automanage.

Springstone commented 10 months ago

Update @integyjc. The initiative we deploy does actually work as expected (it's been about a year since I looked at this :)) but it isn't very intuitive, to say the least. Let's leave Automanage out of this conversation, and let me try to clarify: the "non-compliant" status actually means the machine is not compliant with the Azure Compute Security Baseline. I'll paint the picture to make sure we're talking about the same issue:

You see this:

image

This actually means that those resources are not compliant with the Azure Compute Security Baseline. To get the details, click on "Details" next to one of the VMs that are not compliant and you get:

image

You then need to follow the "Click Here" link to get to the Azure Compute Security Baseline assessment:

image

Which shows me the 95 non-compliant settings for my newly deployed Azure virtual machine (Windows Server 2019).

This is why it shows as non-compliant. You can of course now use the same Guest Configuration agent to remediate your machines to improve compliance, but that isn't in ALZ scope/purview. To understand more about guest configuration assignments please review : https://learn.microsoft.com/en-us/azure/governance/machine-configuration/assignments. Also look under the Quickstarts section for some ideas on how to use guest configuration to your benefit.

I hope this makes sense and addresses your query. Please let me know either way.

integyjc commented 10 months ago

Hi @Springstone, thank you again, I appreciate the detail you've gone into. The exact issue we've found across three fresh ALZ deployments is that the policy mentioned is non-compliant because the security baseline guest assignment does not exist in the environment. Here is a live example:

image

image

Our confusion at the moment is whether something in ALZ should be creating the AzureWindowsBaseline guest assignment for new virtual machines, which the policy can then use to report compliance, or whether something else such as DFC should be creating these.

Springstone commented 10 months ago

This policy does that https://www.azadvertizer.net/azpolicyadvertizer/72650e9f-97bc-4b2a-ab5f-9781a9fcecbc.html (which is part of the ALZ initiative https://www.azadvertizer.net/azpolicyinitiativesadvertizer/Enforce-ACSB.html) as part of the then clause (checks if it's not compliant). Is the guest configuration extension deployed on your VMs?

image

integyjc commented 10 months ago

Hi, yep, all VMs have that extension, across the three different tenants:

image

image

From what I can tell, the policies just install the Guest Configuration extension itself, but I cannot see which policy or configuration is supposed to create and assign the AzureWindowsBaseline guest assignment.

Springstone commented 10 months ago

That top policy creates the guest assignment (done through the policy metadata), as explained at https://learn.microsoft.com/en-us/azure/governance/machine-configuration/assignments. Do you have any egress restrictions in place (see https://learn.microsoft.com/en-us/azure/governance/machine-configuration/overview)? Default routes to a firewall, perhaps?

integyjc commented 10 months ago

@Springstone thank you, I think you're onto something here. All of these environments have Azure Firewall with default routes out. They also have outbound HTTP and HTTPS enabled via an application rule, so this should still work, although I think something here is the common cause of the guest assignments via the mentioned policy being so unreliable.

One example is where Azure Firewall for some reason cannot resolve the guestconfiguration address despite being able to resolve other DNS entries!

image

It would be interesting to know if you've seen this before, but I am now viewing this as an issue to troubleshoot and appreciate we're probably outside of an ALZ-specific issue / bug.

integyjc commented 10 months ago

@Springstone - OK, I've now got to the bottom of this. This is happening across ALZ deployments where we have Azure Firewall in a hub VNet with the private DNS zones linked, as performed as part of an ALZ deployment with hub and spoke.

It seems that, despite us not wanting the guest configuration service to use Private Link, having the privatelink.guestconfiguration.azure.com DNS zone linked to the hub is causing Azure Firewall to fail to resolve agentserviceapi.guestconfiguration.azure.com. I'm now questioning my understanding of Private Link and private DNS! All the behaviour I've seen for other services such as blob would cause the public endpoint to be resolved if a record did not exist in the private zone, but in this case resolution fails altogether.

The following error is seen on a Windows client (Azure VM) within the gc_agent log:

Status Code '400'. Error Message 'Failed to issue request : uri: https://agentserviceapi.guestconfiguration.azure.com/virtualMachines/f8b99726-a6c9-47b2-bd62-22fc8e192181/metadata?api-version=2018-06-30// with error: WinHttpSendRequest: 12029: A connection with the server could not be established

The following corresponding error is shown in Azure Firewall:

image

If I unlink the guestconfiguration private DNS zone from the hub VNet, resolution works and the guest assignment appears following a restart of the Guest Configuration Service!
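For anyone wanting to reproduce the resolution check described above, here is a minimal Python sketch. It only tests name resolution through whatever DNS path the machine running it uses; it says nothing about whether the firewall allows the connection. The helper name `resolve_addresses` is hypothetical, and the `resolver` parameter exists only so the function can be exercised without a network.

```python
import socket

# Hostname taken from the gc_agent error message quoted above.
GC_ENDPOINT = "agentserviceapi.guestconfiguration.azure.com"


def resolve_addresses(hostname, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IP addresses for hostname.

    Returns an empty list when the resolver reports a failure
    (for example an NXDOMAIN answer surfaces as socket.gaierror).
    """
    try:
        results = resolver(hostname, 443)
    except socket.gaierror:
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # the first element of sockaddr is the address string.
    return sorted({info[4][0] for info in results})
```

Calling `resolve_addresses(GC_ENDPOINT)` on an affected VM and getting an empty list back would be consistent with the failure described in this comment, although other DNS problems would produce the same result.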

Springstone commented 10 months ago

@integyjc Good to hear you've found the issue and a workaround. We'll investigate the issue with our networking specialists, and let you know our findings.

Springstone commented 10 months ago

Hi @integyjc. The behavior is as expected: the DNS zone for guest configuration has no entries in it, so it replies with NXDOMAIN (as this is the authoritative DNS provider for that resource). The way to enable this for guest configuration is based on a VM tag (documented at https://learn.microsoft.com/en-us/azure/governance/machine-configuration/overview#communicate-over-private-link-in-azure). The alternative is to unlink or delete the Private DNS Zone, as you've done, if you don't require communication via Private Link.
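The NXDOMAIN behavior can be illustrated with a toy Python model. This is a sketch under loud assumptions, not a description of Azure DNS internals: the CNAME from the public name into the `privatelink` zone and the IP addresses are hypothetical, chosen only to show why a linked but empty zone returns NXDOMAIN instead of falling back to a public answer.

```python
PRIVATE_ZONE = "privatelink.guestconfiguration.azure.com"
PUBLIC_NAME = "agentserviceapi.guestconfiguration.azure.com"

# Hypothetical records for illustration only; the thread does not show
# the real record chain or any real addresses.
PUBLIC_CNAMES = {PUBLIC_NAME: "agentserviceapi." + PRIVATE_ZONE}
PUBLIC_A = {"agentserviceapi." + PRIVATE_ZONE: "203.0.113.10"}


def resolve(name, linked_zones):
    """Resolve name in a simplified model of VNet DNS.

    linked_zones maps a private zone name to its {fqdn: ip} records.
    Returns an IP string, or "NXDOMAIN".
    """
    # Follow any CNAME published in (modelled) public DNS.
    while name in PUBLIC_CNAMES:
        name = PUBLIC_CNAMES[name]
    for zone, records in linked_zones.items():
        if name == zone or name.endswith("." + zone):
            # A linked private zone is authoritative for its names:
            # a missing record is NXDOMAIN, with no public fallback.
            return records.get(name, "NXDOMAIN")
    return PUBLIC_A.get(name, "NXDOMAIN")
```

In this model, `resolve(PUBLIC_NAME, {})` returns the public address, while `resolve(PUBLIC_NAME, {PRIVATE_ZONE: {}})` returns `"NXDOMAIN"`, which matches the two states reported earlier in the thread (zone unlinked versus zone linked and empty).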

We'll be updating the ALZ FAQ with clearer guidance in line with this. Closing as there isn't any further action required to unblock you. Feel free to reply or reopen if you have any questions.

integyjc commented 10 months ago

Hi @Springstone, I appreciate your efforts in helping this come to a conclusion, and apologies for dragging you down this rabbit hole with us! I can confirm that adding the recommended VM tag to a VM in our test tenant does resolve the issue with the VNet link in place for privatelink.guestconfiguration.azure.com. We will apply a policy at the ALZ level to apply this tag to VMs.

I hope this thread or some updated FAQ guidance proves useful to others, as we did go round in circles a bit with this one and found overlap with Azure Arc specific documentation which didn't appear relevant to this issue, although it very much was!
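As a rough idea of what such a tagging policy could look like, the Python sketch below builds an Azure Policy rule that uses the `modify` effect to add the tag to virtual machines. It is not the policy used in this thread. The tag name and value (`EnablePrivateNetworkGC` / `TRUE`) are an assumption based on the machine configuration Private Link documentation linked above and should be verified there; the function name is hypothetical; the role shown is the built-in Contributor role.

```python
import json

# Assumption: tag name/value as described in the linked machine
# configuration documentation. Verify before using.
TAG_NAME = "EnablePrivateNetworkGC"
TAG_VALUE = "TRUE"

# Built-in Contributor role, used by the policy's managed identity
# when the modify effect remediates resources.
CONTRIBUTOR_ROLE = (
    "/providers/Microsoft.Authorization/roleDefinitions/"
    "b24988ac-6180-42a0-ab88-20f7382dd24c"
)


def build_tag_policy_rule(tag_name=TAG_NAME, tag_value=TAG_VALUE):
    """Return a policyRule dict that adds or replaces a tag on VMs."""
    field = f"tags['{tag_name}']"
    return {
        "if": {
            "allOf": [
                {"field": "type", "equals": "Microsoft.Compute/virtualMachines"},
                {"field": field, "notEquals": tag_value},
            ]
        },
        "then": {
            "effect": "modify",
            "details": {
                "roleDefinitionIds": [CONTRIBUTOR_ROLE],
                "operations": [
                    {"operation": "addOrReplace", "field": field, "value": tag_value}
                ],
            },
        },
    }


def to_json(rule):
    """Serialize the rule for pasting into a policy definition."""
    return json.dumps(rule, indent=2)
```

`to_json(build_tag_policy_rule())` yields the rule as JSON. A full policy definition would also need its surrounding properties (for example a mode suitable for tag evaluation), which are left out here.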

msundman78 commented 4 months ago

I'm struggling with the same problem: this policy reports my newly deployed Win2022 servers as non-compliant. What is the recommended approach to deploy the Azure Compute Security Baseline on Azure VMs in regions like Sweden where Automanage is currently not available? Are there any up-to-date DSC configurations available to deploy? I've only found outdated 2016/2022 versions when searching myself.