Azure / terraform-azurerm-caf-enterprise-scale

Azure landing zones Terraform module
https://aka.ms/alz/tf
MIT License

ES-Deploy-Nsg-FlowLogs remediation task fails #30

Closed pauldotyu closed 3 years ago

pauldotyu commented 3 years ago

When assigning the ES-Deploy-Nsg-FlowLogs policy to the Azure platform, the policy evaluation accurately reports non-compliant status for NSGs without flow logs configured.

The policy definition in policy_definition_es_deploy_nsg_flowlogs.json has a deployIfNotExists effect and is configured to use a system-assigned managed identity.

However, the remediation task fails since there is no name for the Microsoft.Network/networkWatchers/flowLogs resource in the policy definition template.
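A quick way to check a local copy of a definition for this gap is sketched below. This is only an illustration: the `sample` document is a hypothetical, trimmed-down definition (not the actual module file), and `missing_details_name` is a made-up helper name. It assumes the usual `properties.policyRule.then.details` layout of a policy definition.

```python
import json


def missing_details_name(definition: dict) -> bool:
    """Return True when 'then.details' has no non-empty 'name' property."""
    then = definition.get("properties", {}).get("policyRule", {}).get("then", {})
    details = then.get("details", {})
    return not details.get("name")


# Hypothetical, heavily trimmed definition used only for illustration.
sample = json.loads("""
{
  "properties": {
    "policyRule": {
      "if": {"field": "type", "equals": "Microsoft.Network/networkSecurityGroups"},
      "then": {
        "effect": "deployIfNotExists",
        "details": {"type": "Microsoft.Network/networkWatchers/flowLogs"}
      }
    }
  }
}
""")

print(missing_details_name(sample))  # True: 'details' has a type but no name
```

To run it against a real file, load the JSON from the local lib folder instead of using the inline sample.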

Here is the actual error message that can be found in Azure Activity logs for the deployIfNotExists Policy action:

The 'ifNotExists' target resource type 'Microsoft.Network/networkWatchers/flowLogs' and name '' are not valid in policy assignment '/providers/Microsoft.Management/managementGroups/cu/providers/Microsoft.Authorization/policyAssignments/CU-Deploy-NSG-FlowLogs' and definition '/providers/Microsoft.Management/managementGroups/cu/providers/Microsoft.Authorization/policyDefinitions/ES-Deploy-Nsg-FlowLogs' when evaluating a resource of type 'microsoft.network/networksecuritygroups'.

The built-in policy to deploy NSG flow logs that is similar to this does include a name property for the flow log (see line 62 in the file below)

https://github.com/Azure/azure-policy/blob/master/built-in-policies/policyDefinitions/Network/NetworkSecurityGroup_FlowLog_Deploy.json

Once you add the following line to the policy_definition_es_deploy_nsg_flowlogs.json file, the remediation task works as expected.

"name": "[if(empty(coalesce(field('Microsoft.Network/networkSecurityGroups/flowLogs[*].id'))), 'null/null', concat(split(first(field('Microsoft.Network/networkSecurityGroups/flowLogs[*].id')), '/')[8], '/', split(first(field('Microsoft.Network/networkSecurityGroups/flowLogs[*].id')), '/')[10]))]",
krowlandson commented 3 years ago

Thank you @pauldotyu. Will investigate and add to next release.

WilliamDahlen commented 3 years ago

@pauldotyu On what line number should your fix be added?

WilliamDahlen commented 3 years ago

@krowlandson Maybe you know where it should be added. We are experiencing similar errors with our FlowLogs deployment policy. The error on the compliance details blade is "No related resources match the effect details in the policy definition. (Error code: ResourceNotFound)", so I would expect it's the same problem.

pauldotyu commented 3 years ago

@WilliamDahlen I just realized I added the wrong bit of code in my comment above. I have edited it to include the proper "name". You'll want to add this bit at line #55, inside the "effect" details of the Microsoft.Network/networkWatchers/flowLogs resource deployment:

"name": "[if(empty(coalesce(field('Microsoft.Network/networkSecurityGroups/flowLogs[*].id'))), 'null/null', concat(split(first(field('Microsoft.Network/networkSecurityGroups/flowLogs[*].id')), '/')[8], '/', split(first(field('Microsoft.Network/networkSecurityGroups/flowLogs[*].id')), '/')[10]))]",
WilliamDahlen commented 3 years ago

@pauldotyu Thanks! Adding that line almost did the trick. I also had to add the ES-Deploy-Nsg-FlowLogs app as Contributor on the management subscription where the central Log Analytics workspace is stored.

{ "status": "Failed", "error": { "code": "AuthorizationFailed", "message": "The client '' with object id '' does not have authorization to perform action 'Microsoft.OperationalInsights/workspaces/read' over scope '/subscriptions/SUB-ID/resourcegroups/rg-demo/providers/Microsoft.OperationalInsights/workspaces/log-demo' or the scope is invalid. If access was recently granted, please refresh your credentials." } }

This might be related to how our policy structure is set up. The ES-Deploy-Nsg-FlowLogs policy is scoped directly at one of the management groups under LandingZones, while the Log Analytics workspace where the data will be stored sits under 'Platform -> Management' in the management group structure.

Any thoughts on how I can let the policy read the Log Analytics workspace without changing the scope? Having to add the app as Contributor manually is not preferred.

pauldotyu commented 3 years ago

@WilliamDahlen - What scope are your policy definitions being defined at? Ideally the policy definitions should all be made at the "customer root" management group, and policy assignments can happen at lower scopes. Also, which version of the module are you using? If you are using v0.0.8, the policy assignment should create system-assigned managed identities and make the appropriate role assignments to be able to carry out remediation tasks.

WilliamDahlen commented 3 years ago

@pauldotyu - The policy definition is included in the archetype_definition_esroot.tmpl.json in our local lib folder. The assignment and the parameters are defined at the archetype_config.parameters level within our landing zones in the custom_landing_zones block. The policy definition itself was copied to the local lib folder "policy_definitions", and in that file we added the missing string. We are indeed using v0.0.8 of the module. I'll try to recreate the assignment, the definition, and their role assignments and see if it happens again.

johankardell commented 3 years ago

I just encountered the same problem and the suggested solution solved the problem for me as well.

krowlandson commented 3 years ago

Closing this issue as we believe this is now resolved.