Azure / Community-Policy

This repo is for Microsoft Azure customers and Microsoft teams to collaborate in making custom policies.
MIT License

deploy-storage-monitoring-log-analytics reports incorrect compliancy and non-compliancy #188

Closed jeroenwo closed 1 year ago

jeroenwo commented 3 years ago

Before creating the deploy-storage-monitoring-log-analytics policy, I ran JimGBritt's policies for sending diagnostic settings to Log Analytics.

I then created and assigned the policy with the following commands:

$definition = New-AzPolicyDefinition `
    -Name "deploy-storage-monitoring-log-analytics" `
    -DisplayName "Deploy Diagnostic Settings for Azure Storage, including blobs, files, tables, and queues to a Log Analytics workspace" `
    -Description "Deploys the diagnostic settings for Azure Storage, including blobs, files, tables, and queues to stream to a regional Log Analytics workspace when any Azure Storage which is missing this diagnostic settings is created or updated." `
    -Policy 'https://raw.githubusercontent.com/Azure/Community-Policy/master/Policies/Storage/deploy-storage-monitoring-log-analytics/azurepolicy.rules.json' `
    -Parameter 'https://raw.githubusercontent.com/Azure/Community-Policy/master/Policies/Storage/deploy-storage-monitoring-log-analytics/azurepolicy.parameters.json' -Mode Indexed

$definition

# Note: the scope and the workspace path take the subscription ID, not the tenant ID.
$assignment = New-AzPolicyAssignment -Name "Diagnostic Settings for Azure Storage PROD" -Scope "/subscriptions/<subscriptionid>" `
    -Description "Deploys the diagnostic settings for Azure Storage, including blobs, files, tables, and queues to stream to a regional Log Analytics workspace when any Azure Storage which is missing this diagnostic settings is created or updated." `
    -logAnalytics "/subscriptions/<subscriptionid>/resourcegroups/<resourcegroup>/providers/microsoft.operationalinsights/workspaces/<law>" `
    -profileName "MonitoringDiagnosticsSettings" -metricsEnabled "True" `
    -PolicyDefinition $definition -Location "westeurope" -AssignIdentity

$assignment

In the Azure Portal I edited the policy (to exclude some resource groups and assign the proper RBAC to the assigned identity), and triggered a policy evaluation.

The result is a list of compliant resources: [image]

However, when I check their diagnostic settings, they are not as expected (no blob, queue, table, or file diagnostics enabled): [image]

The non-compliant resources do have the proper settings: [image] [image]

How come these settings are incorrect? Creating a remediation task fails as the settings are already present, and the "compliant" resources cannot be configured correctly.

jeroenwo commented 3 years ago

After some investigation I found that the policy uses the name of the diagnostic profile to decide whether a resource is compliant.

If I change the profile name to something unique, all resources are marked non-compliant. However, when I try to remediate, I get the following error:

{
  "code": "DeploymentFailed",
  "message": "At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/DeployOperations for usage details.",
  "details": [
    {
      "code": "Conflict",
      "message": "Data sink '/subscriptions//resourcegroups/ar-p-rg-operations/providers/microsoft.operationalinsights/workspaces/' is already used in diagnostic setting 'MonitoringDiagnosticsSettings' for category 'Capacity'. Data sinks can't be reused in different settings on the same category for the same resource."
    }
  ]
}

So renaming the profile doesn't remove the existing storage account diagnostic settings, and it also doesn't create new diagnostic settings for blob, queue, table, and file storage.
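
The rule quoted in the error message can be modelled in a few lines. This is only a sketch of the constraint as the message states it, not Azure's actual validation logic; the `DiagnosticSetting` type, the `find_sink_conflicts` helper, the `workspace-a` identifier, and the case-insensitive comparison of sink IDs are all assumptions made for illustration.

```python
from typing import FrozenSet, NamedTuple


class DiagnosticSetting(NamedTuple):
    name: str
    sink: str                   # destination, e.g. a workspace resource ID
    categories: FrozenSet[str]  # log/metric categories routed to that sink


def find_sink_conflicts(existing, candidate):
    """Return (existing setting name, category) pairs that would block `candidate`.

    Models the rule from the error message: on one resource, a sink cannot be
    reused by a different setting for the same category.
    """
    conflicts = []
    for setting in existing:
        if setting.name == candidate.name:
            continue  # same name: treated here as an update, not a second setting
        if setting.sink.lower() != candidate.sink.lower():
            continue  # assumption: sink IDs are compared case-insensitively
        for category in sorted(setting.categories & candidate.categories):
            conflicts.append((setting.name, category))
    return conflicts


# Settings already present on one storage account (hypothetical values).
existing = [
    DiagnosticSetting("MonitoringDiagnosticsSettings", "workspace-a",
                      frozenset({"Capacity", "Transaction"})),
]
# The same configuration redeployed under a new, unique profile name.
renamed = DiagnosticSetting("UniqueProfileName", "workspace-a",
                            frozenset({"Capacity", "Transaction"}))

print(find_sink_conflicts(existing, renamed))
# → [('MonitoringDiagnosticsSettings', 'Capacity'), ('MonitoringDiagnosticsSettings', 'Transaction')]
```

Under this model, renaming the profile can only succeed once the older setting that targets the same workspace has been removed or changed.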

Any idea how to resolve this issue?

mrajess commented 3 years ago

By design, you cannot send the same category of data to the same sink from multiple diag settings: two diag settings on a resource can't both target the same LA workspace (or event hub or storage account) for the same category of event. Maybe you could try changing the existence condition to just check for any diag setting with an appropriate config. That way, if there's a diag setting with a different name that meets the criteria, the resource counts as compliant and we just move along.

"type": "Microsoft.Insights/diagnosticSettings",
"existenceCondition": {
  "allOf": [
    {
      "field": "Microsoft.Insights/diagnosticSettings/logs.enabled",
      "equals": "[parameters('logsEnabled')]"
    },
    {
      "field": "Microsoft.Insights/diagnosticSettings/metrics.enabled",
      "equals": "[parameters('metricsEnabled')]"
    },
    {
      "field": "Microsoft.Insights/diagnosticSettings/workspaceId",
      "equals": "[parameters('workspaceID')]"
    }
  ]
}

However, this doesn't fully solve the issue. The root problem as mentioned is that only one diag profile can send a log/metric category to a particular data sink (LA workspace). Honestly, the scenario you find yourself in is possible, with a lot of shenanigans, to sort out with Policy, but I wouldn't advise it.
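
To make the name-agnostic existence condition suggested above more concrete, here is a toy evaluator for it. It handles only `allOf` with `field`/`equals` and resolves `[parameters('...')]` by simple string matching; the real Azure Policy engine evaluates aliases and template expressions far more generally, so treat the `matches` function, the flat dictionary representation of a setting, and the `workspace-a` value as illustrative assumptions.

```python
PREFIX = "Microsoft.Insights/diagnosticSettings/"

# The existence condition from the snippet above, as a Python dict.
existence_condition = {
    "allOf": [
        {"field": PREFIX + "logs.enabled", "equals": "[parameters('logsEnabled')]"},
        {"field": PREFIX + "metrics.enabled", "equals": "[parameters('metricsEnabled')]"},
        {"field": PREFIX + "workspaceId", "equals": "[parameters('workspaceID')]"},
    ]
}


def matches(condition, setting, parameters):
    """Evaluate a tiny subset of policy conditions: allOf plus field/equals."""
    if "allOf" in condition:
        return all(matches(c, setting, parameters) for c in condition["allOf"])
    expected = condition["equals"]
    start, end = "[parameters('", "')]"
    # Resolve "[parameters('x')]" references; anything else is a literal.
    if isinstance(expected, str) and expected.startswith(start) and expected.endswith(end):
        expected = parameters[expected[len(start):-len(end)]]
    # Assumption: values are compared case-insensitively as strings.
    return str(setting.get(condition["field"])).lower() == str(expected).lower()


params = {"logsEnabled": "True", "metricsEnabled": "True", "workspaceID": "workspace-a"}

# A setting with a different name but the desired configuration (hypothetical).
legacy = {
    "name": "SomeOtherName",
    PREFIX + "logs.enabled": True,
    PREFIX + "metrics.enabled": True,
    PREFIX + "workspaceId": "workspace-a",
}
# A setting that sends only metrics to the workspace.
metrics_only = dict(legacy, **{PREFIX + "logs.enabled": False})

print(matches(existence_condition, legacy, params))        # → True
print(matches(existence_condition, metrics_only, params))  # → False
```

Because the setting name never enters the check, a differently named setting with the right configuration is accepted, which is the behaviour the suggestion aims for.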

Instead, I might just build some quick automation via PowerShell or something to remove the previous diagnostic settings you had created via Jim Britt's scripts. Then I would enhance the existing Policy to also check for workspaceId. Finally, I'd make a Policy with a Deny effect that says only diag settings with X name can be configured to use X workspace ID.
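
The first step of that plan, deciding which existing settings to remove, could look roughly like the sketch below. It only builds a removal plan from an inventory that is assumed to have been collected already; the actual listing and deletion calls against Azure are deliberately left out. The `settings_to_remove` helper, the inventory shape, and the resource and workspace identifiers are hypothetical.

```python
def settings_to_remove(settings_by_resource, workspace_id, keep_name):
    """Return (resource_id, setting_name) pairs to delete before remediation.

    A setting is selected when it sends data to `workspace_id` under a name
    other than `keep_name`, the profile name the policy will deploy.
    """
    plan = []
    for resource_id, settings in settings_by_resource.items():
        for setting in settings:
            # Settings without a workspace (e.g. event hub only) are ignored.
            target = (setting.get("workspaceId") or "").lower()
            if target == workspace_id.lower() and setting["name"] != keep_name:
                plan.append((resource_id, setting["name"]))
    return plan


# Hypothetical inventory: resource ID -> diagnostic settings found on it.
inventory = {
    "storage-1": [{"name": "LegacyProfile", "workspaceId": "workspace-a"}],
    "storage-2": [
        {"name": "MonitoringDiagnosticsSettings", "workspaceId": "workspace-a"},
        {"name": "ToEventHub", "workspaceId": None},
    ],
}

print(settings_to_remove(inventory, "workspace-a", "MonitoringDiagnosticsSettings"))
# → [('storage-1', 'LegacyProfile')]
```

Reviewing such a plan before deleting anything keeps the cleanup auditable, which matters when the same workspace receives data from many resources.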

Soon there will be an easier way to enable logging at scale. I think you might start seeing more about this some time in July. Don't hold me to that, but do know that the current difficulty of enabling logging at scale is felt here at Microsoft and we are working to improve the situation.

nathanfisk commented 2 years ago

@jeroenwo - did you ever find a solution for the incorrect compliance reporting? I am also currently trying to roll out diag settings to blob services via policy and, while the remediation seems to run perfectly fine and enable diag settings as desired, evaluation still seems to report the resources as non-compliant.

I'm wondering if it has something to do with the nested structure of Storage Accounts and their component services/containers, i.e. is it reporting that the parent Storage Account is not compliant (despite the fact that SAs have no logging component and only have metrics)?

@mrajess - was there ever an update in the summer about this, like you mentioned above?

jesseloudon commented 2 years ago

If you run into incorrect compliance reporting when managing diagnostic settings for Storage Accounts to Log Analytics Workspaces, you can try ensuring that, in your ARM template, the diagnostic settings for all four storage service resource types (blob, file, queue, and table services) have the enabled property of the Transaction metric set to true.

JSON / ARM Template example for "Microsoft.Storage/storageAccounts/blobServices/providers/diagnosticSettings"

"resources" : [
{
  "condition" : "[contains(parameters('servicesToDeploy'), 'blobServices')]",
  "type" : "Microsoft.Storage/storageAccounts/blobServices/providers/diagnosticSettings",
  "apiVersion" : "2017-05-01-preview",
  "name" : "[concat(replace(parameters('resourceName'),'/default',''), '/default/', 'Microsoft.Insights/', parameters('diagnosticsSettingNameToUse'))]",
  "location" : "[parameters('location')]",
  "dependsOn" : [],
  "properties" : {
    "workspaceId" : "[parameters('logAnalytics')]",
    "metrics" : [
      {
        "category" : "Transaction",
        "enabled" : "[parameters('Transaction')]",
        "retentionPolicy" : {
          "days" : "[parameters('TransactionRetentionDays')]",
          "enabled" : false
        },
        "timeGrain" : null
      }
    ],
    "logs" : [
      {
        "category" : "StorageRead",
        "enabled" : "[parameters('StorageRead')]"
      },
      {
        "category" : "StorageWrite",
        "enabled" : "[parameters('StorageWrite')]"
      },
      {
        "category" : "StorageDelete",
        "enabled" : "[parameters('StorageDelete')]"
      }
    ]
  }
},

jeroenwo commented 2 years ago

@nathanfisk If I recall correctly I removed all the existing LA diagnostic settings and added them again with a new name. But it has been some time and I am not working with policies anymore, sorry I can't give you any more conclusive information.

techlake commented 1 year ago

Cleaning up old issues (closing)