Azure / Enterprise-Scale

The Azure Landing Zones (Enterprise-Scale) architecture provides prescriptive guidance coupled with Azure best practices, and it follows design principles across the critical design areas for organizations to define their Azure architecture
https://aka.ms/alz
MIT License
1.73k stars 980 forks source link

Bug Report - DINE for SQL Audit #642

Closed pkorolo closed 1 year ago

pkorolo commented 3 years ago

Describe the bug

  1. "Deploy-SQL-Audit" Policy is Audit (AINE), instead of Deploy (DINE) ("id": "/providers/Microsoft.Authorization/policyDefinitions/a6fb4358-5bf4-4ad7-ba82-2cd2f41ce5e9").

As result, ALL Azure SQL Servers (PaaS) end up non-Compliant, because the Policy just Audits (and does NOT turn ON) SQL Server Auditing.

  1. Policy "Deploy Diagnostic Settings for SQL Databases to Log Analytics workspace" within Initiative "Deploy-Resource-Diag" tries to enable SQL DB (PaaS) Auditing through a non correct way. In further detail, tmho, the last setting

{ "category": "SQLSecurityAuditEvents", "enabled": "[parameters('logsEnabled')]" }

must be REMOVED, and a new DINE Policy must be created, specifically for SQL DB Auditing enablement, possibly in conjunction with a correct DINE Policy for SQL Server Auditing (see 1 above).

  1. if someone enables SQL Auditing (Server & DB level) manually, the SQL Auditing Solution is loaded within Log Analytics, but for some reason, the latter breaks the "PULL" workflow, with the following:

Set-Content: /home/runner/.local/share/powershell/Modules/AzOps/1.2.0/AzOps.psm1:1055 Line | 1055 | … AsStrings | Set-Content -Path ([WildcardPattern]::Escape($objectFileP … | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | An object at the specified path | /home/runner/work/eslzxyz/eslzxyz/root/eslzxyz | (eslzxyz)/eslzxyz-platform | (eslzxyz-platform)/eslzxyz-management | (eslzxyz-management)/management subscription | (***)/eslzxyz-mgmt/./microsoft.operationsmanagement_solutions-sqlauditing[eslzxyz-la].json does not exist, or has been filtered by the -Include or -Exclude parameter.

[22:32:44][Stop-PSFRunspace] Stopping runspace: psframework.logging [22:32:45][Stop-PSFRunspace] Stopping runspace: psframework.taskengine [22:32:45][Stop-PSFRunspace] Stopping runspace: psframework.logging Error: Process completed with exit code 1.

Steps to reproduce

  1. for 1 & 2, just create a SQL DB in any Landing Zone subscription
  2. for 3, after creating a SQL DB in LZ, just configure SQL Auditing manually, and run manually a Pull workflow.

Screenshots sql-non-compliant sql-server-auditing-policy part-to-be-removed flow-error

mrajess commented 3 years ago

Is this bug report still active? It appears the Policy "Deploy-SQL-Audit" has been removed and the built-in policy that's referenced to deploy diag settings for SQL DB has removed the category you called out. Seems like the bugs have been fixed, but there might need to be a feature enhancement to ensure sql server auditing is being enabled? I'm unsure on that.

pkorolo commented 3 years ago

Please see below:

a. the fact that now the Policy is called "Auditing on SQL server should be enabled", does not alter the situation that this is still AINE (audit) instead of DINE (deploy). In fact, previous and current Policies, are the same built-in one: a6fb4358-5bf4-4ad7-ba82-2cd2f41ce5e9, which is by definition AINE.

b. indeed, the fix into the Diagnostics Initiative, did remove the part that was trying to do enable Auditing through the Diagnostics settings

c. the fact still remains that neither Server-level nor DB-level auditing gets enabled in ESLZ setup, as it should.

By the above, we could leave this open as bug, or make it a feature enhancement, but Auditing at Server & DB-level must be enabled by default

mrajess commented 3 years ago

Excellent details. Thanks! It looks like there is a built-in DINE now to enable auditing on SQL servers: "Configure SQL servers to have auditing enabled"/"f4c68484-132f-41f9-9b6d-3e4b1cb55036".

Let me review the entire change process for deploying SQL auditing and I'll see if I can't put together a PR to address this.

pkorolo commented 3 years ago

The one referred here "f4c68484-132f-41f9-9b6d-3e4b1cb55036" unfortunately targets storage account as output, not LA workspace.

mrajess commented 3 years ago

Easy enough fix.

pkorolo commented 3 years ago

@mrajess :

  1. I will have to partially correct myself (and you), on my (b) statement, a couple of days ago:

eventually the new (built-in) policy (b79fa14e-238a-4c2d-b376-442ce508fc84 for SQL db diagnostics) that is used in the new ESLZ initiative ("Deploy Diagnostic Settings to Azure Services"), has practically the same problem as the old (custom) one, which existed in the old ESLZ initiative ("Deploy-Resource-Diag").

The new one (the built-in), as the old one did, still does have parts for enabling "SQLSecurityAuditEvents", hence the new one, still ends up non-compliant (and the whole ESLZ initiative with it).

The latter you can easily repro, if you create a SQL DB; the new ESLZ Initiative will end up non-compliant, and the same will happen to the built-in policy, if you assign it anyplace by itself (even outside ESLZ), as long as it applies to a SQL db.

Strangely, if you manually remediate, the problem seems to go away for a while (policy / initiative go green), but when policy is re-evaluated after some time, it flips back to non-compliant state.

I strongly believe that the new built-in policy (b79fa14e-238a-4c2d-b376-442ce508fc84) needs review and potential correction.

besides the above, what we have discussed about the SQL Audit policy still stands - as AINE, it will never go green, hence the need of a SQL Server & DB DINE Auditing Policy still remains.

mrajess commented 3 years ago

I just finished a DINE that will deploy audit settings for SQL DB. I'm making a complimentary DINE that checks to make sure the diag setting for auditing is present on the master database. Wrapping that one up now. Once I've completed code I'll review everything else you've shared with me and figure out how best to proceed. All changes can be tracked in this branch.

mrajess commented 3 years ago

I took a moment and looked through the logic of the built-in more closely and I see the issue you're referring to. We could specify that Policy to disable the "SQLSecurityAuditEvents" events from being captured, but then due to the existence condition it would result in that Policy always being non-compliant.

{ "allOf": [ { "field": "Microsoft.Insights/diagnosticSettings/logs.enabled", "equals": "True" }, { "field": "Microsoft.Insights/diagnosticSettings/metrics.enabled", "equals": "True" }, { "field": "Microsoft.Insights/diagnosticSettings/workspaceId", "matchInsensitively": "[parameters('logAnalytics')]" } ] }

The only solution I see moving forward is to break this whole thing out into three Policies.

  1. Deprecate the built-in (until fixed) b79fa14e-238a-4c2d-b376-442ce508fc84 and replace with a new custom Policy that does not enable the category of "SQLSecurityAuditEvents".

  2. Have a Policy that deploys a Diagnostic Setting for the master DB that just enables the following events: "SQLSecurityAuditEvents" and "DevOpsOperationsAudit". That same Policy will enable the "auditingSettings" and "devOpsAuditingSettings" on the SQL server. This Policy would be a DINE that targets the type of "Microsoft.Sql/servers" and looks for this associated resource type: "Microsoft.Sql/servers/auditingSettings"

  3. A Policy that audits for and deploys the aforementioned Diagnostic Setting in the event that a category is disabled or the Setting is deleted. This Policy would be a DINE that targets the type of "Microsoft.Sql/servers/databases" and looks for this associated resource type: "Microsoft.Insights/diagnosticSettings"

@jtracey93 what are your thoughts?

pkorolo commented 3 years ago

do you know where should we escalate the built-in (b79fa14e-238a-4c2d-b376-442ce508fc8) policy issue to be fixed?

jtracey93 commented 3 years ago

Afternoon both,

@pkorolo the built-in, if at fault (this is not clear as to why for me from the conversation above - if you can break it down more simply for me that would be ace 👍), we should take to the SQL DB PG team to highlight to them; but if we take a fix to them this may be best. Let's chat offline about this one, once we know where we stand.

@mrajess & @pkorolo - does the built-in that sends to Event Hub solve the issue here (ID: 9a7c7a7d-49e5-4213-bea8-6a502b6272e0)? I see the metrics and logs resource deployment are quite different to the Log Analytics version.

If we could refine this issue to make it clearer as to the problems/challenges, I think that is a very good next step, as it's not clear to me. Maybe we need to setup a call?

Thanks

Jack

mrajess commented 3 years ago

@jtracey93 I think it's best we setup a call. In order to understand the issue fully you must understand how SQL Audit logs are generated. This ARM template does an excellent job of demonstrating that process.

Basically, there are two categories of audit logs that can be generated. image

You have the type of "Microsoft.Sql/servers/auditingSettings", which is the basic level. Then you have a separate type of "Microsoft.Sql/servers/devOpsAuditingSettings". This last type is optional, but I would strongly recommend as most organizations want to capture this data for compliance reasons.

Once these objects have been updated to be enabled and to emit data to Azure Monitor you then go and create a diagnostic setting on the Master database. This diagnostic setting should enable logs of this type: "SQLSecurityAuditEvents" "DevOpsOperationsAudit"

You can enable logs of this type on the Master DB without the previous audit settings being enabled, but there won't be any data to capture. In short, you can do the above process in any order, so long as it's all done. There isn't benefit to enabling those log types on other databases or on the server itself. In fact, I'm not even sure if those log types can truly be enabled on things that are not the Master DB, but I haven't tested. @pkorolo might have more info there.

This is just meant to be some initial reading so that you better understand HOW this configuration is enabled. I'll now go and look at calendars and get a call setup between us three. I'm sure timezones will be fun.

mrajess commented 3 years ago

By the way, I have a working set of Policies to handle this. In short, the logic is complete. We now just need to decide on how best to move forward. From there we can port anything I've already written into a fix for the PG or roll our own Policies.

jtracey93 commented 2 years ago

Trigger ADO Sync 1

jtracey93 commented 2 years ago

Trigger ADO Sync 2

pkorolo commented 1 year ago

Closing - fixed via PR mentioned above.