ComplianceAsCode / content

Security automation content in SCAP, Bash, Ansible, and other formats
https://complianceascode.readthedocs.io/en/latest/

Several required audit rules will severely damage performance, and possibly overrun the audit buffer, on general purpose systems #6598

Open trevor-vaughan opened 3 years ago

trevor-vaughan commented 3 years ago

Description of problem:

Auditing every call to the following syscalls is dangerous: they are so common that any user or application is likely to trigger them during routine maintenance activities.

Quite often, the audit backlog will reach the point where the system will shut down if configured in accordance with the guidance. This will also happen during system updates using sudo and, of course, will almost certainly be triggered if trying to remediate a system using the inbuilt Ansible, Puppet, or Bash rules.

Recommend making the rules optional and/or marking them as dangerous for general purpose implementation since these are so common as to be generally useless except on restricted, single purpose, systems.

Rules

SCAP Security Guide Version:

All

Operating System Version:

All

Steps to Reproduce:

  1. Create a system with limited resources
  2. Set up auditd in accordance with the policies
  3. Run a full system update using sudo

Actual Results:

The system will probably panic. If it doesn't, it will produce so much noise as to be largely useless unless you sell a log processing infrastructure based on the amount of data ingested.

Expected Results:

The system does not panic and the output is generally useful.

vojtapolasek commented 3 years ago

Hello, I understand the reasoning behind your concerns. Could you specify what kind of "system with limited resources" you mean? Also, what combination of rules (which policy) do you have in mind?

trevor-vaughan commented 3 years ago

@vojtapolasek wow, not sure how I missed this! Try 1 CPU and 512M RAM. It's a perfectly reasonable system for a lot of things but will probably give you issues.

JAORMX commented 3 years ago

@trevor-vaughan hey! Thanks for reporting this. Out of interest, what's the use case? Is this an IoT system? Or what kind of workloads is this supposed to run? What profile did you try?

vojtapolasek commented 3 years ago

Hello, if you have a reason why such rules are not suitable for your system, you can always use tailoring to remove them from the profile. Could this be the solution to your problem? Do you perceive this as a generic problem, or do you have particular profiles in mind?

trevor-vaughan commented 3 years ago

@JAORMX The use case is 'anything not a workstation that happens to be doing real work' and/or stuff with limited resources that you need to update on time-specific update windows (lots of updates at once). Profile => STIG/OSPP.

@vojtapolasek Unfortunately, rule tailoring isn't a thing in most places. It's 'do it' -> 'watch it break' -> 'maybe tailor...maybe'. These checks are seen as the "gold standard" by a lot of people and have to be written as such to protect systems from the 'do it because I say so' gang. I know that there's a ton of language that says "make it your way" but people don't because it's difficult, time-consuming to maintain, process-expensive, and prone to breakage as new versions of the content come out.

vojtapolasek commented 3 years ago

@trevor-vaughan sorry for the long delay. I have a few questions.

  1. Would it be acceptable if we added warnings to specific rules? You mentioned some examples in the description. Can you give examples of particular SCAP rules? There are several rules concerning a syscall in the content. We can't drop rules, as they are part of the policy.
  2. What exactly is the problem you experience with tailoring? I believe we can improve it, but we need more details.
  3. You are talking about the audit backlog. Do you mean the backlog as described here? https://aws.amazon.com/premiumsupport/knowledge-center/troubleshoot-audit-backlog-errors-ec2/ That is, there are too many events, they can't be processed fast enough, and an action is triggered? Is it possible to increase the size of the backlog? Thank you.
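
For reference, the backlog state asked about in question 3 can be read from the output of `auditctl -s`. The following is a minimal sketch, assuming the newer one-pair-per-line output format ("key value"); the sample values, the `parse_audit_status` and `assess` helper names, and the 80% warning threshold are all illustrative assumptions, not something taken from this thread or from the audit tooling.

```python
def parse_audit_status(text):
    """Parse `auditctl -s` style output ("key value" per line) into a dict of ints.

    Lines that are not exactly a key followed by one integer are skipped.
    """
    status = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].lstrip("-").isdigit():
            status[parts[0]] = int(parts[1])
    return status


def assess(status, warn_ratio=0.8):
    """Return human-readable warnings for a parsed status dict.

    warn_ratio is a hypothetical threshold chosen for this sketch.
    """
    warnings = []
    limit = status.get("backlog_limit", 0)
    backlog = status.get("backlog", 0)
    if limit > 0 and backlog / limit >= warn_ratio:
        warnings.append(f"backlog at {backlog}/{limit} entries")
    if status.get("lost", 0) > 0:
        warnings.append(f"{status['lost']} audit records lost")
    if status.get("failure") == 2:
        # failure mode 2 tells the kernel to panic on audit failure
        warnings.append("failure mode is panic")
    return warnings


# Hypothetical sample; real values would come from running `auditctl -s` as root.
SAMPLE = """\
enabled 1
failure 2
pid 812
rate_limit 0
backlog_limit 8192
lost 37
backlog 8010
backlog_wait_time 60000
loginuid_immutable 0 unlocked
"""

if __name__ == "__main__":
    for warning in assess(parse_audit_status(SAMPLE)):
        print(warning)
```

The backlog limit itself is set with the `-b` option of `auditctl`, and the failure mode with `-f`, so a larger buffer is possible at the cost of kernel memory.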

yuumasato commented 3 years ago

Profile => STIG/OSPP.

These checks are seen as the "gold standard" by a lot of people and have to be written as such to protect systems from the 'do it because I say so' gang.

We try to follow the STIG and OSPP guidance. Before the first STIG release for RHEL8, the STIG used to follow OSPP closely. But now, the STIG has deviated considerably from OSPP. And the STIG profile doesn't add audit rules for successful syscalls, which should help with performance.
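
To illustrate the distinction between auditing all calls and auditing only failed ones, here is a minimal sketch that checks whether a syscall rule is restricted to failures. It assumes a rule counts as failure-only when it carries a `-F success=0` filter or a `-F exit=-E...` filter; the two rule lines are hypothetical examples written for this sketch, not copied from the STIG or OSPP content, and `audits_only_failures` is an illustrative helper name.

```python
import shlex


def audits_only_failures(rule):
    """Return True if an audit syscall rule is filtered to failed calls only.

    Heuristic for this sketch: look for `-F success=0` or `-F exit=-E<errno>`.
    """
    tokens = shlex.split(rule)
    # Collect the argument that follows each -F option.
    filters = [tokens[i + 1] for i, tok in enumerate(tokens[:-1]) if tok == "-F"]
    return any(f == "success=0" or f.startswith("exit=-E") for f in filters)


# Hypothetical rules for illustration only.
FAILED_ONLY = (
    "-a always,exit -F arch=b64 -S open,openat "
    "-F exit=-EACCES -F auid>=1000 -F auid!=unset -k access"
)
ALL_CALLS = (
    "-a always,exit -F arch=b64 -S chmod,fchmod "
    "-F auid>=1000 -F auid!=unset -k perm_mod"
)

if __name__ == "__main__":
    for rule in (FAILED_ONLY, ALL_CALLS):
        print(audits_only_failures(rule), rule)
```

A rule of the second kind logs every matching call, successful or not, which is the pattern most likely to produce the event volume described in this issue.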

marcusburghardt commented 1 year ago

Related to https://github.com/ComplianceAsCode/content/issues/9849