GSA / data.gov

Main repository for the data.gov service
https://data.gov

Run Linux-level CIS benchmark on EKS Management Node Group #3776

Closed nickumia-reisys closed 2 years ago

nickumia-reisys commented 2 years ago

User Story

In order to understand the risk of running non-CIS-compliant EC2 AMIs in our EKS managed node groups, the SSB team wants to run the CIS Distribution-Independent Linux Benchmark on instances running both the stock and GSA-hardened image.

Acceptance Criteria

[ACs should be clearly demoable/verifiable whenever possible. Try specifying them using BDD.]

Background

[Any helpful contextual notes or links to artifacts/evidence, if needed]

Security Considerations (required)

No impact. We are doing this work to gather information for decision-making, not actually changing our system.

Sketch

Find an instance of the type you want to test in AWS SSM Fleet Manager, and start a terminal session there. Then:

sudo su -
docker pull chef/inspec 
function inspec { docker run -it --rm -v $(pwd):/share chef/inspec "$@"; }
curl -L https://github.com/dev-sec/cis-dil-benchmark/archive/refs/heads/master.zip > cis.zip
unzip cis.zip
inspec exec cis-dil-benchmark-master
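
If you want a machine-readable copy of the results as well as the terminal output, InSpec supports multiple reporters in one run (a sketch; the output path is arbitrary, and /share is the mount point from the inspec function above):

```
# Emit the normal CLI summary plus a JSON report written into the
# mounted working directory (visible on the host as ./cis-report.json).
inspec exec cis-dil-benchmark-master --reporter cli json:/share/cis-report.json
```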
nickumia-reisys commented 2 years ago

Created this issue to track the upstream ticket.

mogul commented 2 years ago

We were able to run the CIS DIL benchmarks by hand. (See the "sketch" section of the post above for the how-to.)

The results for both the stock AWS Linux 2 optimized for EKS image and the GSA ISE-hardened image are in this Google Drive folder.

We compared the results and communicated them with our ISSM and the folks providing the hardened AMI in the email thread Container Compliance Scans today.

mogul commented 2 years ago

I'm documenting the issue we ran into when trying to automate the CIS audit through the AWS SSM console here, both as a reference point for people searching, and so that I can point AWS support at this info.

Following this article, I was able to successfully run a Linux Inspec profile (linux-baseline) using RunCommand. This is the corresponding sourceInfo:

"{ 
    "owner":"dev-sec", 
    "repository":"linux-baseline", 
    "path": "", 
    "getOptions" : 
    "branch:master", 
    "tokenInfo":"{{ssm-secure:github-personal-token}}" 
}"

The resulting reports were integrated into the Compliance section of System Manager and were easy to read. Very nice.
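
For reference, the same run can be kicked off from the AWS CLI instead of the console (a sketch; the instance ID is a placeholder, and the parameter quoting may need adjusting for your shell):

```
# Run the dev-sec/linux-baseline profile via the AWS-RunInspecChecks
# SSM document against a hypothetical instance (i-0123456789abcdef0).
aws ssm send-command \
  --document-name "AWS-RunInspecChecks" \
  --instance-ids "i-0123456789abcdef0" \
  --parameters '{"sourceType":["GitHub"],"sourceInfo":["{\"owner\":\"dev-sec\",\"repository\":\"linux-baseline\",\"getOptions\":\"branch:master\"}"]}'
```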

Then I tried to run the CIS Distribution Independent Benchmark (cis-dil-benchmark) with this sourceInfo:

"{ 
    "owner":"dev-sec", 
    "repository":"cis-dil-benchmark", 
    "path": "", 
    "getOptions" : "branch:master", 
    "tokenInfo":"{{ssm-secure:github-personal-token}}" 
}"

I got errors. The stderr under RunCommand said:

/opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/aws-sdk-core-3.130.0/lib/seahorse/client/plugins/raise_response_errors.rb:17:in `call': Compliance item can have up to 800 KB in total. (Aws::SSM::Errors::ItemSizeLimitExceededException)
from /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/aws-sdk-core-3.130.0/lib/aws-sdk-core/plugins/checksum_algorithm.rb:111:in `call'
from /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/aws-sdk-core-3.130.0/lib/aws-sdk-core/plugins/jsonvalue_converter.rb:22:in `call'
from /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/aws-sdk-core-3.130.0/lib/aws-sdk-core/plugins/idempotency_token.rb:19:in `call'
from /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/aws-sdk-core-3.130.0/lib/aws-sdk-core/plugins/param_converter.rb:26:in `call'
from /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/aws-sdk-core-3.130.0/lib/seahorse/client/plugins/request_callback.rb:71:in `call'
from /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/aws-sdk-core-3.130.0/lib/aws-sdk-core/plugins/response_paging.rb:12:in `call'
from /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/aws-sdk-core-3.130.0/lib/seahorse/client/plugins/response_target.rb:24:in `call'
from /opt/chef-workstation/embedded/lib/ruby/gems/3.0.0/gems/aws-sdk-core-3.130.0/lib/seahorse/client/request.rb:72:in `send_request'
from /root/.chefdk/gem/ruby/3.0.0/gems/aws-sdk-ssm-1.134.0/lib/aws-sdk-ssm/client.rb:8065:in `put_compliance_items'
from ./Report-Compliance-20200225:122:in `<main>'
failed to run commands: exit status 1

Note in particular the error: Compliance item can have up to 800 KB in total. (Aws::SSM::Errors::ItemSizeLimitExceededException)

Although that error is documented in the AWS SDK docs, I couldn't figure out why we would see it when running this InSpec profile but not the previous one. We found only this one Gitter reference describing what we were seeing, and no indication that anyone got an answer there.

Searching further, we found it's failing a validation set by AWS itself. There appears to be a hard limit on a benchmark's technical specifications, imposed as a service quota on AWS:ComplianceItem objects:

AWS:ComplianceItem limits

We figured the 800 KB must refer to the size of the report generated by the RunCommand. For now we just needed to see the results, so rather than automating the AWS-RunInspecChecks command and having it report to Compliance, we ran the check by hand using the method described in the sketch above. That let us copy the test report out of the terminal window and do what we needed to do on the team.

Here's the thing, though: the InSpec report output was only 123 KB! So our hypothesis that the error was due to the RunCommand output being bigger than 800 KB went out the window.

For compliance purposes, we do eventually want to go back and ensure the CIS benchmark runs periodically, with the results auditable using nice tools like Compliance. However, we're stumped as to why we ran into the ItemSizeLimitExceededException and have no idea what we would have to do differently. There also doesn't appear to be any way to request an increase to the AWS:ComplianceItem service quota, so we're not sure whether it's actually a hard limit.

mogul commented 2 years ago

I reproduced this with our AWS rep in a screenshare session. We found that the AWS-RunInSpecChecks document includes the flag --reporter json when it runs inspec exec, and that the output is indeed >800K (the output was 817K for one stock EKS AMI, and 799K for another, so you might not hit this limit depending on the instance).
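
The size check can be reproduced by hand on an instance (a sketch, assuming the docker-based inspec function from the earlier sketch or a host install of InSpec):

```
# Generate the same JSON report the SSM document pipes to the
# compliance uploader, then measure it against the 800 KB limit.
inspec exec cis-dil-benchmark-master --reporter json > /tmp/cis-report.json
wc -c /tmp/cis-report.json
```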

Reported upstream to AWS in the ssb-production account and asked for the service quota to be increased if possible, since the forms don't allow for it... Case ID 9924596491.

mogul commented 2 years ago

Note that if you want to see what a RunCommand document invocation actually does, there are artifacts left behind in the instance. This command will help you find them.

# find /var/lib/amazon/ssm/i-*/document/orchestration/*

In this case the script was called AWS-RunInspecChecks-20201211.sh and the relevant lines were:

# Accept Chef license
export CHEF_LICENSE=accept-no-persist

# unset pipefail as InSpec exits with error code if any tests fail
set +eo pipefail
inspec exec . --reporter json | ruby ./Report-Compliance-20200225
if [ $? -ne 0 ]; then
  echo "Failed to execute InSpec tests: see stderr"
  EXITCODE=2
fi
nickumia-reisys commented 2 years ago

Following back here: running InSpec in a Docker container does not fulfill the purpose of the Linux-level benchmark, because the benchmark tests mount points, open ports, and system security settings that differ inside the Docker container vs. on the host. As an example, I ran the same benchmark on the host and in the Docker container, and this was the difference in results:

# On Host
Profile Summary: 141 successful controls, 56 control failures, 38 controls skipped
Test Summary: 831 successful, 1934 failures, 38 skipped

# In Docker
Profile Summary: 75 successful controls, 96 control failures, 56 controls skipped
Test Summary: 495 successful, 267 failures, 59 skipped
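
The per-status counts above can be tallied straight from the JSON reporter output, since each test result carries a "status" field. A rough sketch using only grep (the inline report here is a tiny stand-in for a real InSpec JSON report):

```shell
# Write a minimal stand-in for an InSpec JSON report, then count
# results per status by grepping the "status" fields.
cat > /tmp/report.json <<'EOF'
{"profiles":[{"controls":[{"results":[{"status":"passed"},{"status":"passed"},{"status":"failed"},{"status":"skipped"}]}]}]}
EOF
for s in passed failed skipped; do
  printf '%s: %d\n' "$s" "$(grep -o "\"status\":\"$s\"" /tmp/report.json | wc -l)"
done
```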
mogul commented 2 years ago

That's about what I would expect... The DIL benchmark only makes sense on container hosts!

nickumia-reisys commented 2 years ago

The only reason I mentioned it was because, in the sketch, we had put:

sudo su -
docker pull chef/inspec
function inspec { docker run -it --rm -v $(pwd):/share chef/inspec "$@"; }

That doesn't work; InSpec would need to be installed on the host with something like this.

So, to summarize, our original findings were invalid.
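
For what it's worth, one common way to get InSpec onto the host itself is Chef's omnibus installer (an assumption on my part, not necessarily the method behind the link above):

```
# Install InSpec directly on the host via the Chef omnibus script,
# so checks see the real mount points, ports, and system settings.
curl -L https://omnitruck.chef.io/install.sh | sudo bash -s -- -P inspec
inspec exec cis-dil-benchmark-master
```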

mogul commented 2 years ago

Oh good point!