elastic / cloudbeat

Analyzing Cloud Security Posture

CIS AWS - Resource collection research/POC #409

Closed ari-aviran closed 2 years ago

ari-aviran commented 2 years ago

To support CIS for AWS benchmark we need to collect data (resources) from different AWS services. We need to first do some research/POC on the best approach to collect said data. The kinds of data we'd like to collect (initial estimation by the product team, might change a bit):

There are some 3rd-party tools that can assist in the data collection (e.g. trivy) that should be evaluated as part of this research.

Out of scope

eyalkraft commented 2 years ago

Thanks @ari-aviran!

Some additional notes:

Motivation

As stated, we want to make sure we utilize existing tools that can save us precious time and effort in collecting cloud resources and configuration from different cloud providers. Since this domain is booming, it's possible that a 3rd party tool (which could require some effort to set up) would save us a lot of maintenance overall and help us support new cloud providers and benchmarks faster in the future.

Scope

The issue's emphasis is on resource collection, but another thing we should definitely include as part of the research is finding out whether we can also leverage the policy evaluation parts of the third party tools we're assessing. For example, Trivy supports the AWS CIS benchmark. Could we convert Trivy's findings to our findings format and schema, and save ourselves the policy authoring work as well?

3rd party tools to evaluate

Some names to add to the list apart from Trivy are Cloudquery and Cloudsploit (or is that just what Trivy is running under the hood?).

DOD

Tool comparison and assessment

To properly compare the tools and understand what we can benefit from them I suggest defining a table with some key parameters:

  1. Ease/Cost of integration (is it just calling a Go function, or running servers/DBs/integrating other components/etc.?). This should be very detailed (what function should we call and how).
  2. What's the benefit of using this tool?
    • Does it save us from writing the fetchers? Which resources can we fetch using this tool?
    • Does the tool handle quotas, pagination, errors, rate limits, backoffs? If it does, how? Can we leverage it? Can we control it?
    • What are possible issues/upsides regarding the performance of this tool?
    • Does it save us from authoring policy? Does it produce resources or findings? Can we use the findings? Can we control the rules running?

Also worth adding the default option to the comparison - How hard will it be to implement the fetchers and the policy ourselves?

Discuss and decide

Follow up

Open questions

  1. Are there AWS CIS Benchmark rules that talk about more than a single resource (and thus will require feeding OPA with more than a single resource at a time)?

tinnytintin10 commented 2 years ago

Updated product epic: https://github.com/elastic/security-team/issues/5083

tinnytintin10 commented 2 years ago

@ari-aviran @eyalkraft I see that the o11y team is doing a similar POC/feasibility analysis here: https://github.com/elastic/obs-infraobs-team/issues/828. For sure worth looking into and sharing learnings!

eyalkraft commented 2 years ago

Additional thing to take into consideration with 3rd party tools:

DaveSys911 commented 2 years ago

@eyalkraft Another 3rd party tool worth taking into account is cloud query by Uptycs. Unlike its namesake, it's an osquery-based implementation. They support multi-cloud & other data sources like Kube-API. They are an OSS company that offers a control plane SaaS built on top of osquery - you can get the details here https://github.com/elastic/security-team/issues/1886.

I think it might be worth evaluating it as a data ingestion mechanism since Elastic has already invested in osquery itself. It won't be able to handle evaluation OOTB, but it should be scalable.

eyalkraft commented 2 years ago

Interesting @DaveSys911! Maybe we'll invest time in checking that out if @ari-aviran's research doesn't bear fruit. I do believe he has some interesting results he'll share soon.

DaveSys911 commented 2 years ago

Understood. Good idea to add a "licensing check" as part of the evaluation criteria. Some OSS tools might have more restrictive licensing that prevents just including them in our solution as is (like the Elastic License).

ari-aviran commented 2 years ago

The scope of the research/POC here is utilizing tools or libraries that can collect AWS resources for us, to speed up development by not writing the collection logic ourselves. In addition, I've also looked at evaluation capabilities that some tools provide, in an effort to save more time on our side and speed up the development even further.

I compared 3 OSS tools that can help us:

Another tool, cloudsploit, is written in JS and would be hard to use with cloudbeat, so I did not look into it at all.

Here is a table that summarizes the research:

| Criteria \ Tool | Trivy (defsec library) | cloudquery (platform) | cloudquery (osquery extension) |
|---|---|---|---|
| Fetching capabilities | Many AWS services - everything we need. | Many AWS services - everything we need. | Many AWS services - everything we need. |
| Evaluation capabilities | Partial implementation of CIS for AWS v1.2. Some rules are written in Rego and some in Go. We would probably need to write our own rules but we can take inspiration from their policy. | No evaluation capabilities. | No evaluation capabilities. |
| Ease of integration | Relatively easy. They expose a Scanner object as part of their Go API which we can initialize and commence fetching on demand. We can control which services we collect data from. | Not trivial. Cloudquery is a platform with plugins for “source” (where data comes from) and “destination” (where data is saved). There is an AWS source plugin but integrating with it from cloudbeat requires using high-level cloudquery API and semantics. | Not trivial. This is built as an osquery extension and there is no easy way to initialize data collection directly, it is all managed via osquery. |
| Integration concerns | • No resource streaming - memory usage grows with the amount of cloud resources. • Auth with AWS handled internally in the library, we have no control over it. • Works per-region by default. | • No resource streaming - memory usage grows with the amount of cloud resources. • Auth with AWS handled internally in the library, we have no control over it. • All-or-nothing services collection. | • No resource streaming - memory usage grows with the amount of cloud resources. • Auth with AWS handled internally in the library, we have no control over it. • All-or-nothing services collection. |
| License | MIT. We can safely use this and modify if necessary. | MPL-2. We can probably use it with minor limitations (if we change cloudquery source code we need to distribute the changes under the same MPL license). I am not a lawyer though. | Apache-2 / GPL (dual licensed). Can be used as Apache-2 so we can safely use it and modify if necessary. |
| Popularity/maintenance | Relatively popular, regularly maintained | Somewhat popular, regularly maintained | Not popular, unmaintained |

It seems that using Trivy (defsec) is the best option, so I went ahead and implemented a proof of concept that collects S3 buckets and evaluates a single rule from the CIS for AWS benchmark. The POC worked well. Relevant links:

ari-aviran commented 1 year ago

During implementation of the CIS AWS epic we reversed our decision to use defsec and instead decided to implement resource collection on top of the AWS SDK. We had already found some disadvantages of using defsec during the research, and when actually implementing on top of it we found a couple more, which tipped the decision against using it. In conclusion, here's what we didn't like with defsec: