Closed ari-aviran closed 2 years ago
Thanks @ari-aviran!
Some additional notes:
As stated, we want to make sure we use existing tools that can save us time and effort when collecting cloud resources and configuration from different cloud providers. Since this domain is booming, it's possible that a 3rd-party tool (which could require some effort to set up) would save us a lot of maintenance overall and help us support new cloud providers and benchmarks faster in the future.
The issue's emphasis is on resource collection, but the research should also find out whether we can leverage the policy evaluation parts of the third-party tools we're assessing. For example, Trivy supports the AWS CIS benchmark. Could we convert Trivy's findings to our findings format and schema, and save ourselves the policy authoring work as well?
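To make the conversion question concrete, here is a minimal Go sketch of what such a mapping could look like. Both `ThirdPartyResult` and `Finding` are hypothetical stand-ins invented for illustration: the first is not Trivy's real output type, and the second is not our actual findings schema. The real field names would have to come out of the research.

```go
package main

import (
	"fmt"
	"strings"
)

// ThirdPartyResult is a hypothetical, simplified scanner result.
// It is NOT Trivy's actual output type.
type ThirdPartyResult struct {
	CheckID  string // identifier of the check that was run
	Resource string // identifier of the evaluated resource
	Status   string // "PASS" or "FAIL" in this sketch
	Message  string // human-readable explanation
}

// Finding is a hypothetical stand-in for our own findings schema.
type Finding struct {
	RuleID     string
	ResourceID string
	Evaluation string // "passed" or "failed"
	Evidence   string
}

// ConvertResult maps one third-party result onto the Finding shape.
// Unknown statuses are rejected instead of being silently mapped.
func ConvertResult(r ThirdPartyResult) (Finding, error) {
	var eval string
	switch strings.ToUpper(r.Status) {
	case "PASS":
		eval = "passed"
	case "FAIL":
		eval = "failed"
	default:
		return Finding{}, fmt.Errorf("unsupported status %q for check %q", r.Status, r.CheckID)
	}
	return Finding{
		RuleID:     r.CheckID,
		ResourceID: r.Resource,
		Evaluation: eval,
		Evidence:   r.Message,
	}, nil
}

func main() {
	f, err := ConvertResult(ThirdPartyResult{
		CheckID:  "example-check",
		Resource: "example-bucket",
		Status:   "FAIL",
		Message:  "example message",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", f)
}
```

Whether a mapping this thin is enough depends on how closely the tool's check identifiers line up with the benchmark rule numbers we report on, which is one of the things the comparison should answer.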
Some names to add to the list apart from Trivy are Cloudquery and Cloudsploit (or is that just what Trivy is running under the hood?).
To properly compare the tools and understand what we can benefit from them I suggest defining a table with some key parameters:
Also worth adding the default option to the comparison: how hard would it be to implement the fetchers and the policy ourselves?
Updated product epic: https://github.com/elastic/security-team/issues/5083
@ari-aviran @eyalkraft I see that the o11y team is doing a similar POC/feasibility analysis here: https://github.com/elastic/obs-infraobs-team/issues/828. It's definitely worth looking into and sharing learnings!
Additional thing to take into consideration with 3rd party tools:
@eyalkraft Another 3rd-party tool worth taking into account is cloudquery by Uptycs. Unlike its namesake, it's an osquery-based implementation. They support multi-cloud and other data sources like the Kube API. They are an OSS company that offers a control plane SaaS built on top of osquery. You can get the details here: https://github.com/elastic/security-team/issues/1886.
I think it might be worth evaluating it as a data ingestion mechanism since Elastic has already invested in osquery itself. It won't be able to handle evaluation OOTB, but it should be scalable.
Interesting, @DaveSys911! Maybe we'll invest time in checking that out if @ari-aviran's research doesn't bear fruit. I do believe he has some interesting results he'll share soon.
Understood. Good idea to add a "licensing check" as part of the evaluation criteria. Some OSS tools might have more restrictive licensing that prevents us from simply including them in our solution as is (like the Elastic License).
The scope of the research/POC here is utilizing tools or libraries that can collect AWS resources for us, to speed up development by not writing the collection logic ourselves. In addition, I've also looked at evaluation capabilities that some tools provide, in an effort to save more time on our side and speed up the development even further.
I compared 3 OSS tools that can help us:
Another tool, cloudsploit, is written in JS and would be hard to use with cloudbeat, so I did not look into it at all.
Here is a table that summarizes the research:
Criteria \ Tool | Trivy (defsec library) | cloudquery (platform) | cloudquery (osquery extension) |
---|---|---|---|
Fetching capabilities | Many AWS services - everything we need. | Many AWS services - everything we need. | Many AWS services - everything we need. |
Evaluation capabilities | Partial implementation of CIS for AWS v1.2. Some rules are written in Rego and some in Go. We would probably need to write our own rules but we can take inspiration from their policy. | No evaluation capabilities. | No evaluation capabilities. |
Ease of integration | Relatively easy. They expose a Scanner object as part of their Go API which we can initialize and commence fetching on demand. We can control which services we collect data from. | Not trivial. Cloudquery is a platform with plugins for “source” (where data comes from) and “destination” (where data is saved). There is an AWS source plugin but integrating with it from cloudbeat requires using high-level cloudquery API and semantics. | Not trivial. This is built as an osquery extension and there is no easy way to initialize data collection directly, it is all managed via osquery. |
Integration concerns | | | |
License | MIT. We can safely use this and modify it if necessary. | MPL-2. We can probably use it with minor limitations (if we change cloudquery source code, we need to distribute the changes under the same MPL license). I am not a lawyer though. | Apache-2 / GPL (dual licensed). It can be used under Apache-2, so we can safely use it and modify it if necessary. |
Popularity/maintenance | Relatively popular, regularly maintained | Somewhat popular, regularly maintained | Not popular, unmaintained |
It seems that using Trivy (defsec) is the best option, so I went ahead and implemented a proof of concept that collects S3 buckets and evaluates a single rule from the CIS for AWS benchmark. The POC worked well. Relevant links:
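For readers who want a feel for the shape of such a POC, the following Go sketch collects buckets through a small interface and evaluates one rule over them. It is an assumption-laden illustration, not the POC code: `Bucket`, `BucketFetcher`, `staticFetcher`, and the encryption rule are all hypothetical, and the rule shown is not necessarily the one evaluated in the POC. No defsec or AWS call is made; the fetcher is a static stand-in so the sketch runs without credentials.

```go
package main

import "fmt"

// Bucket is a hypothetical, trimmed-down view of an S3 bucket.
type Bucket struct {
	Name              string
	EncryptionEnabled bool
}

// BucketFetcher hides whichever library performs the collection,
// so the rule logic does not depend on it. Hypothetical interface.
type BucketFetcher interface {
	FetchBuckets() ([]Bucket, error)
}

// staticFetcher is a stand-in that returns canned data.
type staticFetcher struct {
	buckets []Bucket
}

func (s staticFetcher) FetchBuckets() ([]Bucket, error) {
	return s.buckets, nil
}

// Result records the outcome of one rule for one bucket.
type Result struct {
	Bucket string
	Passed bool
}

// EvaluateEncryption applies one illustrative rule: a bucket passes
// when encryption is enabled. The rule is an example only.
func EvaluateEncryption(f BucketFetcher) ([]Result, error) {
	buckets, err := f.FetchBuckets()
	if err != nil {
		return nil, fmt.Errorf("fetching buckets: %w", err)
	}
	results := make([]Result, 0, len(buckets))
	for _, b := range buckets {
		results = append(results, Result{Bucket: b.Name, Passed: b.EncryptionEnabled})
	}
	return results, nil
}

func main() {
	fetcher := staticFetcher{buckets: []Bucket{
		{Name: "logs", EncryptionEnabled: true},
		{Name: "scratch", EncryptionEnabled: false},
	}}
	results, err := EvaluateEncryption(fetcher)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, r := range results {
		fmt.Printf("%s passed=%t\n", r.Bucket, r.Passed)
	}
}
```

Keeping collection behind an interface like this means the evaluation code would not need to change if the collection library were swapped later.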
During implementation of the CIS AWS epic we have reverted our decision to use defsec and instead decided to implement resource collection on top of the AWS SDK. We have already found some disadvantages of using defsec, and when actually implementing on top of it we have found a couple more, which tipped the decision to not use it. In conclusion, here's what we didn't like with defsec:
defsec isn't really designed as an SDK).
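As a rough illustration of what owning the collection logic involves, here is a small Go sketch of a paginated collection loop. The `pageLister` interface and its `ListPage` method are hypothetical names made up for this example; a real implementation would wrap an AWS SDK client behind it, and no AWS SDK call is shown or implied here.

```go
package main

import "fmt"

// pageLister is a hypothetical narrow interface over a paginated
// listing API. An empty next token means there are no more pages.
type pageLister interface {
	ListPage(token string) (names []string, next string, err error)
}

// CollectAll drains every page from the lister, starting with an
// empty token, and returns the names in the order received.
func CollectAll(l pageLister) ([]string, error) {
	var all []string
	token := ""
	for {
		names, next, err := l.ListPage(token)
		if err != nil {
			return nil, fmt.Errorf("listing page after token %q: %w", token, err)
		}
		all = append(all, names...)
		if next == "" {
			return all, nil
		}
		token = next
	}
}

// fakeLister serves canned pages keyed by the incoming token.
type fakeLister struct {
	pages map[string][]string
	next  map[string]string
}

func (f fakeLister) ListPage(token string) ([]string, string, error) {
	return f.pages[token], f.next[token], nil
}

func main() {
	l := fakeLister{
		pages: map[string][]string{"": {"bucket-a", "bucket-b"}, "t1": {"bucket-c"}},
		next:  map[string]string{"": "t1"},
	}
	names, err := CollectAll(l)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(names)
}
```

The trade-off is that pagination, error handling, and per-service quirks become our responsibility, in exchange for full control over what is collected and how.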
To support CIS for AWS benchmark we need to collect data (resources) from different AWS services. We need to first do some research/POC on the best approach to collect said data. The kinds of data we'd like to collect (initial estimation by the product team, might change a bit):
There are some 3rd-party tools that can assist in the data collection (e.g. trivy) that should be evaluated as part of this research.
Out of scope