elastic / cloudbeat

Analyzing Cloud Security Posture

[Automation] CSPM sanity test with cloudformation #1149

Open amirbenun opened 1 year ago

amirbenun commented 1 year ago

Motivation

The Create-Environment GitHub workflow creates an EC2 instance with Terraform. Later, after the agent policy is created, it SSHes into the instance and installs the agent with the required fleet-url and enrollment-token parameters. Instead, we can use CloudFormation to deploy the CSPM instance as a single unit. This gives us more coverage of that flow and lets us delete the custom test code that runs the agent on the instance.

Definition of done

Related tasks/epics

amirbenun commented 1 year ago

Background

An elastic-agent deployment is composed of 2 parts:

  1. Creating the compute resource - on AWS, this means creating an EC2 instance.
  2. Downloading and installing the elastic-agent binaries.

Current deployment

The current deployment method is designed to test the manual deployment offered to the user. However, it is cumbersome and doesn't sufficiently mimic a real user's behavior:

  1. The EC2 instance is created by a Terraform script (which is unlikely to be the method users choose).
  2. A custom code block downloads the elastic-agent artifact and installs it (Kibana shows the user a different snippet, which is maintained separately there).

CloudFormation deployment

Adding a CloudFormation deployment will provide full coverage of the automated deployment offered to the user. It will also cover the manual flow to the same extent as today's deployment, since the same two gaps remain:

  1. The EC2 instance is created by a provisioning tool.
  2. The elastic-agent installation command differs from what a user will see on Kibana.

If we had the CloudFormation deployment, I don't see any reason to maintain another deployment method that doesn't sufficiently cover the manual deployment.
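
For illustration, deploying the instance as one unit could come down to a single `aws cloudformation create-stack` call from the workflow. The sketch below only builds that command line. The parameter keys `FleetUrl` and `EnrollmentToken`, the stack name, and the template URL are assumptions for the example and need to be checked against the actual template. `CAPABILITY_NAMED_IAM` is included on the assumption that the stack creates an IAM role for the instance.

```python
def build_create_stack_command(stack_name, template_url, fleet_url, enrollment_token):
    """Return the argv for an `aws cloudformation create-stack` call."""
    # Assumed parameter keys; the real template may name them differently.
    parameters = {
        "FleetUrl": fleet_url,
        "EnrollmentToken": enrollment_token,
    }
    argv = [
        "aws", "cloudformation", "create-stack",
        "--stack-name", stack_name,
        "--template-url", template_url,
        # Assumed to be needed because the stack creates an IAM role.
        "--capabilities", "CAPABILITY_NAMED_IAM",
        "--parameters",
    ]
    # Each parameter is passed as ParameterKey=<key>,ParameterValue=<value>.
    argv += [f"ParameterKey={key},ParameterValue={value}" for key, value in parameters.items()]
    return argv
```

The returned list can be handed to `subprocess.run(argv, check=True)` in the workflow step, which replaces both the Terraform instance creation and the SSH install step.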

Would love to hear your thoughts @oren-zohar @gurevichdmitry @tehilashn

gurevichdmitry commented 1 year ago

@amirbenun, I think it's a great initiative to add CloudFormation deployment for CSPM. Doing so will definitely increase our testing coverage.

However, I have reservations about completely removing the current deployment, which mimics the Linux tab deployment. Currently, we use a customized script to deploy the elastic-agent, but the script contains essentially the same instructions as the original one. The only difference is the EC2 instance type, which in the CloudFormation template is limited to a selection from a combobox. With the current deployment, on the other hand, we are free to choose any Linux EC2 instance type. I couldn't find any documentation that specifically recommends which EC2 instance type to use for this deployment. In theory, it might even be installed on a t2.micro instance.

Instead of entirely discarding the "old case," I suggest the following approach:

  1. Introduce CloudFormation CSPM deployment
  2. Keep the existing deployment using the customized script as an alternative deployment method in the Create-Environment workflow.
  3. Provide Deployment Flexibility:
    • Modify the Create-Environment workflow to allow users to choose between CloudFormation deployment, the current deployment, or both options.
    • Consider setting the default option to CloudFormation but allowing users to opt for the current deployment if necessary.
  4. With the old deployment kept as an option, we can also expand test coverage in the future in the following ways:
    • Perform AWS multi-account verification:
      • Utilize CloudFormation deployment to test the development account.
      • Use the current deployment to test the test account.
    • For AWS single account verification:
      • Deploy both options in the same account and compare the results to ensure consistency.

amirbenun commented 1 year ago

I don't think the instance type makes much difference between the two flows. However, I thought of another reason that is important enough to keep supporting the current deployment method. The two deployment methods use different authentication:

  1. On CloudFormation, we attach a role to the EC2 instance. Next, the AWS SDK is automatically authenticated with the EC2 role.
  2. With manual deployment, we pass the AWS credentials through the agent policy. Next, cloudbeat reads those and configures the AWS SDK to use them by default.
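
To make the difference concrete, here is a minimal model of the two paths. It is not cloudbeat's actual code: `PolicyCredentials` and `credential_source` are hypothetical names, and the sketch only captures the decision of which credentials the SDK ends up using.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PolicyCredentials:
    """Static AWS keys passed through the agent policy (hypothetical type)."""
    access_key_id: str
    secret_access_key: str


def credential_source(policy_creds: Optional[PolicyCredentials]) -> str:
    """Return which credentials the AWS SDK would end up using."""
    # Manual deployment: the policy carries explicit keys, so they are used.
    if policy_creds and policy_creds.access_key_id and policy_creds.secret_access_key:
        return "agent_policy"
    # CloudFormation deployment: no keys in the policy, so the SDK falls back
    # to the role attached to the EC2 instance.
    return "instance_role"
```

A sanity test that only exercises one deployment would leave the other branch untested.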

I suggest supporting both deployment methods. Instead of a new configuration to select a deployment method for CSPM, we can deploy two agents and verify that we get findings from both of them.
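
A rough sketch of that check, assuming the findings are available as a list of dicts carrying the ECS `agent.id` field; the function name and the way findings are fetched are hypothetical:

```python
def agents_missing_findings(findings, expected_agent_ids):
    """Return the expected agent IDs that produced no findings, sorted."""
    # Collect the IDs of all agents that reported at least one finding.
    seen = {
        finding["agent"]["id"]
        for finding in findings
        if finding.get("agent", {}).get("id")
    }
    return sorted(set(expected_agent_ids) - seen)
```

The sanity test would pass only when the returned list is empty for the two agent IDs, one from the CloudFormation deployment and one from the manual deployment.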