arkime / aws-aio

Apache License 2.0
Update CLI to enable traffic mirroring on VPC #6

Closed chelma closed 1 year ago

chelma commented 1 year ago

Description

This task is to update the CLI so it can add traffic mirroring from a target VPC in the user's account to the Capture VPC in the same account. It's expected this will be done by spinning up a CloudFormation stack that encapsulates the traffic mirroring details. Removal of the mirroring setup should occur when the CLI is invoked to tear down the full Arkime setup. Removal of the mirroring setup as an individual unit will be manual (non-CLI). At the end of this task, the capture nodes should receive the user traffic and correctly transmit it to the capture bucket and the OpenSearch domain.

Acceptance Criteria

chelma commented 1 year ago

Picking up task

chelma commented 1 year ago

I've been looking into how to structure this mirroring, and it seems like Gateway Load Balancers are ideally suited for this use-case [1][2]. However, there does not appear to be CDK support for its resource type yet [3]; searched through the CDK GitHub code, issues, etc. There is support in raw CloudFormation, however [4].

The mirroring docs indicate the options for mirroring from VPC1 to VPC2 in the same account are: "Intra-Region peering or a transit gateway or Gateway Load Balancer endpoint" [5].

Currently exploring transit gateways to see if they're a plausible option here.

[1] https://aws.amazon.com/blogs/networking-and-content-delivery/introducing-aws-gateway-load-balancer-supported-architecture-patterns/
[2] https://docs.aws.amazon.com/vpc/latest/mirroring/tm-example-glb-endpoints.html
[3] https://github.com/aws/aws-cdk/tree/main/packages/aws-cdk-lib/aws-elasticloadbalancingv2
[4] https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-elasticloadbalancingv2-loadbalancer.html#cfn-elasticloadbalancingv2-loadbalancer-type
[5] https://docs.aws.amazon.com/vpc/latest/mirroring/traffic-mirroring-connection.html

chelma commented 1 year ago

Well, first off there is low-level CDK support for Transit Gateways, which is a plus [1]. From what I can tell, a Transit Gateway makes addresses in connected VPCs routable, rather than presenting a single endpoint for traffic to flow to the way a Gateway Load Balancer does. That probably isn't what we want here.

Gonna get a feel for how hard it would be to use CloudFormation to set up Gateway Load Balancers by making my own CDK Construct, as GLBs seem like the right answer here.

[1] https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_ec2.CfnTransitGateway.html
[2] https://docs.aws.amazon.com/vpc/latest/tgw/what-is-transit-gateway.html

chelma commented 1 year ago

OK - did a deep dive into setting up GLBs with CDK, and it seems doable but will require trial and error.

The GLB getting started guide [1] provides a roadmap for how to set these up. The following components are:

All of the above resources are not unique to any traffic source and should probably be created as part of setup for the Capture VPC.

The following component is unique for each Traffic Source AWS Account. In our case, we're staying within the same account.

The following components are unique for each Traffic Source VPC:

[1] https://docs.aws.amazon.com/elasticloadbalancing/latest/gateway/getting-started-cli.html
[2] https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_elasticloadbalancingv2.CfnLoadBalancer.html
[3] https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_elasticloadbalancingv2.CfnTargetGroup.html
[4] https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_elasticloadbalancingv2.CfnListener.html
[5] https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_ec2.VpcEndpointService.html
[6] https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_ec2.CfnVPCEndpoint.html
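
For reference, here is a rough sketch of what the CloudFormation for the resource types linked above might look like, written as plain Python dicts rather than CDK so it stays self-contained. The function names, logical IDs, and the split into "shared" and "per source VPC" pieces are assumptions made for illustration; they are not taken from the repo. Only the CloudFormation resource types and property names are real.

```python
import json


def shared_capture_resources(capture_vpc_id, capture_subnet_ids):
    """Resources created once alongside the Capture VPC (hypothetical logical IDs)."""
    return {
        "CaptureGwlb": {
            "Type": "AWS::ElasticLoadBalancingV2::LoadBalancer",
            # Type "gateway" is what makes this a Gateway Load Balancer.
            "Properties": {"Type": "gateway", "Subnets": list(capture_subnet_ids)},
        },
        "CaptureTargetGroup": {
            "Type": "AWS::ElasticLoadBalancingV2::TargetGroup",
            # GWLB target groups use the GENEVE protocol on port 6081.
            "Properties": {"Protocol": "GENEVE", "Port": 6081, "VpcId": capture_vpc_id},
        },
        "CaptureListener": {
            "Type": "AWS::ElasticLoadBalancingV2::Listener",
            "Properties": {
                "LoadBalancerArn": {"Ref": "CaptureGwlb"},
                "DefaultActions": [
                    {"Type": "forward", "TargetGroupArn": {"Ref": "CaptureTargetGroup"}}
                ],
            },
        },
        "CaptureEndpointService": {
            "Type": "AWS::EC2::VPCEndpointService",
            "Properties": {
                "GatewayLoadBalancerArns": [{"Ref": "CaptureGwlb"}],
                # Assumption: same-account use, so no manual acceptance step.
                "AcceptanceRequired": False,
            },
        },
    }


def source_vpc_endpoint(source_vpc_id, source_subnet_id, service_name):
    """The endpoint created in each traffic source VPC, pointing at the endpoint service."""
    return {
        "Type": "AWS::EC2::VPCEndpoint",
        "Properties": {
            "VpcEndpointType": "GatewayLoadBalancer",
            "VpcId": source_vpc_id,
            "SubnetIds": [source_subnet_id],
            "ServiceName": service_name,
        },
    }


if __name__ == "__main__":
    # Placeholder IDs, for illustration only.
    template = {"Resources": shared_capture_resources("vpc-capture", ["subnet-a", "subnet-b"])}
    print(json.dumps(template, indent=2))
```

Keeping the per-VPC endpoint in its own function mirrors the idea that the shared pieces belong to Capture VPC setup, while the endpoint is created each time a source VPC is added.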

chelma commented 1 year ago

Looks like there are some examples of doing some/all of this floating around. Here's one of them [1].

[1] https://github.com/aws-samples/aws-secure-environment-accelerator/blob/main/src/lib/cdk-constructs/src/vpc/glb.ts

chelma commented 1 year ago

I've been fighting with Gateway LBs, ECS/Fargate, and CloudFormation all day. I feel like I have a good handle on the CloudFormation required to set up a GWLB. However, I'm having trouble getting my Fargate Tasks to register with the GWLB's target group.

GWLBs seem like a niche feature and don't have too many examples floating around for them. The only ones I can find that combine ECS with GWLB use EC2-backed ECS and associate the Target Group with the EC2 ASG. The ECS docs themselves don't even mention GWLBs [2]. Running through the "Getting Started" AWS CLI steps [3] and specifying the IPs of my Fargate Tasks as the Target Group targets with aws elbv2 register-targets ... works, but is obviously manual.

I'm pretty close to switching over from Fargate to ECS-on-EC2; gonna give it a few more minutes though.

[1] https://github.com/aws-samples/aws-gateway-load-balancer-suricata-ids-ips-nsm/tree/45e590061a47a5bd022c62871d80b62cc23d0d4d
[2] https://docs.aws.amazon.com/AmazonECS/latest/developerguide/load-balancer-types.html
[3] https://docs.aws.amazon.com/elasticloadbalancing/latest/gateway/getting-started-cli.html
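
To make the manual workaround concrete, here is a small sketch that builds the aws elbv2 register-targets invocation for a list of Fargate task IPs. The helper name and the placeholder ARN and IPs are made up for illustration; the only real piece is the CLI command and its --target-group-arn / --targets Id=... shorthand. It only assembles the command and does not call AWS.

```python
import shlex


def register_targets_command(target_group_arn, task_ips):
    """Build the argv for registering task IPs with a target group (hypothetical helper)."""
    if not task_ips:
        raise ValueError("at least one task IP is required")
    argv = [
        "aws", "elbv2", "register-targets",
        "--target-group-arn", target_group_arn,
        "--targets",
    ]
    # Each target is passed in the CLI shorthand form Id=<ip>.
    argv.extend(f"Id={ip}" for ip in task_ips)
    return argv


if __name__ == "__main__":
    # Placeholder values, not real resources.
    cmd = register_targets_command("<target-group-arn>", ["10.0.1.15", "10.0.2.27"])
    print(shlex.join(cmd))
```

This still has to be rerun whenever a task is replaced and gets a new IP, which is the part that makes the manual approach unattractive.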

chelma commented 1 year ago

On another note, I've also been having trouble with my ECS Service resources failing to stabilize during Cfn operations; very annoying. There's a fairly unhelpful support post about it. [1]

[1] https://repost.aws/knowledge-center/cloudformation-ecs-service-stabilize

chelma commented 1 year ago

Last-ditch effort failed; switching to ECS-on-EC2.

chelma commented 1 year ago

Switching over to ECS-on-EC2 was easy, and it appears that the GWLB and our ECS Cluster are integrated now. The Cluster's containers don't respond to LB health checks yet so it's hard to tell... but things look good enough for me to move on for the moment.

chelma commented 1 year ago

OK - ready to start tackling mirroring setup. However, to do that I need to have a plan for how all of our top-level calls will work together to handle state management. Here's what I'm currently thinking.

chelma commented 1 year ago

SSM doc for dealing with parameter hierarchies: https://docs.aws.amazon.com/systems-manager/latest/userguide/sysman-paramstore-hierarchies.html
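
As a rough illustration of how a parameter hierarchy could track this state, the sketch below builds hierarchical parameter names for clusters and their mirrored VPCs. The /arkime root and the clusters/.../vpcs/... layout are assumptions chosen for the example, not the layout used in the repo. The code only manipulates names and never talks to SSM.

```python
import re

ROOT = "/arkime"  # hypothetical root of the hierarchy
# Parameter names may contain letters, digits, and the symbols . - _
_SEGMENT = re.compile(r"[A-Za-z0-9_.-]+")


def param_path(*segments):
    """Join validated segments under ROOT into a hierarchical parameter name."""
    for segment in segments:
        if not _SEGMENT.fullmatch(segment):
            raise ValueError(f"invalid path segment: {segment!r}")
    return "/".join([ROOT, *segments])


def cluster_path(cluster_name):
    return param_path("clusters", cluster_name)


def vpc_path(cluster_name, vpc_id):
    return param_path("clusters", cluster_name, "vpcs", vpc_id)


def direct_children(parent, names):
    """Filter a list of parameter names down to those one level below parent."""
    prefix = parent.rstrip("/") + "/"
    return sorted(
        name for name in names
        if name.startswith(prefix) and "/" not in name[len(prefix):]
    )


if __name__ == "__main__":
    names = [
        cluster_path("demo"),
        vpc_path("demo", "vpc-0abc"),
        cluster_path("other"),
    ]
    # Listing clusters is then a one-level lookup under /arkime/clusters.
    print(direct_children(param_path("clusters"), names))
```

With a layout like this, a list-clusters style command would read one level under the clusters path, and tearing down a cluster could walk everything beneath that cluster's path.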

chelma commented 1 year ago

PR posted for basic list-clusters, add-vpc, and remove-vpc capability. (see: https://github.com/arkime/cloud-demo/pull/17).
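
For anyone following along, the per-VPC mirroring resources can be described in CloudFormation roughly as sketched below, again as plain Python dicts. This is not lifted from the PR: the function name, logical IDs, rule numbers, and the accept-everything filter are assumptions for illustration. The resource types and property names are the real CloudFormation ones.

```python
def mirror_resources(source_eni_id, gwlb_endpoint_id, session_number=1):
    """Sketch of the resources needed to mirror one network interface to a GWLB endpoint."""

    def accept_all(direction, rule_number):
        # Assumption: mirror all traffic in the given direction.
        return {
            "Type": "AWS::EC2::TrafficMirrorFilterRule",
            "Properties": {
                "TrafficMirrorFilterId": {"Ref": "MirrorFilter"},
                "TrafficDirection": direction,
                "RuleNumber": rule_number,
                "RuleAction": "accept",
                "SourceCidrBlock": "0.0.0.0/0",
                "DestinationCidrBlock": "0.0.0.0/0",
            },
        }

    return {
        "MirrorTarget": {
            "Type": "AWS::EC2::TrafficMirrorTarget",
            # The target is the VPC endpoint that fronts the Gateway Load Balancer.
            "Properties": {"GatewayLoadBalancerEndpointId": gwlb_endpoint_id},
        },
        "MirrorFilter": {
            "Type": "AWS::EC2::TrafficMirrorFilter",
            "Properties": {"Description": "Mirror all traffic (illustrative)"},
        },
        "MirrorIngressRule": accept_all("ingress", 100),
        "MirrorEgressRule": accept_all("egress", 100),
        "MirrorSession": {
            "Type": "AWS::EC2::TrafficMirrorSession",
            # One session is needed per mirrored network interface.
            "Properties": {
                "NetworkInterfaceId": source_eni_id,
                "TrafficMirrorTargetId": {"Ref": "MirrorTarget"},
                "TrafficMirrorFilterId": {"Ref": "MirrorFilter"},
                "SessionNumber": session_number,
            },
        },
    }


if __name__ == "__main__":
    # Placeholder IDs, for illustration only.
    for name, resource in mirror_resources("eni-source", "vpce-gwlb").items():
        print(name, resource["Type"])
```

The target and filter can be shared across a VPC, while a session has to exist for every network interface whose traffic should be mirrored.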

I confirmed that traffic is being mirrored from the traffic source (our demo Fargate containers curling Alexa top-100 sites) to the VPC Endpoint of our Gateway Load Balancer. I wasn't able to confirm that the traffic makes it to our capture nodes, because I can't add them to the GWLB Target Group unless they respond to health checks, which they don't do yet. Therefore, the next steps are:

chelma commented 1 year ago

PR posted to resolve the task: https://github.com/arkime/cloud-demo/pull/19

There will be some followup/cleanup work.

chelma commented 1 year ago

PR merged; resolving. Follow-up work discussed in parent task (https://github.com/arkime/cloud-demo/issues/3)