buildkite / elastic-ci-stack-for-aws

An auto-scaling cluster of build agents running in your own AWS VPC
https://buildkite.com/docs/quickstart/elastic-ci-stack-aws
MIT License

Add new stack parameter for enabling dualstack docker [PLT-2325] #1306

Closed yob closed 4 weeks ago

yob commented 3 months ago

Adds a new DockerNetworkingProtocol CloudFormation parameter for opting in to starting containers with both ipv4 and ipv6 addresses. This might be useful when running the elastic stack on dual stack and ipv6-only subnets.

Prior to this change, elastic stack instances could run on dual stack and ipv6-only VPC subnets and many things continued to work as normal. However, docker builds and containers did not: containers were started with private ipv4 addresses and no ipv6 addresses, so attempts to connect to global or VPC ipv6 addresses from inside a container failed. When this new parameter is set to dualstack, containers will be assigned an ipv6 address and, depending on the configuration of the VPC they're running in, may be able to connect to the outside world over ipv6.

dualstack subnets

When the new parameter is set to dualstack and the instances are running on a dualstack VPC subnet, we expect:

ipv6-only subnets

When the new parameter is set to dualstack and the instances are running on an ipv6-only VPC subnet, here's what should happen:

why 2001:db8::/32?

https://chameth.com/ipv6-docker-routing/ is a good explanation of the issue.

Given docker is NATing the containers' ipv6 traffic, the natural ipv6 range to use would be fd00::/8, the ULA range (https://en.wikipedia.org/wiki/Unique_local_address). That range will partially work. However, on a dualstack subnet where containers have working ipv4 and ipv6 addresses, using fd00::/8 will make most requests default to ipv4 instead of ipv6. That's particularly undesirable in a CI environment on AWS because ipv4 traffic almost certainly goes via a NAT gateway with per-GB fees, and CI often generates a lot of AWS ingress traffic.

By using 2001:db8::/32 the dualstack containers should default to using ipv6 in most cases where they have the choice of either protocol. This may avoid significant NAT gateway fees.
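To make the reasoning above concrete, here's a rough Python sketch of the part of default address selection (RFC 6724) that I believe causes this. It only models Rule 5 (prefer a source and destination with matching labels) and Rule 6 (prefer higher precedence) using the RFC's default policy table. Real resolvers apply more rules and may ship a different table, so treat this as an illustration rather than what any particular libc does. The container and destination addresses are made up for the example.

```python
import ipaddress

# Default policy table from RFC 6724 section 2.1: (prefix, precedence, label).
POLICY_TABLE = [
    ("::1/128", 50, 0),
    ("::/0", 40, 1),
    ("::ffff:0:0/96", 35, 4),   # ipv4, as ipv4-mapped ipv6
    ("2002::/16", 30, 2),
    ("2001::/32", 5, 5),
    ("fc00::/7", 3, 13),        # ULA, which includes fd00::/8
    ("::/96", 1, 3),
    ("fec0::/10", 1, 11),
    ("3ffe::/16", 1, 12),
]
POLICY = [(ipaddress.ip_network(p), prec, label) for p, prec, label in POLICY_TABLE]


def as_ipv6(addr):
    """Parse an address, representing ipv4 as an ipv4-mapped ipv6 address."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 4:
        return ipaddress.IPv6Address((0xFFFF << 32) | int(ip))
    return ip


def policy(addr):
    """Return (precedence, label) from the longest matching prefix."""
    ip = as_ipv6(addr)
    matches = [(net.prefixlen, prec, label) for net, prec, label in POLICY if ip in net]
    _, prec, label = max(matches)
    return prec, label


def preferred(candidates):
    """Pick the preferred (source, destination) pair using Rules 5 and 6 only."""
    def key(pair):
        src, dst = pair
        _, src_label = policy(src)
        dst_prec, dst_label = policy(dst)
        # Rule 5: matching label wins; Rule 6: then higher destination precedence.
        return (src_label == dst_label, dst_prec)
    return max(candidates, key=key)


# Hypothetical addresses: a container reaching a dual stack service.
V4 = ("172.17.0.2", "192.0.2.10")        # private ipv4 source, ipv4 destination
ULA = ("fd00::2", "2600:1f00::10")       # ULA source, global ipv6 destination
DOC = ("2001:db8::2", "2600:1f00::10")   # 2001:db8::/32 source, same destination

print(preferred([V4, ULA]))  # ('172.17.0.2', '192.0.2.10')
print(preferred([V4, DOC]))  # ('2001:db8::2', '2600:1f00::10')
```

With a ULA source the labels don't match (13 vs 1), so the ipv4 pair wins on Rule 5. With a 2001:db8::/32 source both sides fall under ::/0, the labels match, and ipv6 wins on precedence (40 vs 35).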

2001:db8::/32 is technically reserved for documentation use only. The cost benefits are significant though, and docker only uses these addresses privately behind NAT, so there should be no negative impact.

why opt-in?

It would be nice to just auto-detect when the instances have valid ipv6 addresses assigned and automatically start docker in dualstack mode. However, doing so has significant potential to change the behavior of networking in the container and might have surprising side effects. This PR proposes starting as opt-in so we can build some experience with dualstack docker and AWS. Eventually, it would be great to make it Just Work.

trying this out

I've been testing this on VPCs created by terraform, using the common public vpc module.

Dual stack VPCs are configured like this:

module "experiment_vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.6.0"

  name = "dualstack-experiment"
  cidr = "172.16.0.0/16"

  azs             = ["us-east-1a", "us-east-1c", "us-east-1d"]
  public_subnets  = ["172.16.0.0/24"]
  private_subnets = ["172.16.1.0/24", "172.16.2.0/24", "172.16.3.0/24"]

  enable_nat_gateway = true
  single_nat_gateway = true

  enable_ipv6                                    = true
  public_subnet_ipv6_prefixes                    = [3]
  public_subnet_enable_dns64                     = false
  private_subnet_assign_ipv6_address_on_creation = true
  private_subnet_ipv6_prefixes                   = [0, 1, 2]
  private_subnet_enable_dns64                    = false
}

IPv6-only Subnets are configured like this:

module "experiment_vpc_v6_only" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.6.0"

  name = "v6-only-experiment"
  cidr = "172.16.0.0/16"

  azs             = ["us-east-1a", "us-east-1c", "us-east-1d"]
  public_subnets  = ["172.16.0.0/24"]

  enable_nat_gateway = true
  single_nat_gateway = true

  enable_ipv6                                    = true
  public_subnet_ipv6_prefixes                    = [3]
  public_subnet_enable_dns64                     = false
  private_subnet_assign_ipv6_address_on_creation = true
  private_subnet_ipv6_prefixes                   = [0, 1, 2]
  private_subnet_enable_dns64                    = true
  private_subnet_ipv6_native                     = true
}

In both cases I pass the VPC and subnet IDs into the stacks as parameters, to avoid the CloudFormation stack creating its own VPC:

    VpcId                                 = module.experiment_vpc.vpc_id
    Subnets                               = join(",", module.experiment_vpc.private_subnets)

yob commented 2 months ago

I've rebased this on top of v6.20.0