RaJiska / terraform-aws-fck-nat

Terraform module for fck-nat
MIT License

Route is destroyed and recreated when upgrading from 1.2 to 1.3 #40

Open jordanbd opened 5 days ago

jordanbd commented 5 days ago

I'd like some assistance upgrading from 1.2 to 1.3 when using the fck-nat TF module in combination with the TF vpc module.

When I upgrade from 1.2 to 1.3 I get the following error because TF is attempting to recreate my routes: Error: api error RouteAlreadyExists: Route in Route Table (rtb-xxx) with destination ( already exists. The TF logs show that the route is being destroyed and recreated:

  # module.fck-nat[0].aws_route.main[0] will be destroyed
  # (because resource does not use count)
  - resource "aws_route" "main" {
      - destination_cidr_block      = "" -> null
      - id                          = "r-rtb-0ed892a98ed869c161080289494" -> null
      - instance_id                 = "i-0233e87ff9c6e50c3" -> null
      - instance_owner_id           = "443495191585" -> null
      - network_interface_id        = "eni-0066c806c4ac12383" -> null
      - origin                      = "CreateRoute" -> null
      - route_table_id              = "rtb-0ed892a98ed869c16" -> null
      - state                       = "active" -> null
        # (11 unchanged attributes hidden)
  # module.fck-nat[0].aws_route.main["RESERVED_FKC_NAT"] will be created
  + resource "aws_route" "main" {
      + destination_cidr_block = ""
      + id                     = (known after apply)
      + instance_id            = (known after apply)
      + instance_owner_id      = (known after apply)
      + network_interface_id   = "eni-0066c806c4ac12383"
      + origin                 = (known after apply)
      + route_table_id         = "rtb-0ed892a98ed869c16"
      + state                  = (known after apply)

Ideally I would like to upgrade from 1.2 to 1.3 without TF attempting to destroy my routes and recreate them. To achieve this I believe I need to migrate from the deprecated update_route_table and route_table_id properties which were changed in this commit.

I am having trouble understanding what I need to change, hence this ticket.

I have currently defined fck-nat as follows:

module "fck-nat" {
  count = length(module.vpc.public_subnets)
  source = "RaJiska/fck-nat/aws"

  name                 = "${local.prefix}-fck-nat-${count.index}"
  vpc_id               = module.vpc.vpc_id
  subnet_id            = module.vpc.public_subnets[count.index]

  instance_type        = var.fck_nat_instance_type
  ha_mode              = true
  use_cloudwatch_agent = true

  update_route_table   = true
  route_table_id       = module.vpc.private_route_table_ids[count.index]
}

My VPC looks something like this:

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name                    = "${local.prefix}-vpc"
  cidr                    = ""
  azs                     = ["ap-southeast-2a", "ap-southeast-2b"]
  private_subnets         = ["", ""]
  public_subnets          = ["", ""]
  enable_dns_hostnames    = true
  enable_dns_support      = true
  enable_nat_gateway      = false
  map_public_ip_on_launch = true
}

My goal is to change my fck-nat config so that TF does not delete and recreate my routes (i.e. keeping the name module.fck-nat[0].aws_route.main[0] or module.fck-nat[0].aws_route.main[1] instead of module.fck-nat[0].aws_route.main["RESERVED_FKC_NAT"]). Assistance is appreciated!

jordanbd commented 3 days ago

I did some more playing around with this on another of our less critical services. After I got the error I reran Terraform and it completed, successfully creating the routes. However, between the failure and the rerun there was no route, so no outgoing traffic could pass through the NAT. Depending on your service this may or may not be acceptable.

So provided you immediately re-run terraform apply you can minimize the period that you don't have the required fck-nat routes and everything appears to be fine.

I would still like to understand if there's anything I can do to prevent the failure entirely, as a minute or two of missing NAT routes on some of our services will cause a minor ruckus.
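
One possible way to avoid the gap, sketched here under assumptions and not confirmed by the module's maintainers, is to rename the existing routes in the Terraform state before applying the 1.3 upgrade, so that the plan sees the route already at its new address and has nothing to destroy or create. The helper below only prints the `terraform state mv` commands for review. The function name `print_state_mv_commands` is hypothetical, and the addresses for instances other than `module.fck-nat[0]` are assumed to follow the same pattern as the plan output above.

```shell
# Print (do not run) one `terraform state mv` command per fck-nat module instance.
# Usage: print_state_mv_commands <number of module instances>
print_state_mv_commands() {
  i=0
  while [ "$i" -lt "$1" ]; do
    # Old address (count-indexed) and new address (keyed), taken from the plan output.
    # Assumption: every module instance uses the same addresses as instance 0.
    old="module.fck-nat[$i].aws_route.main[0]"
    new="module.fck-nat[$i].aws_route.main[\"RESERVED_FKC_NAT\"]"
    # Single quotes keep the shell from interpreting [ ] and " in the addresses.
    printf "terraform state mv '%s' '%s'\n" "$old" "$new"
    i=$((i + 1))
  done
}

# Two module instances, matching the two public subnets in the VPC above.
print_state_mv_commands 2
```

It is probably wise to save a copy of the state first (for example with `terraform state pull > backup.tfstate`), run the printed commands, and then check that `terraform plan` no longer proposes to destroy or create `aws_route.main`.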