RaJiska / terraform-aws-fck-nat

Terraform module for fck-nat
https://registry.terraform.io/modules/RaJiska/fck-nat/aws/latest
MIT License

Route is destroyed and recreated when upgrading from 1.2 to 1.3 #40

Open: jordanbd opened this issue 5 days ago

jordanbd commented 5 days ago

I'd like some assistance upgrading from 1.2 to 1.3 when using the fck-nat Terraform module together with the terraform-aws-modules VPC module.

When I upgrade from 1.2 to 1.3, Terraform attempts to recreate my routes and the apply fails with the following error:

Error: api error RouteAlreadyExists: Route in Route Table (rtb-xxx) with destination (0.0.0.0/0) already exists

The plan output shows that the route is being destroyed and recreated under a new address:

  # module.fck-nat[0].aws_route.main[0] will be destroyed
  # (because resource does not use count)
  - resource "aws_route" "main" {
      - destination_cidr_block      = "0.0.0.0/0" -> null
      - id                          = "r-rtb-0ed892a98ed869c161080289494" -> null
      - instance_id                 = "i-0233e87ff9c6e50c3" -> null
      - instance_owner_id           = "443495191585" -> null
      - network_interface_id        = "eni-0066c806c4ac12383" -> null
      - origin                      = "CreateRoute" -> null
      - route_table_id              = "rtb-0ed892a98ed869c16" -> null
      - state                       = "active" -> null
        # (11 unchanged attributes hidden)
    }
  # module.fck-nat[0].aws_route.main["RESERVED_FKC_NAT"] will be created
  + resource "aws_route" "main" {
      + destination_cidr_block = "0.0.0.0/0"
      + id                     = (known after apply)
      + instance_id            = (known after apply)
      + instance_owner_id      = (known after apply)
      + network_interface_id   = "eni-0066c806c4ac12383"
      + origin                 = (known after apply)
      + route_table_id         = "rtb-0ed892a98ed869c16"
      + state                  = (known after apply)
    }

Ideally I would like to upgrade from 1.2 to 1.3 without Terraform destroying and recreating my routes. To achieve this, I believe I need to migrate away from the deprecated update_route_table and route_table_id inputs, which were changed in this commit.

I am having trouble understanding what I need to change, hence this ticket.

I have currently defined fck-nat as follows:

module "fck-nat" {
  count  = length(module.vpc.public_subnets)
  source = "RaJiska/fck-nat/aws"

  name                 = "${local.prefix}-fck-nat-${count.index}"
  vpc_id               = module.vpc.vpc_id
  subnet_id            = module.vpc.public_subnets[count.index]

  instance_type        = var.fck_nat_instance_type
  ha_mode              = true
  use_cloudwatch_agent = true

  update_route_table   = true
  route_table_id       = module.vpc.private_route_table_ids[count.index]
}
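For comparison, here is a minimal sketch of the same module call written against the non-deprecated inputs. It assumes that 1.3 replaces `update_route_table`/`route_table_id` with a boolean `update_route_tables` and a map `route_tables_ids`. Those names come from my reading of the module README, so please check them against the registry page. The map key `private` is hypothetical. Judging by the plan output, the map key probably becomes the `for_each` key of `aws_route.main`, so this change alone would still move the route to a new address.

```hcl
module "fck-nat" {
  count   = length(module.vpc.public_subnets)
  source  = "RaJiska/fck-nat/aws"
  version = "~> 1.3" # assumed pin; the original call does not set one

  name      = "${local.prefix}-fck-nat-${count.index}"
  vpc_id    = module.vpc.vpc_id
  subnet_id = module.vpc.public_subnets[count.index]

  instance_type        = var.fck_nat_instance_type
  ha_mode              = true
  use_cloudwatch_agent = true

  # Assumed 1.3 replacements for update_route_table / route_table_id.
  update_route_tables = true
  route_tables_ids = {
    # Hypothetical key; it likely becomes aws_route.main["private"].
    private = module.vpc.private_route_table_ids[count.index]
  }
}
```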

My VPC looks something like this:

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name                    = "${local.prefix}-vpc"
  cidr                    = "10.5.0.0/16"
  azs                     = ["ap-southeast-2a", "ap-southeast-2b"]
  private_subnets         = ["10.5.2.0/24", "10.5.3.0/24"]
  public_subnets          = ["10.5.0.0/24", "10.5.1.0/24"]
  enable_dns_hostnames    = true
  enable_dns_support      = true
  enable_nat_gateway      = false
  map_public_ip_on_launch = true
}

My goal is to change my fck-nat config so that Terraform does not delete and recreate my routes (i.e. keeping addresses such as module.fck-nat[0].aws_route.main[0] and module.fck-nat[1].aws_route.main[0] instead of module.fck-nat[0].aws_route.main["RESERVED_FKC_NAT"]). Any assistance is appreciated!
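The 1.3 plan says the resource no longer uses count, so keeping the old `main[0]` address is probably not possible. What can be kept is the existing AWS route itself. To do that, move its entry in Terraform state to the new address before applying. The following is a minimal sketch that assumes one route per module instance and a target key of `RESERVED_FKC_NAT`, as shown in the plan above. `migrate_fck_nat_routes` is a hypothetical helper. `terraform state mv` is the standard Terraform CLI command for renaming a resource address in state.

```shell
# Hypothetical helper (not part of fck-nat): re-points each fck-nat module
# instance's route in Terraform state from the 1.2 count address (main[0])
# to the 1.3 for_each address. It only edits Terraform state; the route in
# AWS is left untouched, so there is no window without a 0.0.0.0/0 route.
migrate_fck_nat_routes() {
  count="$1"                   # number of module.fck-nat instances
  key="${2:-RESERVED_FKC_NAT}" # for_each key the 1.3 plan wants to create
  i=0
  while [ "$i" -lt "$count" ]; do
    terraform state mv \
      "module.fck-nat[$i].aws_route.main[0]" \
      "module.fck-nat[$i].aws_route.main[\"$key\"]" || return 1
    i=$((i + 1))
  done
}
```

With two public subnets there are two module instances, so you would run `migrate_fck_nat_routes 2` and then `terraform plan`. The plan should no longer list these routes for destruction, provided the other route attributes match. You can run `terraform state mv -dry-run` first to preview a move. If you switch to a newer route-table input with its own map key, pass that key as the second argument. A root-module `moved` block would be the declarative alternative. As far as I know, though, Terraform rejects `moved` blocks that reach into an external (registry) module package, which is why this sketch edits state directly.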

jordanbd commented 3 days ago

I did some more playing around with this on another of our less critical services. After I got the error, I reran Terraform and it completed successfully, creating the routes. However, between the failure and the rerun there was no 0.0.0.0/0 route, so no outbound traffic could go through the NAT instance. Depending on your service, this may or may not be acceptable.

So, provided you re-run terraform apply immediately, you can minimize the window in which the fck-nat routes are missing, and everything appears to be fine afterwards.

I would still like to understand whether there's anything I can do to prevent the failure entirely, as a minute or two of missing NAT routes on some of our services will cause a minor ruckus.