chime / terraform-aws-alternat

High availability implementation of AWS NAT instances.
MIT License
1.08k stars 65 forks source link

Allow overriding NAT Gateway for fallback #89

Closed kristian-lesko closed 6 months ago

kristian-lesko commented 7 months ago

Hello, thanks a lot for this wonderful module! I'd like to propose a simple extension that may be useful for some users.

It may be the case that only NAT Gateway is kept for fallback from alternat instances, since paying for separate gateways per AZ may not be worth the extra NatGatewayHours costs.

In such case, it can be useful to manage the single NAT Gateway outside of the module, and provide its ID to Lambda functions. This would override the default behaviour of selecting the NAT Gateway in the same subnet for fallback.

resource "aws_nat_gateway" "this" {}

module "alternat" {
  create_nat_gateways = false
  lambda_environment_variables = {
    NAT_GATEWAY_ID = aws_nat_gateway.this.id
  }
}
bwhaley commented 7 months ago

Cool! I think this is a good idea. Note that you will incur $0.01/GB in each direction by routing across AZs.

It could be even easier to use if we added a new optional variable nat_gateway_id to the Terraform module. If set, it would set up the env var on both of the Lambdas. Similar to lambda_has_ipv6 but for both functions. Then we can add another note in the README in the "other considerations" section to explain it.

Would you be up for that small tweak?

kristian-lesko commented 7 months ago

Hi @bwhaley,

Sure, sounds good! I had an extra idea, which was to extend the vpc_az_maps variable with a nat_gateway_id attribute:

variable "vpc_az_maps" {
  description = "A map of az to private route tables that the NAT instances will manage."
  type = list(object({
    az                 = string
    nat_gateway_id     = optional(string)
    private_subnet_ids = list(string)
    public_subnet_id   = string
    route_table_ids    = list(string)
  }))
}

This might be the most flexible approach, since it would cover all situations when there are less NAT Gateways deployed than there are availability zones.

Would that work in your opinion? Thanks!

bwhaley commented 7 months ago

I think that would work if you plumb it through to both of the Lambda function handlers. That would be more flexible than the original idea. It's not as DRY - you might use the same NAT gateway ID for each object in the list, like for your use case. Still, it doesn't seem that bad. Want to make an attempt at this approach?

kristian-lesko commented 6 months ago

Hello @bwhaley,

After further thought, I decided to try and implement your first suggestion from https://github.com/1debit/alternat/pull/89#issuecomment-2064916471. I didn't realise previously that getting the proper NAT Gateway ID value to the autoscaling hook, which spans through all subnets, would be more tedious. For the use case I have, a single override NAT Gateway ID is plenty.

Can you please re-check the code to see if this would work for you? Thanks a lot!