chime / terraform-aws-alternat

High availability implementation of AWS NAT instances.
MIT License

Connectivity checks fail when URL resolves to IPv6 address in IPv4-only VPC #87

Closed. 0xdeadbeefJERKY closed this issue 2 months ago.

0xdeadbeefJERKY commented 2 months ago

Occasionally, urllib will resolve the provided check URL to an IPv6 address. If the VPC in which the Lambda function is running isn't configured to support IPv6, the Lambda function will throw the following error:

error connecting to https://www.google.com/: <urlopen error [Errno 97] Address family not supported by protocol>
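On Linux, Errno 97 is `EAFNOSUPPORT`: the kernel refused to create or use a socket of the requested address family (here `AF_INET6`). As a rough diagnostic, the sketch below probes whether the Lambda process can open an IPv6 socket at all. It is loosely modeled on how urllib3 decides its `HAS_IPV6` flag (bind an `AF_INET6` socket to `::1`). `ipv6_socket_usable` is a hypothetical helper, not part of alternat. A `True` result does not prove the VPC routes IPv6; it only rules out the kernel-level refusal seen in this error.

```python
import socket


def ipv6_socket_usable() -> bool:
    """Return True if this process can create and bind an IPv6 (AF_INET6) socket.

    socket.has_ipv6 only reports whether Python was *built* with IPv6 support;
    it says nothing about whether the running kernel or sandbox allows IPv6
    sockets, so we actually try to open one on the loopback address.
    """
    if not socket.has_ipv6:
        return False
    try:
        with socket.socket(socket.AF_INET6, socket.SOCK_STREAM) as sock:
            sock.bind(("::1", 0))  # port 0: let the OS choose any free port
    except OSError:
        # On Linux, "[Errno 97] Address family not supported by protocol" is
        # EAFNOSUPPORT, the same failure reported in the error message above.
        return False
    return True


if __name__ == "__main__":
    print("IPv6 sockets usable:", ipv6_socket_usable())
```

Logging this once at cold start would help tell an IPv6-less sandbox apart from a VPC routing problem.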

Some simple Googling reveals that this is often attributed to the host not supporting IPv6 (example). Unfortunately, I haven't been able to find a trivial way to force the built-in urllib package to use IPv4-only resolution. One alternative would be to add requests or urllib3 as a dependency (and Lambda layer), and use the following to accomplish this:

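# Option 1: requests (requests.packages.urllib3 is an alias of urllib3)
import requests
requests.packages.urllib3.util.connection.HAS_IPV6 = False
# Option 2: urllib3 directly
import urllib3
urllib3.util.connection.HAS_IPV6 = False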
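For comparison, a stdlib-only workaround is possible without adding a dependency. `urllib.request` ultimately connects through `socket.create_connection`, which calls `socket.getaddrinfo` with an unspecified address family (`AF_UNSPEC`, i.e. 0), so AAAA records can come back. Wrapping `socket.getaddrinfo` to pin that case to `AF_INET` keeps resolution IPv4-only. This is a minimal sketch; `force_ipv4` and `_ipv4_only_getaddrinfo` are hypothetical names, and it is not necessarily how alternat's eventual fix is implemented.

```python
import socket

# Keep a reference to the real resolver so the wrapper can delegate to it.
_original_getaddrinfo = socket.getaddrinfo


def _ipv4_only_getaddrinfo(host, port, family=0, type=0, proto=0, flags=0):
    """Drop-in replacement for socket.getaddrinfo that never returns IPv6 results
    when the caller leaves the family unspecified (which http.client does)."""
    if family == socket.AF_UNSPEC:
        family = socket.AF_INET  # only ask the resolver for IPv4 (A) results
    # An explicitly requested family (e.g. AF_INET6) is passed through untouched.
    return _original_getaddrinfo(host, port, family, type, proto, flags)


def force_ipv4() -> None:
    """Monkeypatch the socket module so urllib.request only connects over IPv4."""
    socket.getaddrinfo = _ipv4_only_getaddrinfo


if __name__ == "__main__":
    force_ipv4()
    # Every result should now be AF_INET.
    print({info[0] for info in socket.getaddrinfo("127.0.0.1", 443)})
```

The patch is process-wide, so it affects every library running in the same Lambda process, not just urllib. That is usually acceptable for a single-purpose function, but it is the main trade-off versus urllib3's scoped `HAS_IPV6` switch.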
bwhaley commented 2 months ago

Ahh, interesting. We previously used requests, but then switched to urllib to remove the dependency. Maybe we should switch again to urllib3.

Thanks for the bug report.

bwhaley commented 2 months ago

@0xdeadbeefJERKY - Would you be up for testing out the proposed fix in #88? You just need to use the patched version and set the variable, e.g.

module "alternat_instances" {
  source = "git::https://github.com/1debit/alternat.git//modules/terraform-aws-alternat?ref=patch-getaddrinfo"
  lambda_has_ipv6 = false
  ...
}
bwhaley commented 2 months ago

@0xdeadbeefJERKY Can you share a little more about your configuration so I can understand the conditions under which this is happening?

The reason I'm asking is that VPCs do not have IPv6 enabled by default, so folks have to enable it if they want it, and most do not. I think most Alternat users are not using IPv6 VPCs, and yet this is the first time this error has been reported. It would be nice to reproduce it.

bwhaley commented 2 months ago

If you have already applied this, please do so again. There was a bug in the previous version and I've fixed it.

0xdeadbeefJERKY commented 2 months ago

@bwhaley We're test-driving the alternat deployment in us-west-2 and set CHECK_URLS to ["https://www.google.com"]. I reviewed our configurations for the VPC, private subnet, route tables, and even Route 53 out of an abundance of caution, but didn't find anything that would suggest our environment is contributing to the issue. My hunch is that the Python 3.8 Lambda runtime supports dual-stack by default, but this is never reconciled with the VPC/private subnet configuration (or a relevant bug exists in the runtime that isn't publicly documented).
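To try reproducing this outside alternat, a minimal stdlib connectivity check over a list like CHECK_URLS can be sketched as below. `any_url_reachable` is a hypothetical helper, not alternat's actual check, and how alternat passes CHECK_URLS into the Lambda is not assumed here; the function simply takes a list. Its failure log deliberately mirrors the shape of the error in this report, since `str(URLError)` renders as `<urlopen error ...>`.

```python
import urllib.error
import urllib.request
from typing import Iterable


def any_url_reachable(urls: Iterable[str], timeout: float = 5.0) -> bool:
    """Return True as soon as one URL answers; False if every attempt fails."""
    for url in urls:
        try:
            with urllib.request.urlopen(url, timeout=timeout):
                pass  # any HTTP response at all proves the network path works
            return True
        except urllib.error.HTTPError:
            # Must come before OSError: HTTPError subclasses URLError/OSError.
            # The server answered (4xx/5xx), so connectivity itself is fine.
            return True
        except OSError as exc:
            # URLError (DNS, refused, EAFNOSUPPORT, ...) and timeouts land here.
            print(f"error connecting to {url}: {exc}")
    return False


if __name__ == "__main__":
    print(any_url_reachable(["https://www.google.com"]))
```

Counting 4xx/5xx responses as success is a design choice for this sketch: the goal is to test the NAT path, not the health of the remote site.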

0xdeadbeefJERKY commented 2 months ago

@bwhaley We've been running alternat with the fix for a few days now and everything's working as expected 🎉. Thanks so much!

bwhaley commented 2 months ago

Released in https://github.com/1debit/alternat/releases/tag/v0.4.9.

bwhaley commented 2 months ago

Side note: I sorta wonder if this is related to a change in a recent Python 3.8 patch release. I see changes to sockets and urllib in 3.8.19 and 3.8.17. AWS may have bumped the python3.8 runtime and the behavior changed slightly.