AndrewGuenther / fck-nat

Feasible cost konfigurable NAT: An AWS NAT Instance AMI
https://fck-nat.dev
MIT License

NAT64 Support #41

Open LazerGrieves opened 10 months ago

LazerGrieves commented 10 months ago

Hello,

I've just discovered fck-nat and I am thrilled that something like this exists because those AWS Managed NAT Gateways are pricey (especially with a Multi-AZ implementation).

However, I am wondering if there are any plans to implement support for NAT64 in fck-nat? I've been experimenting with IPv6-only private subnets ever since AWS announced that there will be charges for Public IPv4 addresses starting in 2024 and NAT64 seems like a requirement.

Thank you.

apparentorder commented 10 months ago

I asked the same thing last week and got this reply.

I have a proof of concept NAT64 Linux box running on AWS, and it works as expected (thankfully, AWS VPC supports DNS64 out of the box and for free!). I hope to take a look at integrating this with fck-nat over the next 1-2 weeks.

AndrewGuenther commented 9 months ago

@LazerGrieves @apparentorder I just want to make sure you're aware of Egress Only Internet Gateways which provide NAT64 in VPCs for free. If you're already aware of that feature, is there a reason it doesn't work for your use case?

apparentorder commented 9 months ago

I'm still pre-first-coffee right now, but I'm pretty sure that EO-IGW does not provide NAT64; the link above does not contradict that, and the DNS64/NAT64 docs support that (the NAT64 route 64:ff9b::/96 is supposed to point to a NAT Gateway). NAT64 is a feature of Managed NAT Gateway. Do you have additional resources, or have you tried it yourself?
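
For illustration, here is a minimal Python sketch of the address mapping behind that route, assuming the well-known NAT64 prefix 64:ff9b::/96 (RFC 6052). The function names `synthesize` and `extract` are hypothetical and not part of fck-nat or any AWS tooling; the sketch only shows how an IPv4 address is embedded in the low 32 bits of the prefix, which is what DNS64 does when it synthesizes an AAAA answer.

```python
import ipaddress

# Well-known NAT64 prefix (RFC 6052); assumed here, a deployment may use another /96.
NAT64_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")


def synthesize(ipv4: str) -> ipaddress.IPv6Address:
    """Embed an IPv4 address in the low 32 bits of the NAT64 prefix."""
    v4 = ipaddress.IPv4Address(ipv4)
    return ipaddress.IPv6Address(int(NAT64_PREFIX.network_address) | int(v4))


def extract(ipv6: str) -> ipaddress.IPv4Address:
    """Recover the IPv4 address from an address inside the NAT64 prefix."""
    v6 = ipaddress.IPv6Address(ipv6)
    if v6 not in NAT64_PREFIX:
        raise ValueError(f"{v6} is not inside {NAT64_PREFIX}")
    return ipaddress.IPv4Address(int(v6) & 0xFFFFFFFF)


if __name__ == "__main__":
    mapped = synthesize("192.0.2.1")  # documentation address, used as sample input
    print(mapped, "->", extract(str(mapped)))
```

Traffic sent to an address produced this way matches the 64:ff9b::/96 route, so whichever device that route points at has to perform the IPv6-to-IPv4 translation.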

RaJiska commented 9 months ago

Unless I am mistaken, one of the core features of NAT64 is allowing communication from IPv6 (our instance) to IPv4 (remote server). Egress-Only IGW seems to be able to communicate only from IPv6 to IPv6 addresses, hence requiring NAT64 to offer translation when one wishes to have an IPv6 <> IPv4 communication.

AndrewGuenther commented 9 months ago

Ah, yep, you're both correct. Had it in my head that EO-IGWs did NAT64 but I was mistaken. @apparentorder was pre-first-coffee and I was post-time-for-bed

RaJiska commented 9 months ago

@LazerGrieves @apparentorder I fully implemented this feature into fck-nat, and it really wasn't a piece of cake. I implemented this using Tayga, which comes with its share of constraints (and is therefore probably not expected to be an on-by-default feature) that I will elaborate on in the PR.

I have been able to make this work with both IPs and domain names, even for those without AAAA records, leveraging AWS' integrated DNS. To note, AWS DNS64 is very slow for some reason, with requests taking up to 3 seconds.

I will make a PR for this feature either sometime today or tomorrow.

apparentorder commented 9 months ago

Very cool! My proof of concept was with Jool, which is a kernel module, hence more performant – and it seems to be "more maintained" than tayga. But I really wasn't sure how to best combine building a kmod from source with fck-nat's current packer infrastructure, so I'm really glad that you went ahead :-)

Regarding this:

AWS DNS64 is very slow for some reason with requests taking up to 3 seconds

I have seen EC2 IPv6-only VMs that were deployed with a broken resolver config. The resolv.conf actually had an IPv4 resolver listed first. I have observed the same DNS lookup delay and assumed it's because the VM tries to query the IPv4 resolver first, and only proceeds to query the IPv6 resolver address after a 3s timeout. I didn't dig into the issue, so I cannot say if this theory is correct, and if it only affects AL2023, or "all" AWS-provided AMIs. But the DNS64 support in itself was not slow for me. It worked great once I removed the IPv4 resolver.
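
A quick way to check a VM for this condition is sketched below in Python. It is only an illustration under stated assumptions: the helper names are hypothetical, the sample resolver addresses are placeholders for whatever the instance actually received, and the script merely reports the order of `nameserver` entries; it does not confirm that the ordering is the cause of the delay.

```python
import ipaddress


def nameservers(resolv_conf_text: str) -> list:
    """Return the nameserver addresses in the order they are listed."""
    servers = []
    for line in resolv_conf_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped[0] in "#;":
            continue  # skip blank lines and comments
        fields = stripped.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            address = fields[1].split("%", 1)[0]  # drop an IPv6 zone id if present
            try:
                servers.append(ipaddress.ip_address(address))
            except ValueError:
                continue  # ignore entries that are not valid IP addresses
    return servers


def ipv4_listed_first(resolv_conf_text: str) -> bool:
    """True when the first configured resolver is an IPv4 address."""
    servers = nameservers(resolv_conf_text)
    return bool(servers) and servers[0].version == 4


if __name__ == "__main__":
    # Sample content; the addresses are assumptions for illustration only.
    sample = "nameserver 169.254.169.253\nnameserver fd00:ec2::253\n"
    print("IPv4 resolver listed first:", ipv4_listed_first(sample))
```

On a real instance, the text would come from reading /etc/resolv.conf instead of the sample string.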

RaJiska commented 9 months ago

Thanks @apparentorder for sharing! Jool is indeed a much better alternative than TAYGA. In addition to not having the various constraints TAYGA is riddled with, it also runs as a kernel module, which indeed yields better results.

As for the slow DNS issue, you are right as well: the resolv.conf had the IPv4 resolver listed first, which caused the issue in the first place. Removing it entirely fixed the problem. This issue is therefore also happening on AL2 and is probably worth looking into further.

nickpetrovic commented 5 months ago

@apparentorder wondering if you tried making ipv4 take precedence over ipv6 by adding a precedence value in the getaddrinfo config (/etc/gai.conf)? I've run into a similar situation before and this solved it for me. It has to be the very first thing your script does.

echo 'precedence ::ffff:0:0/96 100' >> /etc/gai.conf

AndrewGuenther commented 5 months ago

I've created a fck-nat-1.4 branch, which I'll use to build a pre-release AMI that those who are interested can use to test NAT64 support. I don't have the bandwidth at the moment to do extensive NAT64 testing personally, so my primary testing focus will be ensuring this feature doesn't cause any regressions for regular NAT use cases.

I expect to have the pre-release AMI out some time next week and I'll announce here when it is ready. In the meantime, if you are interested in testing and have a particular region you'd like the pre-release AMI deployed to, please comment below.

LazerGrieves commented 5 months ago

@AndrewGuenther Glad to hear it! If you could include the us-east-2 region, I would be happy to perform some testing.

Thank you for all the work you've done so far.

RaJiska commented 4 months ago

I tested again and it still works on my end. Tried in both an IPv6 and a dual-stack setup. I will update my nat64 branch on the Terraform module to make it easy to test for those who'd like to try.

@AndrewGuenther Regarding your request for me to have a look at @nickpetrovic's post:

Unfortunately I haven't been able to reproduce the slow DNS for IPv4. It could have been because I was using AL2 back then, but I'm unsure. Nevertheless, this potential issue should be considered out of scope for this project, as it affects the machine that needs to be NATted rather than the NAT instance itself, so it is up to users to configure this themselves.

AndrewGuenther commented 4 months ago

Ah I didn't realize that issue was not impacting NAT. Nevermind then!

AndrewGuenther commented 4 months ago

I'm planning to deploy the prerelease AMI to the following regions and can add as necessary:

AndrewGuenther commented 4 months ago

Quick update here: I've hit quotas on number of public AMIs allowed. I'm waiting on AWS support to grant increases in these regions and will follow-up asking for increases in other regions since this will block future releases...

AndrewGuenther commented 4 months ago

It's as if AWS support heard my cries. Pre-release AMIs with NAT64 are now available!

Account owner: 568608671756

ARM AMIs:
us-east-1: ami-0c2e470170d2a48e3
us-east-2: ami-068b53093f22a6584
us-west-2: ami-0e7c2bc7b3fd2ccaa

x86 AMIs:
us-east-1: ami-0328a21a503f457f0
us-east-2: ami-03d634862884cb475
us-west-2: ami-09ef1da1160df1bdf

(Note to self: For these I added a prefix to the AMI names so the documented search patterns wouldn't pick them up, but in the future it might be best to put these in a separate account entirely to avoid any confusion)

RaJiska commented 4 months ago

Just updated the Terraform module to have functional NAT64 support with the latest fck-nat version. You can experiment using the nat64 branch of the repo, which lets you quickly set up a full environment to test.

Make sure to set the ami_id to one of the values from the post above before applying, then start an EC2 instance in the fck-nat-example-public6 subnet. If you have configured SSM by setting the appropriate role, you should be able to SSM into the instance (SSM only supports IPv4, so a successful session means NAT64 is working).

AndrewGuenther commented 2 months ago

How is this working for everyone? Any issues?