AndrewGuenther / fck-nat

Feasible cost konfigurable NAT: An AWS NAT Instance AMI
https://fck-nat.dev
MIT License

Two ENIs? #36

Closed: madeupname closed this issue 1 year ago

madeupname commented 1 year ago

Thanks for this project, I'm excited to get it to work. My goal is Internet access for a Lambda on a private subnet. I will share all my config and troubleshooting below, but the big thing I notice in contrast to the docs is that my fck-nat EC2 instance has two network interfaces. Is that normal?

The first is FckNatInterface created by the CFN stack. The second appears to have been created automatically when the instance was launched by FckNatAsgLaunchConfig. They have all the same settings (other than private IPv4 address), including security group (NatSecurityGroup) and subnet (public), but FckNatInterface is missing a public IP address. The other one has it.
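For anyone wanting to make the same comparison without clicking through the console, here is a minimal sketch using boto3. The helper names are hypothetical and not part of fck-nat, and it assumes credentials and a region are already configured. It lists the ENIs attached to one instance and shows which of them has a public IP.

```python
from typing import Any


def summarize_interfaces(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Reduce a describe_network_interfaces response to the fields compared above."""
    summary = []
    for eni in response.get("NetworkInterfaces", []):
        summary.append({
            "id": eni["NetworkInterfaceId"],
            "private_ip": eni.get("PrivateIpAddress"),
            # "Association" is only present when a public or Elastic IP is attached.
            "public_ip": eni.get("Association", {}).get("PublicIp"),
            "source_dest_check": eni.get("SourceDestCheck"),
            "device_index": eni.get("Attachment", {}).get("DeviceIndex"),
        })
    return summary


def list_instance_interfaces(instance_id: str) -> list[dict[str, Any]]:
    """Fetch and summarize every ENI attached to the given instance."""
    import boto3  # imported lazily so summarize_interfaces works without AWS access

    ec2 = boto3.client("ec2")
    response = ec2.describe_network_interfaces(
        Filters=[{"Name": "attachment.instance-id", "Values": [instance_id]}]
    )
    return summarize_interfaces(response)
```

Calling `list_instance_interfaces("i-0123456789abcdef0")` (a placeholder instance ID) should return one entry per attached ENI, with `public_ip` set to `None` for an interface like FckNatInterface that has no public address.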

I note that FckNatInterface is attached by the service script. Given that the "Enable auto-assign public IPv4 address" setting is on for the public subnet, my guess is that AWS is creating an ENI with a public IP and attaching it before startup.

If I'm reading /opt/fck-nat/fck-nat.sh correctly, a possible solution is:

At that point, the ASG should launch a new instance which, if it follows previous behavior, will have an ENI with a public IP attached, and the service script will use that. But the bigger question is: why is this happening if it's not what you intended? And, of course, will it solve the problem? I confess it's been a while since I've worked professionally as a UNIX/Linux admin.

I should add that I never had a NAT gateway, as this is a project in development, but I assumed I could start with fck-nat directly. If there is some magic when adding a NAT gateway first, I can do that.

Thanks!

Troubleshooting notes:

madeupname commented 1 year ago

I decided to test my theory above. I made the changes and refreshed the instance in the ASG. I got an instance with a single ENI with a public IP, in the correct security group and subnet, but I did need to manually disable the source/destination check. I then updated the private subnet route table to point to the new instance.
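The manual source/destination check step can also be scripted. This is a rough sketch with boto3, assuming an instance with a single ENI as described above. The function names and the instance ID are placeholders.

```python
def source_dest_check_request(instance_id: str) -> dict:
    """Build the arguments that turn off the source/destination check."""
    return {"InstanceId": instance_id, "SourceDestCheck": {"Value": False}}


def disable_source_dest_check(instance_id: str) -> None:
    """Let the instance forward traffic that is not addressed to itself."""
    import boto3  # imported lazily so the request builder runs without AWS access

    ec2 = boto3.client("ec2")
    ec2.modify_instance_attribute(**source_dest_check_request(instance_id))
```

A NAT instance forwards packets whose source or destination is not its own address, which is why the check has to be off. For a specific ENI rather than the whole instance, `modify_network_interface_attribute` takes the same `SourceDestCheck` shape with a `NetworkInterfaceId`.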

Here's the thing - it didn't work. I spent an inordinate amount of time troubleshooting and rechecking everything.

The fix? I made a small change to my Lambda function to perform DNS resolution in order to test that. It appears the act of redeploying the function... jiggered something? And it just started working. I even verified there were no other resource changes, it was all the same. Given similar comments in my search, the whole thing (AWS/Lambda) seems a bit flaky.
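For reference, a test handler along these lines might look like the sketch below. It is not the function I actually deployed. The hostname default and the extra TCP connect are assumptions added for illustration. The connect is there because a successful DNS lookup on its own does not show that traffic is leaving through the NAT.

```python
import socket


def check_egress(hostname, port=443, timeout=3.0,
                 resolver=socket.getaddrinfo,
                 connector=socket.create_connection):
    """Resolve a hostname, then try a TCP connection to it."""
    result = {"hostname": hostname, "resolved": [], "connected": False, "error": None}
    try:
        infos = resolver(hostname, port, proto=socket.IPPROTO_TCP)
        # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
        result["resolved"] = sorted({info[4][0] for info in infos})
        conn = connector((hostname, port), timeout=timeout)
        conn.close()
        result["connected"] = True
    except OSError as exc:  # socket.gaierror and timeouts are OSError subclasses
        result["error"] = f"{type(exc).__name__}: {exc}"
    return result


def lambda_handler(event, context):
    # "hostname" is a hypothetical event field; example.com is only a default target.
    return check_egress(event.get("hostname", "example.com"))
```

If `resolved` is populated but `connected` stays `False`, the name lookup works and the outbound path (route table, NAT instance, security groups) is the place to look.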

That said, I still don't know why/how I was getting 2 ENIs. Curious to know.

AndrewGuenther commented 1 year ago

There are two ENIs because one holds the internal IP address, which is where traffic from your VPC is sent, and the other holds the external IP address, which is where your requests go out to the internet from.

The behavior you saw is likely because you changed your route table to point to your new NAT instance while resources still had the old route cached. This is why the static internal IP is so important: by using a consistent ENI for the internal IP address, fck-nat ensures that once your default route is configured it doesn't need to be updated, even if you bring up a new NAT instance.
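To illustrate the point about the static ENI, here is a minimal boto3 sketch that points a private subnet's default route at an ENI instead of an instance ID. It assumes the route already exists and that the ENI is the long-lived internal one. The function names and IDs are placeholders, not something taken from the fck-nat templates.

```python
def eni_default_route_request(route_table_id: str, eni_id: str) -> dict:
    """Build replace_route arguments that target an ENI rather than an instance."""
    return {
        "RouteTableId": route_table_id,
        "DestinationCidrBlock": "0.0.0.0/0",
        # The ENI outlives any single NAT instance, so this target stays valid
        # when the ASG replaces the instance and the ENI is reattached.
        "NetworkInterfaceId": eni_id,
    }


def point_default_route_at_eni(route_table_id: str, eni_id: str) -> None:
    """Replace the existing default route so it targets the given ENI."""
    import boto3  # imported lazily so the request builder runs without AWS access

    ec2 = boto3.client("ec2")
    ec2.replace_route(**eni_default_route_request(route_table_id, eni_id))
```

With the route targeting the ENI, replacing the NAT instance should not require touching the route table again, because the route never referenced the instance in the first place.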