AndrewGuenther / fck-nat

Feasible cost konfigurable NAT: An AWS NAT Instance AMI
https://fck-nat.dev
MIT License

Unable to pull ECR image in ECS Fargate task when using fck-nat #99

Closed. genghis-tuan closed this issue 1 month ago

genghis-tuan commented 1 month ago

Hi, I was wondering if I'm missing something. When I use AWS ECS Fargate tasks that run in a private subnet, the ECR images can be fetched when I have a NAT gateway running. I want to try switching from AWS NAT gateway to fck-nat, but the tasks never get past the Pending status, and then they fail with:

Task stopped at: 2024-09-19T21:34:58.424Z
ResourceInitializationError: unable to pull secrets or registry auth: The task cannot pull registry auth from Amazon ECR: There is a connection issue between the task and Amazon ECR. Check your task network configuration. RequestError: send request failed caused by: Post "https://api.ecr.us-east-1.amazonaws.com/": dial tcp <SOME_IP>:443: i/o timeout.

When I go to the route table that is associated with the private subnet of the task and replace the ENI of the fck-nat EC2 instance with the NAT gateway, the task successfully starts.

Am I missing something elementary? What I have is a brand new VPC set up with Terraform, with 3 public subnets and 3 private subnets. The 3 private subnets are each associated with their own route table. Each of these route tables has a route to the ENI of the fck-nat instance. The VPC has a NAT gateway, but the route to the NAT gateway is only present when I swap out the route to the ENI for debugging.

AndrewGuenther commented 1 month ago

Can you confirm with the Reachability Analyzer that your tasks are able to talk to the NAT?

Does the security group attached to your fck-nat instance allow inbound connections from your VPC?
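For anyone who wants to script the security group check, here is a minimal sketch, not part of fck-nat. It assumes the security group is a dict shaped like one entry of the `SecurityGroups` list returned by boto3's `describe_security_groups`. The function name `allows_inbound` and the CIDRs in the usage note are made up for illustration. It only inspects IPv4 CIDR rules and ignores rules that reference other security groups.

```python
import ipaddress


def allows_inbound(security_group, source_cidr, port=443, protocol="tcp"):
    """Return True if some ingress rule admits protocol/port from all of source_cidr.

    security_group: dict shaped like an entry of describe_security_groups()["SecurityGroups"].
    Only IPv4 "IpRanges" are checked; security-group-to-security-group rules are ignored.
    """
    source = ipaddress.ip_network(source_cidr)
    for rule in security_group.get("IpPermissions", []):
        rule_protocol = rule.get("IpProtocol")
        if rule_protocol != "-1":  # "-1" means all protocols and all ports
            if rule_protocol != protocol:
                continue
            if not (rule.get("FromPort", 0) <= port <= rule.get("ToPort", 65535)):
                continue
        for ip_range in rule.get("IpRanges", []):
            allowed = ipaddress.ip_network(ip_range["CidrIp"])
            if source.subnet_of(allowed):
                return True
    return False
```

With a live account, the input could come from something like `boto3.client("ec2").describe_security_groups(GroupIds=[...])["SecurityGroups"][0]`, and `allows_inbound(sg, "10.0.0.0/16")` would then answer the question for a VPC with that (assumed) CIDR. The default port is 443 only because that is the port in the ECR error above. A NAT instance generally needs to accept whatever traffic the private subnets send out.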

genghis-tuan commented 1 month ago

Thanks for responding so quickly, Andrew. I found the issue. It was not related to anything in this repo. I apologize for wasting your time. The issue was that during testing of recovery scenarios, I deleted the fck-nat EC2 instance to allow the ASG to recreate the instance. It appears that the official Terraform module does not attach the static Elastic Network Interface to the new instance. Therefore the route in the route table points to an existing ENI, but the ENI is not attached. This was the cause of my issue. I need to find a way for the launch template to attach an existing ENI to a new instance. AWS doesn't seem to allow that. I'm closing this issue.
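The failure mode described here, a default route that still points at an ENI no longer attached to any instance, can be checked for with a small script. This is a rough sketch rather than anything shipped with fck-nat. It assumes inputs shaped like the `RouteTables` and `NetworkInterfaces` lists returned by boto3's `describe_route_tables` and `describe_network_interfaces`. The function name `find_dead_nat_routes` is hypothetical.

```python
def find_dead_nat_routes(route_tables, interfaces):
    """Return (route_table_id, eni_id, reason) for default routes whose ENI target is unusable.

    route_tables: list shaped like describe_route_tables()["RouteTables"].
    interfaces:   list shaped like describe_network_interfaces()["NetworkInterfaces"].
    """
    # ENIs that currently report an attachment in the "attached" state.
    attached = {
        eni["NetworkInterfaceId"]
        for eni in interfaces
        if eni.get("Attachment", {}).get("Status") == "attached"
    }
    problems = []
    for table in route_tables:
        for route in table.get("Routes", []):
            eni_id = route.get("NetworkInterfaceId")
            # Only look at IPv4 default routes that target a network interface.
            if route.get("DestinationCidrBlock") != "0.0.0.0/0" or not eni_id:
                continue
            if route.get("State") == "blackhole":
                problems.append((table["RouteTableId"], eni_id, "route is blackhole"))
            elif eni_id not in attached:
                problems.append((table["RouteTableId"], eni_id, "ENI not attached"))
    return problems
```

An empty result means every ENI-backed default route has an attached target. It says nothing about whether the instance behind it is actually forwarding traffic, so the Reachability Analyzer is still the more thorough check.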

genghis-tuan commented 1 month ago

Sorry, final comment because I'm an idiot. I just realized that the script here adds the static ENI to the instance via userdata. I'm not sure what's going on in AWS, but I'm pretty certain it is just me doing things wrong. Thank you again.