AndrewGuenther / cdk-fck-nat

CDK constructs for the fck-nat service
MIT License

Elastic IP for non-HA and HA #240

Closed m-radzikowski closed 9 months ago

m-radzikowski commented 1 year ago

Hey, multiple people recommended fck-nat on Twitter, and I must say - great job, and having CDK support is a cherry on top.

However, I need a static IP (an Elastic IP) for my NAT. I saw https://github.com/AndrewGuenther/fck-nat/issues/14, but HA with EIP was not resolved there.

I made it work, but the solution is quite ugly.

No-HA

With a non-HA setup you can easily attach an EIP. A working example:

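import {
    CfnEIPAssociation, InstanceClass, InstanceSize, InstanceType,
    MachineImage, NatInstanceProvider, SubnetType, Vpc,
} from "aws-cdk-lib/aws-ec2";

const natProvider = new NatInstanceProvider({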
    instanceType: InstanceType.of(InstanceClass.T4G, InstanceSize.MICRO),
    machineImage: MachineImage.genericLinux({
        "eu-west-1": "ami-06cc086db04924b69",
    }),
});

const vpc = new Vpc(this, "VPC", {
    vpcName: this.stackName,
    availabilityZones: this.availabilityZones.slice(0, 1),
    subnetConfiguration: [
        {
            name: "public",
            subnetType: SubnetType.PUBLIC,
        },
        {
            name: "private",
            subnetType: SubnetType.PRIVATE_WITH_EGRESS,
        },
    ],
    natGatewayProvider: natProvider,
});

// make sure you have enough EIPs for all availability zones
const eipAllocationIds = ['eipalloc-xxx'];
natProvider.configuredGateways.forEach((gateway, idx) => {
    new CfnEIPAssociation(this, `EIPAssociation${idx}`, {
        allocationId: eipAllocationIds[idx],
        instanceId: gateway.gatewayId,
    });
});
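The "enough EIPs" comment above can be turned into an explicit check. This is only a minimal sketch and not part of cdk-fck-nat or aws-cdk-lib: `pairGatewaysWithEips` and `EipPairing` are hypothetical names, and the helper works on plain string arrays so it can fail early if the allocation list is shorter than the gateway list.

```typescript
// Hypothetical helper (not part of aws-cdk-lib or cdk-fck-nat).
interface EipPairing {
    gatewayId: string;
    allocationId: string;
}

// Pairs each gateway with the allocation ID at the same index and
// throws if there are fewer allocation IDs than gateways.
function pairGatewaysWithEips(gatewayIds: string[], allocationIds: string[]): EipPairing[] {
    if (allocationIds.length < gatewayIds.length) {
        throw new Error(
            `Need ${gatewayIds.length} EIP allocation(s), got ${allocationIds.length}`,
        );
    }
    return gatewayIds.map((gatewayId, idx) => ({
        gatewayId,
        allocationId: allocationIds[idx],
    }));
}
```

With the example above, it could be called as `pairGatewaysWithEips(natProvider.configuredGateways.map(g => g.gatewayId), eipAllocationIds)` before the `CfnEIPAssociation` resources are created.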

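We can test the setup with this small Lambda:

import * as lambda from "aws-cdk-lib/aws-lambda";

new lambda.Function(this, 'CheckIP', {
    runtime: lambda.Runtime.NODEJS_18_X,
    handler: 'index.handler',
    // returns the public IP that outbound requests from the VPC appear to come from
    code: lambda.Code.fromInline('exports.handler = async function(event, ctx) { return (await (await fetch("https://api.my-ip.io/ip.json")).json()).ip }'),
    vpc,
});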

The instance has one network interface attached, with the EIP assigned to it. This interface is eth0, and it is set as the output interface in the iptables nat table:

$ iptables -t nat -L -n -v
Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
  210 16361 MASQUERADE  all  --  *      eth0    0.0.0.0/0            0.0.0.0/0            /* NAT routing rule installed by fck-nat */

HA

Now, with HA things get complicated.

The FckNatInstanceProvider creates a "permanent" network interface that is set as the target for private subnet routing. It also creates an Auto Scaling group (ASG). An instance that starts in the ASG gets a "regular" network interface on eth0 and then runs aws ec2 attach-network-interface to attach the "permanent" network interface as eth1.

Then it sets the NAT routing to use eth0, just like in the no-HA setup.

Now, the simplest solution would be to attach the EIP to this "permanent" network interface and route the outbound traffic through it. The code changes slightly:

// FckNatInstanceProvider (from "cdk-fck-nat") instead of NatInstanceProvider;
// Peer and Port, used below, come from "aws-cdk-lib/aws-ec2"
const natProvider = new FckNatInstanceProvider({
    machineImage: MachineImage.genericLinux({
        "eu-west-1": "ami-06cc086db04924b69",
    }),
    instanceType: InstanceType.of(InstanceClass.T4G, InstanceSize.MICRO),
});

const vpc = new Vpc(this, "VPC", {
    vpcName: this.stackName,
    availabilityZones: this.availabilityZones.slice(0, 1),
    subnetConfiguration: [
        {
            name: "public",
            subnetType: SubnetType.PUBLIC,
        },
        {
            name: "private",
            subnetType: SubnetType.PRIVATE_WITH_EGRESS,
        },
    ],
    natGatewayProvider: natProvider,
});

// make sure you have enough EIPs for all availability zones
const eipAllocationIds = ['eipalloc-xxx'];
natProvider.configuredGateways.forEach((gateway, idx) => {
    new CfnEIPAssociation(this, `EIPAssociation${idx}`, {
        allocationId: eipAllocationIds[idx],
        networkInterfaceId: gateway.gatewayId, // networkInterfaceId instead of instanceId
    });
});

// with FckNatInstanceProvider, NAT security group does not allow any ingress by default
natProvider.securityGroup.addIngressRule(Peer.ipv4(vpc.vpcCidrBlock), Port.allTraffic());

However, when I change the NAT table to use eth1, traffic does not flow and requests time out:

$ iptables -t nat -F
$ iptables -t nat -A POSTROUTING -o "eth1" -j MASQUERADE
$ iptables -t nat -L -n -v
Chain POSTROUTING (policy ACCEPT 17 packets, 1206 bytes)
 pkts bytes target     prot opt in     out     source               destination
    2   120 MASQUERADE  all  --  *      eth1    0.0.0.0/0            0.0.0.0/0

And I don't know why; I'm not good enough with networking to figure it out.

The eth1 interface itself can reach the internet from the instance: curl --interface eth1 https://api.my-ip.io/ip.json works.

The other solution is to associate the EIP with the eth0 interface using the AWS CLI from the instance itself, and I made that work.

Instead of creating the CfnEIPAssociation from the previous code, we do this:

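import { Aws } from "aws-cdk-lib";
import { AutoScalingGroup } from "aws-cdk-lib/aws-autoscaling";
import { PolicyStatement } from "aws-cdk-lib/aws-iam";

const eipAllocationIds = ['eipalloc-xxx', 'eipalloc-yyy'];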

const asg = vpc.publicSubnets[0].node.findChild("FckNatAsg") as unknown as AutoScalingGroup;
asg.addToRolePolicy(new PolicyStatement({
    actions: ['ec2:AssociateAddress'],
    resources: [
        ...eipAllocationIds.map(eip => `arn:aws:ec2:${Aws.REGION}:${Aws.ACCOUNT_ID}:elastic-ip/${eip}`),
        `arn:aws:ec2:${Aws.REGION}:${Aws.ACCOUNT_ID}:network-interface/*`,
    ],
}));

asg.addUserData(
    `aws_region="$(/opt/aws/bin/ec2-metadata -z | cut -f2 -d' ' | sed 's/.$//')"`,
    `eth0_mac="$(cat /sys/class/net/eth0/address)"`,
    `token="$(curl -X PUT -H 'X-aws-ec2-metadata-token-ttl-seconds: 300' http://169.254.169.254/latest/api/token)"`,
    `eth0_eni_id="$(curl -H "X-aws-ec2-metadata-token: $token" http://169.254.169.254/latest/meta-data/network/interfaces/macs/$eth0_mac/interface-id)"`,
    `allocation_id=$(aws ec2 describe-addresses --region "$aws_region" --filters "Name=allocation-id,Values=${eipAllocationIds.join(',')}" --query 'Addresses[?!AssociationId] | [0].AllocationId' --output text)`,
    `if [ "$allocation_id" = "None" ]; then echo "Free EIP not found"; else aws ec2 associate-address --region "$aws_region" --allocation-id "$allocation_id" --network-interface-id "$eth0_eni_id"; fi`,
);

Ugly, but it works. We provide a list of EIPs, and the script finds the first free one and associates it with eth0.

The same script in a more readable form:

aws_region="$(/opt/aws/bin/ec2-metadata -z | cut -f2 -d' ' | sed 's/.$//')"
eth0_mac="$(cat /sys/class/net/eth0/address)"

token="$(curl -X PUT -H 'X-aws-ec2-metadata-token-ttl-seconds: 300' http://169.254.169.254/latest/api/token)"
eth0_eni_id="$(curl -H "X-aws-ec2-metadata-token: $token" http://169.254.169.254/latest/meta-data/network/interfaces/macs/$eth0_mac/interface-id)"

allocation_id=$(aws ec2 describe-addresses --region "$aws_region" --filters "Name=allocation-id,Values=eipalloc-xxx,eipalloc-yyy" --query 'Addresses[?!AssociationId] | [0].AllocationId' --output text)

if [ "$allocation_id" = "None" ]; then
  echo "Free EIP not found"
else
  aws ec2 associate-address --region "$aws_region" --allocation-id "$allocation_id" --network-interface-id "$eth0_eni_id"
fi
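To make the selection step explicit, here is a minimal TypeScript sketch of what the `--query 'Addresses[?!AssociationId] | [0].AllocationId'` expression does: keep only addresses without an association and take the first one. `firstFreeAllocationId` and the trimmed-down `Address` type are hypothetical names for illustration. The sketch returns `undefined` where the CLI with `--output text` prints `None`.

```typescript
// Minimal shape of an entry in the `Addresses` array returned by
// `aws ec2 describe-addresses`; only the fields used here are listed.
interface Address {
    AllocationId: string;
    AssociationId?: string;
}

// Hypothetical helper mirroring the JMESPath query
// `Addresses[?!AssociationId] | [0].AllocationId`.
function firstFreeAllocationId(addresses: Address[]): string | undefined {
    return addresses.find((address) => !address.AssociationId)?.AllocationId;
}

// Usage: the first EIP is already associated, so the second one is picked.
const exampleAddresses: Address[] = [
    { AllocationId: "eipalloc-xxx", AssociationId: "eipassoc-123" },
    { AllocationId: "eipalloc-yyy" },
];
console.log(firstFreeAllocationId(exampleAddresses)); // prints "eipalloc-yyy"
```

If every address in the list is already associated, the function returns `undefined`, which corresponds to the "Free EIP not found" branch of the script.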

So those are the solutions for setting up fck-nat with an Elastic IP, in both the non-HA and HA versions.

If you want, I can create a PR to add those examples to the docs. I think a few people would be interested.

One more thing: I would really like to make it work with the EIP associated through IaC, but the problem is routing through eth1. Are you able to help with that?

AndrewGuenther commented 1 year ago

Just want to ack that this is on my radar and I'll get back to you when some other projects die down.

AndrewGuenther commented 1 year ago

EIP configuration support has landed in fck-nat as part of https://github.com/AndrewGuenther/fck-nat/pull/44

That change will get released as part of the 1.3 release at the end of the month.

You can now pretty easily add this value to the user data, and it will work for both HA and non-HA. I'm leaving this open because in the next release I'd like to add more direct support in the construct for passing the EIP ID.

nguyendon commented 10 months ago

@AndrewGuenther any idea when 1.3 might be released? EIP config would be great for our use case.

AndrewGuenther commented 10 months ago

@nguyendon I've updated the release timeline for 1.3 to end of this month. Things got a bit crazy with the holiday and I'm also changing jobs so I haven't had much time to get 1.3 out the door. I've got the next two weeks off though and 1.3 is definitely one of my priorities during that time.

nguyendon commented 10 months ago

Thanks for the update and congrats on the new gig 🎉