Support primary/secondary failover mode using conntrackd and keepalived

AndrewGuenther commented 9 months ago

The goal here is to use a primary/secondary configuration to improve fck-nat availability without the need for outside infrastructure. It's really important to me that fck-nat does not take on additional AWS service dependencies. Ideally, two autoscaling groups would let the primary and secondary nodes self-heal, and as long as both nodes aren't offline at the same time, failover would be seamless.

Here's a few articles documenting the approach in a non-AWS environment:

Here's the rub: in AWS, we can't just willy-nilly change our IP address. We have to explicitly move ENIs and EIPs, and that takes some time. So the question becomes: in this configuration, can we move these resources fast enough to not incur downtime? (Packet loss is fine; downtime here means dropped connections.)
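One way to start putting numbers on that question is to time the EIP move itself. The following is a minimal sketch, not fck-nat's implementation: `move_eip` is a hypothetical helper, and it assumes the node that takes over calls the EC2 `associate_address` API (shown here through a boto3-style client) with credentials that permit it. It only measures API latency; the time until traffic actually flows through the new node would need to be measured separately, and ENI moves are not covered.

```python
import time


def move_eip(ec2, allocation_id, eni_id):
    """Re-associate an Elastic IP with the given ENI and time the API call.

    `ec2` is any object exposing the boto3 EC2 client method
    `associate_address` (for example `boto3.client("ec2")`).
    Returns a tuple of (association_id, seconds_elapsed).
    """
    start = time.monotonic()
    response = ec2.associate_address(
        AllocationId=allocation_id,
        NetworkInterfaceId=eni_id,
        # Take over the EIP even if the old primary still holds it.
        AllowReassociation=True,
    )
    elapsed = time.monotonic() - start
    return response["AssociationId"], elapsed


# Hypothetical usage on the node that just became primary:
#   import boto3
#   assoc_id, seconds = move_eip(boto3.client("ec2"), "eipalloc-...", "eni-...")
#   print(f"EIP moved ({assoc_id}); API call took {seconds:.3f}s")
```

Passing the client in as an argument keeps the timing logic testable without AWS access and leaves the choice of credentials and region to the caller.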

This is going to take a lot of testing, but this is my ideal HA configuration for fck-nat 2.0 if it works.

patrickdk77 commented 9 months ago

I've been running conntrackd + keepalived/pacemaker for firewalls, and an ipvs pair with pacemaker as my failover for firewall/load balancing, for 16 years now. It works well; just remember not to use multicast for things. I haven't had much of a problem moving EIPs between systems in a useful time, though I haven't timed moving ENIs.
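To illustrate the multicast point, here is a rough sketch of what a unicast keepalived setup could look like, rendered from Python so each node can fill in its own addresses. Everything specific is an assumption for illustration: the instance name `FCK_NAT`, the router ID, the `eth0` interface, and the `/usr/local/bin/move_eip.py` hook path are hypothetical, and conntrackd would need its own, separately configured unicast sync settings. The keepalived directives themselves (`unicast_src_ip`, `unicast_peer`, `notify_master`) are standard.

```python
def render_keepalived_conf(local_ip, peer_ip, priority,
                           interface="eth0",
                           notify_master="/usr/local/bin/move_eip.py"):
    """Render a minimal keepalived VRRP instance that uses unicast adverts.

    Both nodes start as BACKUP; the one with the higher priority wins the
    election, and `notify_master` runs when a node becomes primary.
    """
    return f"""\
vrrp_instance FCK_NAT {{
    state BACKUP
    interface {interface}
    virtual_router_id 51
    priority {priority}
    advert_int 1
    unicast_src_ip {local_ip}
    unicast_peer {{
        {peer_ip}
    }}
    notify_master "{notify_master}"
}}
"""


# Hypothetical usage: the preferred primary gets the higher priority.
print(render_keepalived_conf("10.0.0.10", "10.0.0.11", priority=150))
```

The secondary would render the same template with the two addresses swapped and a lower priority.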

AndrewGuenther commented 9 months ago

@patrickdk77 Yeah, I've got pretty good confidence this will work. I ran plenty of conntrackd+keepalived in ye olden days, but I'm not sure how well it'll translate to AWS networking.

AndrewGuenther / fck-nat

Support primary/secondary failover mode using conntrackd and keepalived #71