RaJiska opened this issue 1 year ago
If you're seeing the kind of volume that would require these kernel tweaks, you're likely at a point where either fck-nat can't sustain you or NAT Gateway would be more reasonable. Here's my logic on that:
Instances with fewer than 32 vCPUs are limited to 5 Gbps of internet egress bandwidth[1]. I think it is highly unlikely you would hit the kernel's connection-tracking limits in an environment pushing less than 5 Gbps.
Instances with over 32 vCPUs get 50% of the advertised bandwidth for internet egress[1]. The cheapest network-optimized instance with 32 vCPUs is a c6gn.8xlarge, which maxes out at 25 Gbps and costs ~$980 more per month to operate than NAT Gateway. You'd need roughly 21 TB of monthly egress before that extra cost breaks even with NAT Gateway's data processing charges. So this optimization really only serves people in that boat, and if you're in that boat, you likely want the availability and bandwidth (up to 100 Gbps) guarantees that NAT Gateway provides.
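(For a rough sense of that break-even, assuming NAT Gateway's $0.045/GB data processing rate in us-east-1 at the time of writing: $980 / $0.045 per GB ≈ 21,800 GB, or roughly 21 TB of egress per month.)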
I'm not saying I wouldn't accept contributions for this, just wanted to add some color as to why I haven't pursued this already.
[1] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-network-bandwidth.html
The two issues are unrelated: tuning those variables has nothing to do with how much bandwidth you can push.
I could have a million idle TCP connections, or one connection that is maxing out my bandwidth.
Tuning these values, adjusting the TCP timeout from 12 hours to something more reasonable, and increasing the default number of connections the kernel will track are all constrained by memory, not network speed.
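For reference, a minimal sketch of the knobs being discussed here, assuming a reasonably recent Linux kernel; the values are illustrative, not recommendations:

```sh
# Inspect the current connection-tracking ceiling and usage
sysctl net.netfilter.nf_conntrack_max
cat /proc/sys/net/netfilter/nf_conntrack_count

# Raise the number of connections the kernel will track
# (each conntrack entry costs a few hundred bytes of kernel memory)
sudo sysctl -w net.netfilter.nf_conntrack_max=262144

# Expire idle established TCP flows sooner than the default (value in seconds)
sudo sysctl -w net.netfilter.nf_conntrack_tcp_timeout_established=3600
```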
I understand they're unrelated, but I'm talking about likely use cases and how I've prioritized work. If you're utilizing a high number of connections, you're likely also utilizing higher bandwidth. Again, I'm not saying I wouldn't accept contributions or tackle this work; I'm just giving my reasoning for why it hasn't been done already, with a disclaimer that if you're worried about a large number of connections, you should consider this bandwidth information as well.
> Instances with fewer than 32 vCPUs are limited to 5 Gbps of internet egress bandwidth[1]. I think it is highly unlikely you would hit the kernel's connection-tracking limits in an environment pushing less than 5 Gbps.
Thank you for the additional context. I was actually not aware of this 5 Gbps per-instance limitation for internet-gateway-bound traffic; it's really sneaky of them.
That said, I have encountered a case where a single one of my instances (in a public subnet) had its conntrack table entirely filled and was dropping new connections, while being nowhere near the 5 Gbps limit. In this scenario, a fck-nat instance without kernel tuning would not have been able to sustain the load, even less so if it had been serving additional instances.
In this case kernel tuning would really help, but it would also require more resources, especially memory: probably at least a t4g.medium, or even an r7g.medium. Either would have an hourly rate similar to NAT Gateway's (excluding savings plans), but without the per-GB processing fee, which in this scenario might be the bulk of the bill.
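As a side note, a quick way to check how close an instance is to this failure mode (assuming the conntrack module is loaded):

```sh
# How full is the conntrack table?
count=$(cat /proc/sys/net/netfilter/nf_conntrack_count)
max=$(cat /proc/sys/net/netfilter/nf_conntrack_max)
echo "conntrack: ${count}/${max} entries ($((100 * count / max))% full)"

# Once the table is full, the kernel logs this as it drops new flows:
sudo dmesg | grep 'nf_conntrack: table full'
```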
The intention behind this issue is more to open a discussion on the matter, and perhaps to establish a comprehensive list of settings that would cover this case where fck-nat needs to handle a large number of connections without necessarily reaching its bandwidth limit.
You can avoid the 5 Gbps limit by sharding the public internet IP prefixes via CIDR deaggregation, i.e. multiple fck-nat instances for a single VPC via route table manipulation.
@philipg To put it simply: creating smaller private subnets, each with their own NAT instance? This would work, but unfortunately it requires changes to the networking layer just to accommodate this technical constraint, which is not ideal.
@RaJiska The other way around: sharding the public internet with multiple routes. Instead of a single 0.0.0.0/0 route, you split the internet address space across several routes, each pointing at a different NAT instance. See the sketch below.
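For illustration, a minimal sketch of this with the AWS CLI, using hypothetical resource IDs: 0.0.0.0/0 is replaced by two /1 routes so each half of the IPv4 space egresses through a different fck-nat ENI.

```sh
# Placeholder IDs; substitute your own route table and fck-nat ENIs
RTB=rtb-0123456789abcdef0

# First half of the IPv4 space through NAT instance A
aws ec2 create-route --route-table-id "$RTB" \
  --destination-cidr-block 0.0.0.0/1 \
  --network-interface-id eni-aaaaaaaaaaaaaaaaa

# Second half through NAT instance B
aws ec2 create-route --route-table-id "$RTB" \
  --destination-cidr-block 128.0.0.0/1 \
  --network-interface-id eni-bbbbbbbbbbbbbbbbb
```

Finer splits (/2, /3, ...) would scale the same idea across more instances.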
This is a clever trick. Thanks for sharing this idea.
Hi,
I'd like to open a discussion regarding fck-nat under production-grade load. The way it's currently configured might not be enough for such a load, as I could not see any kernel-tuning configuration in the scripts. Unfortunately I am no expert in kernel tuning and am not aware of all the settings that might be necessary, but here are a few that I can think of:
- `conntrack`: the conntrack table, once filled, will drop new connections:
  - `nf_conntrack_max`, which governs the maximum number of tracked connections (and optionally `nf_conntrack_buckets` for performance)
  - `nf_conntrack_tcp_timeout_*`, set to a lower value than the default perhaps?
- `tcp_wmem`, `tcp_rmem`, `udp_wmem`, `udp_rmem`, which should probably be increased to support a higher load
- `tcp_max_syn_backlog`
- `fs.file-max`, whose limit could be overflowed if there are too many connections

Perhaps some more could be added, but it would be interesting to have different profiles available depending on the intended usage of fck-nat; a rough sketch of one such profile follows below.
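To make that concrete, here is a sketch of what one such profile could look like as a drop-in sysctl file. The values are assumptions for illustration, not tested recommendations, and would need to be sized against the instance's available memory; note the UDP buffer knobs are exposed as `udp_rmem_min`/`udp_wmem_min` rather than `udp_rmem`/`udp_wmem`.

```sh
# Hypothetical "high connection count" profile; all values illustrative only
sudo tee /etc/sysctl.d/90-fck-nat-high-conn.conf >/dev/null <<'EOF'
# Track more concurrent connections (each entry costs kernel memory)
net.netfilter.nf_conntrack_max = 262144
# Expire idle established TCP flows sooner than the default (seconds)
net.netfilter.nf_conntrack_tcp_timeout_established = 3600
# TCP socket buffer sizes: min, default, max (bytes)
net.ipv4.tcp_rmem = 4096 131072 16777216
net.ipv4.tcp_wmem = 4096 131072 16777216
# Minimum guaranteed UDP buffer sizes (bytes)
net.ipv4.udp_rmem_min = 16384
net.ipv4.udp_wmem_min = 16384
# Deeper backlog to absorb SYN bursts
net.ipv4.tcp_max_syn_backlog = 8192
# System-wide file descriptor ceiling
fs.file-max = 1048576
EOF

# Apply the new settings
sudo sysctl --system
```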