cloudnativelabs / kube-router

Kube-router, a turnkey solution for Kubernetes networking.
https://kube-router.io
Apache License 2.0

Provide configuration for routing pod traffic over a specific interface #1128

Closed: strideynet closed this issue 3 years ago

strideynet commented 3 years ago

The problem

We are producing a Kubernetes PaaS to sell to customers and to use internally.

Provisioned nodes have two interfaces:

- A public interface, on a network that enforces anti-IP-spoofing and restricts IP-in-IP traffic
- A private interface, on a network shared between the nodes
Other influencing factors:

Attempted Solutions thus far

Setting the Node IP to the public interface

The nodes are able to discover each other and exchange routes; however, the next-hop is configured as the public IP of the host. Due to the restrictions on our public network (anti-IP-spoofing and restrictions on IP-in-IP), traffic is not able to pass from pods on one node to pods on another.

--override-next-hop doesn't really solve the problem because:

Setting the Node IP to the private interface

The nodes are able to discover each other and exchange routes, and pods on different nodes are able to exchange traffic.

However, several iptables rules in other parts of kube-router run into issues, as they use the Node IP as the source for SNAT'ed traffic. This means that pods configured with hostNetwork: true are not able to properly connect to ClusterIP and NodePort services with endpoints located outside of the cluster (e.g. the kube-apiserver), as requests carry the source IP of the node's private interface.

One example of an iptables rule with this problem is `-A POSTROUTING ! -s 10.20.0.0/24 ! -d 10.20.0.0/24 -m ipvs --vdir ORIGINAL --vmethod MASQ -m comment --comment "" -j SNAT --to-source 10.215.0.171 --random-fully`. Here 10.215.0.171 is the node's IP on the private network.

If these iptables rules were fixed, then most of this would be a non-issue. While I would prefer that the Node IP is the public IP address, I'm willing to accept it being the private address if it means pod-to-pod networking works correctly. Unfortunately, I'm not really sure they can be fixed, beyond perhaps adding a configuration option that would let you set the actual public IP of the host to use for this SNAT rule.
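Purely as an illustration of that idea (this is not kube-router's actual code path, and `--cluster-snat-ip` is a made-up flag name), here is a minimal sketch using github.com/coreos/go-iptables of how the SNAT source could be made configurable instead of always being the Node IP:

```go
package main

import (
	"flag"
	"log"

	"github.com/coreos/go-iptables/iptables"
)

func main() {
	// Hypothetical flag: which address to use as the SNAT source for IPVS
	// traffic leaving the pod CIDR. A real implementation would fall back
	// to the Node IP when the flag is unset.
	snatIP := flag.String("cluster-snat-ip", "203.0.113.10", "source IP for SNAT of IPVS traffic (illustrative only)")
	podCIDR := flag.String("pod-cidr", "10.20.0.0/24", "pod CIDR of this node")
	flag.Parse()

	ipt, err := iptables.New()
	if err != nil {
		log.Fatalf("failed to initialise iptables: %v", err)
	}

	// Equivalent of the rule quoted above, but with a configurable --to-source.
	rule := []string{
		"!", "-s", *podCIDR, "!", "-d", *podCIDR,
		"-m", "ipvs", "--vdir", "ORIGINAL", "--vmethod", "MASQ",
		"-j", "SNAT", "--to-source", *snatIP, "--random-fully",
	}
	if err := ipt.AppendUnique("nat", "POSTROUTING", rule...); err != nil {
		log.Fatalf("failed to append SNAT rule: %v", err)
	}
	log.Printf("ensured SNAT rule with --to-source %s", *snatIP)
}
```

Whether such a knob should be a flag, an annotation, or derived from an interface name is essentially the design question behind the solutions below.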

Potential solutions

After a day or so of thinking, I have come up with a few potential solutions that we would be willing to develop and submit to this project. Some feedback on this issue would be helpful, so that we know which route you prefer.

Update the peer detection to consider the kube-router.io/bgp-local-addresses of other nodes

Rather than always selecting the Node IP of other nodes, adjust NetworkRoutingController.syncInternalPeers to prefer addresses specified in kube-router.io/bgp-local-addresses over the Node IP.

With this solution, we would also need to ensure that GoBGP restarts if we detect a change in kube-router.io/bgp-local-addresses. Unfortunately, the kubelet does not provide a way to set annotations on initial registration, and it seems unreasonable to ask operators to manually trigger a restart of all kube-router pods whenever this value needs to change.

We may also want to introduce another annotation or command-line parameter to enable this behaviour, so that existing installs do not change with the introduction of this feature. I'm open to suggestions on naming, etc. here.
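To make this option a bit more concrete, here is a rough sketch of the address-selection logic. peerAddressForNode is a hypothetical helper, not the real syncInternalPeers code, and it assumes the annotation holds a comma-separated list of addresses (as kube-router does for similar annotations):

```go
package main

import (
	"fmt"
	"strings"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

const bgpLocalAddressesAnnotation = "kube-router.io/bgp-local-addresses"

// peerAddressForNode returns the address this node should peer with,
// preferring the first address listed in kube-router.io/bgp-local-addresses
// and falling back to the node's InternalIP.
func peerAddressForNode(node *corev1.Node) (string, error) {
	if raw, ok := node.Annotations[bgpLocalAddressesAnnotation]; ok && raw != "" {
		// Assumed to be a comma-separated list; take the first entry.
		addrs := strings.Split(raw, ",")
		return strings.TrimSpace(addrs[0]), nil
	}
	for _, addr := range node.Status.Addresses {
		if addr.Type == corev1.NodeInternalIP {
			return addr.Address, nil
		}
	}
	return "", fmt.Errorf("node %s has no usable peering address", node.Name)
}

func main() {
	// Minimal usage example with a fabricated node object.
	node := &corev1.Node{
		ObjectMeta: metav1.ObjectMeta{
			Name:        "node-a",
			Annotations: map[string]string{bgpLocalAddressesAnnotation: "10.215.0.171"},
		},
	}
	addr, err := peerAddressForNode(node)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("peer with:", addr)
}
```

Restarting GoBGP when the annotation changes would still need to be handled separately, as noted above.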

Allow a command-line parameter or environment variable to configure an interface that the pod traffic should travel over

This has been requested previously by multiple users in https://github.com/cloudnativelabs/kube-router/issues/567 and is functionality that exists in Calico (https://docs.projectcalico.org/reference/node/configuration#ip-autodetection-methods).

It would adjust kube-router to accept a command line parameter or environment variable along the lines of --pod-traffic-interface ens11, or alternatively accept an IP address rather than an interface if that is preferred.

This provided interface would need to be resolved to an IP via netlink. This IP would then be announced as the next-hop for pods on the node. Nodes would continue to peer with one another and exchange routes via the public network, but the traffic from/to pods would cross the private network.
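As a sketch of that resolution step (the --pod-traffic-interface flag and the helper below are part of the proposal, not existing kube-router options), using the github.com/vishvananda/netlink package:

```go
package main

import (
	"fmt"

	"github.com/vishvananda/netlink"
)

// firstIPv4OnInterface resolves an interface name (e.g. the value of a
// proposed --pod-traffic-interface flag) to its first IPv4 address. That
// address would then be advertised as the BGP next-hop for the node's pod CIDR.
func firstIPv4OnInterface(name string) (string, error) {
	link, err := netlink.LinkByName(name)
	if err != nil {
		return "", fmt.Errorf("looking up interface %s: %w", name, err)
	}
	addrs, err := netlink.AddrList(link, netlink.FAMILY_V4)
	if err != nil {
		return "", fmt.Errorf("listing addresses on %s: %w", name, err)
	}
	if len(addrs) == 0 {
		return "", fmt.Errorf("interface %s has no IPv4 address", name)
	}
	return addrs[0].IP.String(), nil
}

func main() {
	// "ens11" is only an example interface name from the proposal above.
	ip, err := firstIPv4OnInterface("ens11")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("next-hop for pod routes:", ip)
}
```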

I'm not sure this is a particularly niche need, as I have seen multiple Kubernetes deployments across multiple employers that make use of two network interfaces. This functionality is supported in other popular tools (e.g. Calico) as a single parameter/env var configuration. I'd hazard a guess that in some circumstances operators are marginally more likely to know the interface they want to use for private traffic than the IP address, and that this interface is likely to be the same across their fleet of nodes.

Conclusion

Many thanks for reading this admittedly heavy issue. I look forward to reaching a solution that works for the most people, since I really like the simplicity of kube-router and its focus on using tools that already exist in the Linux ecosystem.

Please let me know if there is a preferred option here, and once we've agreed on the finer points I'll raise a PR.

aauren commented 3 years ago

@murali-reddy do you have any thoughts on this?

aauren commented 3 years ago

Unfortunately, dual-homed use cases are difficult. We've had a lot of issues about them over the lifetime of the project. What makes them even more difficult is that many users have different needs and want competing or niche features implemented in the project, which cost time to review and maintain. Unfortunately, at this time I'm going to close this issue as stale.

strideynet commented 3 years ago

I think that if this is going to be the point of view of the maintainers of the project, it might be worth calling it out as a limitation of the project in the README (that dual-homed use cases aren't going to be supported), given this is probably quite an important point of evaluation for people choosing a k8s networking solution.

aauren commented 3 years ago

I'm only speaking from my experience with the project; I don't know whether it's solid enough to state at the top of the project or not. And to be honest, we do already support some dual-homed use cases through the use of the --override-next-hop parameter.

But maybe it would have been more helpful to provide a bit more context in my comment. It would have been more accurate to say that we certainly see a lot of need for more dual-homed use case support. If you look through the project, there have probably been 10+ issues that touch on dual-homed use cases in some way or another. The problem is that, if you look through them, most of them are trying to do different things or have different niche needs. If we want to handle this issue holistically, it's going to take a lot of time digging through the history of the project, understanding all of the needs of the users, and then trying to abstract a solution in a way that allows kube-router to solve most of those needs while minimizing the complexity of the interface and keeping or increasing the flexibility of kube-router as a Kubernetes network provider (both of which are core values to the project).

So, this is a large problem area that we're aware of, and if we take a step we want to make sure we understand it well enough that it's not functionality we'll have to walk back in a later version because it doesn't provide enough support for all of the use cases.

When I saw your request I knew that I didn't have time to look into it with that level of detail and, being a relatively new maintainer, I don't have the history to know off the top of my head whether this is a good step forward. So I asked Murali for his thoughts, and it appears that he doesn't have time either at this point.

So in the meantime, we do have a few open issues for dual-homed use cases, and I don't think that one additional stale open issue is going to be helpful, so I closed this specific one. I do like the level of detail you've added about your use case, and I've started labelling some of the dual-homed related issues that I've found (including this one) so that when we get a chance to give this our attention we can begin aggregating the asks and keep this specific issue in mind.