Closed: rkojedzinszky closed this issue 6 months ago.
Ahh... this makes more sense. I take it that this is what was going on behind #1664, after looking into it more?
Basically, the problem here is that kube-router at its core does not support overloading service VIPs with multiple service definitions. If it worked in past versions of kube-router, that was by happenstance and not by design, and I'm sure that it broke in other scenarios that maybe didn't pertain to you.
There are multiple boundary cases in the code where going down this path won't work correctly, so rather than patching this one, I think it's just better to say that kube-router doesn't support this use-case. The best path forward for you will be to create separate VIPs per service configuration.
@aauren Thanks for the response. Unfortunately, we have built our infrastructure on this behavior. The strange thing for me is that when multiple services exist on the same VIP and one of them is not ready, only the maxseg rules are missing; the others are present. Couldn't it be easily supported?
Our use case is to conserve public IPv4 addresses by opening multiple ports on the same IP, backed by different services and pods.
Until kube-router supports it, a workaround may be for me to place a global maxseg rule in a separate nft table, without defining ports explicitly. That should work. This could also be done by kube-router itself until a more sophisticated solution is implemented.
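For what it's worth, a minimal sketch of such a global clamp in a standalone nft table. The table and chain names are made up; the VIP 192.168.9.88 and MSS 1440 are taken from the reproduction below:

```
# Standalone table so kube-router's own rule reconciliation never touches it.
# Hypothetical names; 192.168.9.88 / 1440 come from the reproduction below.
table ip mss_clamp {
	chain prerouting {
		type filter hook prerouting priority mangle; policy accept;
		# Clamp MSS on every inbound SYN to the DSR VIP, regardless of port
		ip daddr 192.168.9.88 tcp flags syn tcp option maxseg size set 1440
	}
}
```

Loaded with nft -f, this should survive kube-router's reconciliation loops, since it lives in its own table rather than in the chains kube-router manages.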
Anyway, thanks for the great project.
@rkojedzinszky - Thanks for the feedback. If you find that there is a simple patch to kube-router that fixes this situation for you and doesn't impose too much change on the kube-router source code, you're welcome to float it in a PR and we'll take a look. However, I can't guarantee that this use-case will always be provided for in kube-router, even if we're able to carry a patch this time.
It would probably be better to look into other ways to solve this use-case outside of kube-router, or in addition to it. The global maxseg rule may not be a bad alternative for you in the short term.
@aauren Does your statement that "kube-router at its core does not support overloading service VIPs" apply to DSR services only, or also to normal (non-DSR) services with the same externalIP? So is it still by accident that I have correct nft+ipvs rules set up for different services with the same VIPs, and just the maxseg rules are missing?
@aauren Oh, I checked the nft rules more carefully. I see that the maxseg rules are created for the IP address without matching port specifications. What if the maxseg rule also contained the port specification, so that it could be added whenever the fwmark rules are added? Of course, that would result in more nft rules.
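To sketch the idea in nft terms (an approximate rendering; kube-router installs these rules via iptables, so the exact on-node nft form may differ):

```
# Sketch: roughly how the current per-VIP clamp renders, dropped when any
# service on the VIP churns
ip daddr 192.168.9.88 tcp flags & (syn | rst) == syn tcp option maxseg size set 1440

# Sketch: the proposed port-scoped variant, one rule per service port,
# managed alongside that service's fwmark rules
ip daddr 192.168.9.88 tcp dport 8080 tcp flags & (syn | rst) == syn tcp option maxseg size set 1440
```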
Does this look reasonable as a quick fix?
```diff
diff --git a/pkg/controllers/proxy/network_services_controller.go b/pkg/controllers/proxy/network_services_controller.go
index bcc77990..f69c8233 100644
--- a/pkg/controllers/proxy/network_services_controller.go
+++ b/pkg/controllers/proxy/network_services_controller.go
@@ -1831,7 +1831,8 @@ func setupMangleTableRule(ip string, protocol string, port string, fwmark string
 	}
 	// setup iptables rule TCPMSS for DSR mode to fix mtu problem
-	mtuArgs := []string{"-d", ip, "-m", tcpProtocol, "-p", tcpProtocol, "--tcp-flags", "SYN,RST", "SYN", "-j", "TCPMSS",
+	if protocol == tcpProtocol {
+		mtuArgs := []string{"-d", ip, "-m", tcpProtocol, "-p", tcpProtocol, "--dport", port, "--tcp-flags", "SYN,RST", "SYN", "-j", "TCPMSS",
 		"--set-mss", strconv.Itoa(tcpMSS)}
 	err = iptablesCmdHandler.AppendUnique("mangle", "PREROUTING", mtuArgs...)
 	if err != nil {
@@ -1842,6 +1843,7 @@ func setupMangleTableRule(ip string, protocol string, port string, fwmark string
 	if err != nil {
 		return errors.New("Failed to run iptables command to set up TCPMSS due to " + err.Error())
 	}
+	}
 	return nil
 }
@@ -1876,7 +1878,8 @@ func (ln *linuxNetworking) cleanupMangleTableRule(ip string, protocol string, po
 	}
 	// cleanup iptables rule TCPMSS
-	mtuArgs := []string{"-d", ip, "-m", tcpProtocol, "-p", tcpProtocol, "--tcp-flags", "SYN,RST", "SYN", "-j", "TCPMSS",
+	if protocol == tcpProtocol {
+		mtuArgs := []string{"-d", ip, "-m", tcpProtocol, "-p", tcpProtocol, "--dport", port, "--tcp-flags", "SYN,RST", "SYN", "-j", "TCPMSS",
 		"--set-mss", strconv.Itoa(tcpMSS)}
 	exists, err = iptablesCmdHandler.Exists("mangle", "PREROUTING", mtuArgs...)
 	if err != nil {
@@ -1901,6 +1904,7 @@ func (ln *linuxNetworking) cleanupMangleTableRule(ip string, protocol string, po
 		return errors.New("Failed to cleanup iptables command to set up TCPMSS due to " + err.Error())
 	}
 	}
+	}
 	return nil
 }
```
What happened?
Have multiple services with the same externalIP, and set up DSR for them. When one of the services is not ready, the maxseg rules in nft for the other services are missing. ref: https://github.com/cloudnativelabs/kube-router/issues/1664
What did you expect to happen?
Expected it to work as in v1.6.
How can we reproduce the behavior you experienced?
Steps to reproduce the behavior:
With the default MTU of 1500, 40 bytes (IP + TCP headers) and an extra 20 bytes (IPIP encapsulation) are subtracted, thus an MSS of 1440 should be seen in the initial SYN packet.
Create a non-ready service, e.g.:
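The manifest itself was not preserved here; a hypothetical equivalent might look like the following. The name, selector, and labels are assumptions, with the selector intentionally matching no ready pods; the DSR annotation is kube-router's documented one:

```yaml
# Illustrative only; the original manifest from the report was not included.
apiVersion: v1
kind: Service
metadata:
  name: demo-dsr                     # hypothetical name
  annotations:
    kube-router.io/service.dsr: tunnel
spec:
  selector:
    app: demo-not-ready              # hypothetical; no ready endpoints behind it
  externalIPs:
    - 192.168.9.88
  ports:
    - port: 8080
      protocol: TCP
```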
Observe the incorrect behavior with tcpdump on the node hosting the pod:
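A capture along these lines should surface the MSS option on the SYN (the interface name and filter are assumptions, not the reporter's exact command):

```sh
# Show only SYN packets to the VIP/port so the advertised MSS is easy to spot
tcpdump -nni eth0 'host 192.168.9.88 and tcp port 8080 and tcp[tcpflags] & tcp-syn != 0'
```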
You can see the incorrect MSS in the initial SYN packet.
The nft rules also reflect this, as there are no maxseg rules for ip daddr 192.168.9.88 tcp dport 8080.
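A quick way to check on an affected node (the exact rendering varies with the iptables-nft version):

```sh
# Any surviving clamp rules show up here; with the bug reproduced,
# nothing covering 192.168.9.88 is listed
nft list ruleset | grep maxseg
```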
System Information (please complete the following information):
Kube-Router Version (kube-router --version): v2.1.1
Kube-Router Parameters:
Kubernetes Version (kubectl version): v1.29.4
Cloud Type: bare metal
Kubernetes Deployment Type: kubeadm
Kube-Router Deployment Type: daemonset
Cluster Size: 10 nodes