Closed danwinship closed 2 years ago
`--dst-type LOCAL` exactly translates to "is there an entry in the `local` routing table." 99% of the time, this means there's an interface with that address assigned to it. However, nothing stops you from adding more entries to the `local` table. I know that the crazy hackers at GCP do it for their cluster IPs.
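To make the point about the `local` table concrete, here is a minimal Python sketch of the check that `--dst-type LOCAL` effectively performs. It is a simplified model, not the kernel's actual route lookup: the table contents, the node address `10.0.0.5`, the service range `172.30.0.0/16`, and the function name `is_dst_type_local` are all assumptions made for illustration.

```python
import ipaddress


def is_dst_type_local(dst, local_table):
    """Return True if dst falls inside any prefix of the modelled local table.

    Simplified model: the real kernel does a full route lookup and checks
    the route type; here we only test membership in a list of prefixes.
    """
    addr = ipaddress.ip_address(dst)
    return any(addr in ipaddress.ip_network(prefix) for prefix in local_table)


# Hypothetical local table on a typical node: loopback plus the node's own IP.
typical = ["127.0.0.0/8", "10.0.0.5/32"]

# The same table after something adds a local route for an assumed service range.
with_service_route = typical + ["172.30.0.0/16"]

print(is_dst_type_local("10.0.0.5", typical))               # node IP
print(is_dst_type_local("172.30.0.1", typical))             # service IP, typical node
print(is_dst_type_local("172.30.0.1", with_service_route))  # service IP, extra local route
```

In this model the service IP only counts as local once the extra entry is present, which is the situation described above where the match is broader than "an address assigned to an interface".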
The challenge is to figure out an iptables condition that matches only the exact traffic we want. That's why I added the `conditionsV4` parameter, so the admin could tweak exactly what traffic gets mapped. If anyone comes up with a safe condition to add to the default, we can definitely make that change. So far, we haven't found one that works.
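As a rough illustration of how an admin might use `conditionsV4`, the Python sketch below builds a portmap plugin config that excludes one destination range from port mapping. The helper name `portmap_config` and the range `172.30.0.0/16` are assumptions for the example, and whether `! -d <service CIDR>` is a safe condition for a given cluster is untested here; as noted above, no safe default has been found so far.

```python
import ipaddress
import json


def portmap_config(service_cidr):
    """Build a portmap plugin config that skips traffic destined for service_cidr.

    conditionsV4 is a list of extra iptables arguments applied to the
    plugin's IPv4 rules; here it is the equivalent of "! -d <service_cidr>".
    """
    # Validate the input so a typo does not end up in an iptables rule.
    net = ipaddress.ip_network(service_cidr)
    if net.version != 4:
        raise ValueError("conditionsV4 expects an IPv4 range")
    return {
        "type": "portmap",
        "capabilities": {"portMappings": True},
        "conditionsV4": ["!", "-d", str(net)],
    }


# Assumed service range; substitute the cluster's real service CIDR.
print(json.dumps(portmap_config("172.30.0.0/16"), indent=2))
```

The printed object is meant to be placed as one entry in the `plugins` list of a CNI conflist, alongside the main network plugin.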
> `--dst-type LOCAL` exactly translates to "is there an entry in the `local` routing table."
Hm... it doesn't seem like the service CIDR route should be `local`... (OpenShift SDN's route-to-serviceCIDR is `unicast`.) So maybe this is a Calico bug?
> I know that the crazy hackers at GCP do it for their cluster IPs.
I guess that depends on whether you consider those to be alternative local IPs or `ExternalIP`s...
> The challenge is to figure out an iptables condition that matches only the exact traffic we want.
I think what you want is "is addressed to the IP address of any network interface on the host" but if `--dst-type LOCAL` doesn't do that, I'm not sure what would.
> maybe this is a Calico bug
In my testing, I noticed that after enabling kube-proxy IPVS mode, when a pod with a hostport (i.e. portmap) runs on a node, it prevents service IP access (from both the host node and any pods on it). I've reproduced this with both Calico and Flannel (both of which use `portmap`) and on different clouds, so my inclination is that it's unlikely to be in the CNI provider itself.
Aha, IPVS - that's something I hadn't considered. I wonder if we're racing on rules.
I suspect we'll either need to write a separate `k8s-portmap` plugin that is more opinionated, so we can hook in the right place, or just add some sort of k8s mode to the existing plugin.
Well, but then you're just potentially broken on every other platform that uses CNI besides kubernetes...
Ideally, the portmap plugin (and each other iptables user) would intercept only the packets it actually wanted, and then ordering wouldn't matter.
Alternatively, maybe something like my `KUBE-BEFORE-POSTROUTING`, etc, suggestion on the sig-net list should be implemented at the CNI level rather than the k8s level; the actual chains would be more or less the same, but it would be a requirement of CNI that the environment set them up for plugins to use, rather than being a k8s-specific thing.
@danwinship any word if there was any progress with this issue? 🙏
No. (I'm not working on this, I was just reporting the bug after having reviewed a kubernetes PR that was trying to fix the problem in the wrong way.)
Presumably if there was progress there would be updates here or in the linked kube bug, so you can subscribe to those.
@squeed Do you know if there is any plan to fix this issue?
ping @squeed :smile:
The portmap code assumes that `--dst-type LOCAL` will only match traffic addressed to the node's own IP addresses. However, at least under some configurations, it will end up matching traffic to addresses in the kubernetes service range as well. As a result, if you have, e.g., a pod that claims hostport 443, it might end up receiving traffic that was sent to 172.30.0.1:443 by pods on the same node, as seen in https://github.com/kubernetes/kubernetes/issues/66103. It's not clear if this depends in some way on how the network plugin sets up routing to the service network, or if it happens all the time for any plugin that uses the portmap plugin.