containernetworking / plugins

Some reference and example networking plugins, maintained by the CNI team.
Apache License 2.0

portmap plugin's iptables rules intercept kubernetes service traffic #222

Closed: danwinship closed this issue 2 years ago

danwinship commented 5 years ago

The portmap code assumes that --dst-type LOCAL will only match traffic addressed to the node's own IP addresses. However, at least under some configurations, it also matches traffic to addresses in the Kubernetes service range. As a result, if you have, e.g., a pod that claims hostport 443, it might end up receiving traffic that was sent to 172.30.0.1:443 by pods on the same node.

As seen in https://github.com/kubernetes/kubernetes/issues/66103. It's not clear if this depends in some way on how the network plugin sets up routing to the service network, or if it happens all the time for any plugin that uses the portmap plugin.

squeed commented 5 years ago

--dst-type LOCAL exactly translates to "is there an entry in the local routing table." 99% of the time, this means there's an interface with that address assigned to it. However, nothing stops you from adding more entries to the local table. I know that the crazy hackers at GCP do it for their cluster IPs.
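As a rough illustration of this point, the Python sketch below models `--dst-type LOCAL` as "some `local`-type route covers the destination" and evaluates it against a hypothetical `ip route show table local` dump. The sample table, including the `172.30.0.0/16` range route standing in for a service range, is invented for illustration, and the parser only handles the simple line shape shown. It is not how iptables or the kernel actually implement the check.

```python
import ipaddress

# Hypothetical sample of `ip route show table local` output (format simplified).
# The last line is a manually added range route, standing in for the extra
# local-table entries described above; it is illustrative only.
SAMPLE_LOCAL_TABLE = """\
local 10.0.0.5 dev eth0 proto kernel scope host src 10.0.0.5
broadcast 10.0.0.255 dev eth0 proto kernel scope link src 10.0.0.5
local 127.0.0.0/8 dev lo proto kernel scope host src 127.0.0.1
local 172.30.0.0/16 dev lo scope host
"""


def local_routes(table_text):
    """Return the destinations of all routes whose type is `local`."""
    routes = []
    for line in table_text.splitlines():
        fields = line.split()
        # Only `local`-type routes count; broadcast entries are skipped.
        if len(fields) >= 2 and fields[0] == "local":
            # A bare address such as 10.0.0.5 parses as a /32 network.
            routes.append(ipaddress.ip_network(fields[1], strict=False))
    return routes


def is_dst_type_local(addr, routes):
    """Approximate `-m addrtype --dst-type LOCAL`: a local route covers addr."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in routes)


routes = local_routes(SAMPLE_LOCAL_TABLE)
print(is_dst_type_local("10.0.0.5", routes))    # True: the node's own IP
print(is_dst_type_local("172.30.0.1", routes))  # True: covered by the range route
print(is_dst_type_local("10.0.0.6", routes))    # False: no local route
```

With the extra range route present, the service IP 172.30.0.1 counts as local even though no interface owns it, which is exactly the over-match reported in this issue.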

The challenge is to figure out an iptables condition that matches only the exact traffic we want. That's why I added the conditionsV4 parameter, so the admin could tweak exactly what traffic gets mapped. If anyone comes up with a safe condition to add to the default, we can definitely make that change. So far, we haven't found one that works.
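A minimal sketch of that admin-side tweak, in Python: it builds a portmap plugin entry whose `conditionsV4` excludes a service CIDR. It assumes, as the portmap README describes, that `conditionsV4` is a list of raw iptables match arguments added to the plugin's own rule; the `172.30.0.0/16` CIDR and the helper name `portmap_conf` are hypothetical, and the field should be checked against the plugin version actually deployed.

```python
import ipaddress
import json

# Hypothetical service CIDR; 172.30.0.1 from the original report falls inside it.
SERVICE_CIDR = "172.30.0.0/16"


def portmap_conf(service_cidr):
    """Build a portmap plugin entry whose conditionsV4 skips the service range.

    Assumption: conditionsV4 holds raw iptables match arguments, so
    ["!", "-d", cidr] reads as "destination is not in cidr".
    """
    net = ipaddress.ip_network(service_cidr)  # rejects malformed CIDR strings
    if net.version != 4:
        raise ValueError("conditionsV4 only takes IPv4 matches")
    return {
        "type": "portmap",
        "capabilities": {"portMappings": True},
        "conditionsV4": ["!", "-d", str(net)],
    }


print(json.dumps(portmap_conf(SERVICE_CIDR), indent=2))
```

Because the service CIDR differs from cluster to cluster, this only works as a per-deployment override, not as a safe default.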

danwinship commented 5 years ago

--dst-type LOCAL exactly translates to "is there an entry in the local routing table."

Hm... it doesn't seem like the service CIDR route should be local... (OpenShift SDN's route-to-serviceCIDR is unicast.) So maybe this is a Calico bug?

I know that the crazy hackers at GCP do it for their cluster IPs.

I guess that depends on whether you consider those to be alternative local IPs or ExternalIPs...

The challenge is to figure out an iptables condition that matches only the exact traffic we want.

I think what you want is "is addressed to the IP address of any network interface on the host", but if --dst-type LOCAL doesn't do that, I'm not sure what would.
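To make the gap between the two conditions concrete, the Python sketch below compares them on hypothetical host state: "assigned to some interface" (the condition wanted here) versus "covered by a local route" (what `--dst-type LOCAL` checks). The interface and route lists are invented for illustration; on a real host they would come from `ip addr` and `ip route show table local`.

```python
import ipaddress

# Hypothetical host state: addresses actually assigned to interfaces.
INTERFACE_ADDRS = {
    "lo": ["127.0.0.1"],
    "eth0": ["10.0.0.5"],
}
# Hypothetical `local`-type route destinations, including an extra range route.
LOCAL_ROUTES = ["127.0.0.0/8", "10.0.0.5/32", "172.30.0.0/16"]


def on_interface(addr):
    """True if addr is assigned to some interface on the host."""
    ip = ipaddress.ip_address(addr)
    return any(ip == ipaddress.ip_address(a)
               for addrs in INTERFACE_ADDRS.values() for a in addrs)


def has_local_route(addr):
    """True if a local route covers addr (what --dst-type LOCAL checks)."""
    ip = ipaddress.ip_address(addr)
    return any(ip in ipaddress.ip_network(n) for n in LOCAL_ROUTES)


def over_matched(candidates):
    """Destinations that --dst-type LOCAL accepts but no interface owns."""
    return [a for a in candidates if has_local_route(a) and not on_interface(a)]


print(over_matched(["10.0.0.5", "172.30.0.1", "8.8.8.8"]))  # ['172.30.0.1']
```

The node's own IP passes both checks and the external address fails both; only the service IP falls in the gap, which is the traffic portmap should not be capturing.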

dghubble commented 5 years ago

maybe this is a Calico bug

In my testing, I noticed that after enabling kube-proxy IPVS mode, when a pod with a hostport (i.e. portmap) runs on a node, it prevents service IP access (from both the host node and any pods on it). I've reproduced this with both Calico and Flannel (both of which use portmap) and on different clouds, so my inclination was that it's unlikely to be a problem in the CNI provider itself.

squeed commented 5 years ago

Aha, IPVS - that's something I hadn't considered. I wonder if we're racing on rules.

I suspect we'll either need to write a separate k8s-portmap plugin that is more opinionated, so we can hook in the right place, or just add some sort of k8s mode to the existing plugin.

danwinship commented 5 years ago

Well, but then you're just potentially broken on every other platform that uses CNI besides kubernetes...

Ideally, the portmap plugin (and each other iptables user) would intercept only the packets it actually wanted, and then ordering wouldn't matter.

Alternatively, maybe something like my KUBE-BEFORE-POSTROUTING, etc., suggestion on the sig-net list should be implemented at the CNI level rather than the k8s level; the actual chains would be more or less the same, but it would be a requirement of CNI that the environment set them up for plugins to use, rather than being a k8s-specific thing.

dannyk81 commented 5 years ago

@danwinship any word on whether there has been any progress on this issue? 🙏

danwinship commented 5 years ago

No. (I'm not working on this, I was just reporting the bug after having reviewed a kubernetes PR that was trying to fix the problem in the wrong way.)

Presumably if there was progress there would be updates here or in the linked kube bug, so you can subscribe to those.

liqlin2015 commented 5 years ago

@squeed Do you know if there is any plan to fix this issue?

dannyk81 commented 5 years ago

ping @squeed 😄