I encountered the same issue doing a straight upgrade from 1185.2.0 to 1262.0.0.
Everything worked except the aforementioned service connectivity issue from within containers (host machine to service, other machine to service, direct IP from a container, etc. all worked).
Rolling back to 1185.2.0 worked after deleting `/var/lib/docker/network/files/local-kv.db`.
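In case it saves someone a step: deleting that file is only safe with the daemon stopped. The sequence below is an assumption about the ordering, not a transcript of what I ran:

```sh
# Stop Docker before removing libnetwork's local KV store, then start it
# again so the store is recreated cleanly (ordering is an assumption).
sudo systemctl stop docker
sudo rm /var/lib/docker/network/files/local-kv.db
sudo systemctl start docker
```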
If I didn't screw up the git bisect, I think this was introduced in torvalds/linux@e3b37f1. That patch seems to have caused a few issues which have since been fixed. The tip of master, 4.10.0-rc2-0f64df3, is working as expected for me.
Should be fixed by https://github.com/coreos/coreos-overlay/pull/2353.
This is still present in 4.9.3.
It looks like we are still experiencing this issue with kernel 4.9.9 on CoreOS alpha 1325.0.0.
/cc @bgilbert
@Raffo If you run `sudo iptables -P FORWARD ACCEPT` on the host, does that fix the issue for you?
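For reference, the current default policy shows up in the chain header, so it's easy to check first (standard iptables usage, nothing specific to this setup):

```sh
# The header line reads "Chain FORWARD (policy ACCEPT)" or "(policy DROP)".
sudo iptables -L FORWARD -n | head -n 1
```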
I didn't try that (I can if it helps), but it looks unrelated. We solved the issue with a configuration change: removing the `--iptables=false` flag from the Docker settings "fixed" the problem. The effect we were seeing before was a missing NAT on the response coming from the pod running on the same host.
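For anyone hitting the same symptom, the exact wiring depends on how Docker is launched on your nodes; the snippet below is only a sketch that assumes docker.service passes a `$DOCKER_OPTS` environment variable to dockerd, which may not match your units:

```sh
# Check whether the daemon is currently running with the flag.
ps -ef | grep -F -- '--iptables=false' | grep -v grep

# Hypothetical drop-in: leaving DOCKER_OPTS empty drops --iptables=false,
# so Docker falls back to its default (--iptables=true) and installs the
# DNAT/MASQUERADE rules containers rely on.
sudo mkdir -p /etc/systemd/system/docker.service.d
sudo tee /etc/systemd/system/docker.service.d/50-iptables.conf <<'EOF' >/dev/null
[Service]
Environment="DOCKER_OPTS="
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker
```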
@Raffo `--iptables=false` shouldn't be necessary, especially if you're using a network plugin such as kubenet or CNI. It sounds as though what you're seeing is not the kernel bug reported in this issue, so I'll close. If you believe you're seeing incorrect behavior from Container Linux, please open a new issue.
I agree on closing here.
To clarify, we removed `--iptables=false`, which means it's `true` by default. With the setting at `false`, the networking is broken. I suspect the issue also involves flannel; in which repository do you think I should open it?
@Raffo Whenever you're unsure where an issue belongs, create it here.
Issue Report
Under Kubernetes, pod-to-pod communication via a service IP within a single node is broken in the latest CoreOS alpha (1262.0.0). Downgrading to the previous alpha resolves the issue. The issue is not specific to Kubernetes, however.
Bug
CoreOS Version
Environment
Observed on both bare metal and Vagrant + VirtualBox locally.
Expected Behavior
In Kubernetes, I can define a service which targets a set of pods and makes them reachable via a virtual IP. With kube-proxy running in iptables mode, Kubernetes will configure NAT rules to redirect traffic destined for that virtual IP to the individual pods. That should work for traffic originating from any node in the cluster.
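Roughly, kube-proxy in iptables mode programs NAT rules of this shape (a simplified sketch: the chain layout is condensed, and the service IP 10.3.0.10 and endpoint 10.2.47.5:8080 are made-up examples):

```sh
# Service traffic is intercepted in the nat table for both forwarded and
# locally generated packets, then DNATed to a backing pod endpoint.
iptables -t nat -N KUBE-SERVICES
iptables -t nat -A PREROUTING -j KUBE-SERVICES
iptables -t nat -A OUTPUT -j KUBE-SERVICES
iptables -t nat -A KUBE-SERVICES -d 10.3.0.10/32 -p tcp --dport 80 \
  -j DNAT --to-destination 10.2.47.5:8080
```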
Actual Behavior
With a cluster running on the latest alpha (1262.0.0), pods cannot be reached via their service IP when the traffic originates from another pod on the same node as the destination.
The following does work on the same node:
Reproduction Steps
This isn't actually specific to Kubernetes; I have an alpha-nat branch of coreos-vagrant that will start a VM with a Docker container and iptables rules similar to what Kubernetes uses -- along with trace rules for debugging.
Check out that branch and run either `./start-broken.sh` or `./start-working.sh`, then `vagrant ssh` into the VM and run the following:

You could also use the cloud-config in the `user-data` file on that branch on any other platform.

Another option is to launch a Kubernetes cluster with a single schedulable node and start a pod with an accompanying service. Other pods on the same node will not be able to communicate using the service IP. I've been using this for testing.
Other Information
Starting the same `echoheaders` container under rkt with the default ptp networking and configuring similar NAT rules seems to work as expected from other containers, so this might only be happening when attaching containers to a bridge.

coreos/coreos-overlay#2300 landed in 1262.0.0 -- I tried not marking the interfaces unmanaged with overrides in `/etc/systemd/network/`, but it didn't seem to help.
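For completeness, this is how I was checking whether networkd considers the container links managed (the veth name below is just an example):

```sh
# The SETUP column reads "unmanaged" for links covered by the #2300 units.
networkctl list
# Per-link details for a specific container-side veth (name is an example).
networkctl status veth1a2b3c4
```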