Open philhug opened 6 years ago
Possibly related to #2443
I'm not entirely sure this is related to that MTU-from-DHCP bug; the original report seems to describe node-local connectivity problems.
@philhug can you please provide:
journalctl -u systemd-networkd
docker version
I think this is likely related to https://groups.google.com/forum/#!topic/coreos-user/FSqBD-R_PPI, i.e. a missing modprobe br_netfilter.
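A minimal sketch of how one might check for that module on a node. It assumes br_netfilter is built as a loadable module (so it shows up in /proc/modules when loaded); the helper name module_loaded and its optional file argument are illustrative, not part of any CoreOS tooling.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named kernel module appears in a
# /proc/modules-style listing. The second argument defaults to /proc/modules.
module_loaded() {
    mod="$1"
    list="${2:-/proc/modules}"
    # The first field of each line is the module name.
    awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' "$list"
}

if module_loaded br_netfilter; then
    echo "br_netfilter is loaded"
else
    echo "br_netfilter is not loaded; try: sudo modprobe br_netfilter"
fi
```

If the module turns out to be missing, loading it with modprobe and re-running the failing ping would show whether this is the cause.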
Hello @lucab,
Unfortunately that issue is still going.
docker version: https://gist.github.com/lucazz/1585a51845c1eb465827a18c6b70030e
Here's some context (the same information you requested from @dghubble on his issue):
journalctl -u systemd-networkd: https://gist.github.com/lucazz/9f7e1feb41be26075cb9596bded4466f
networkctl status -a: https://gist.github.com/lucazz/477f92f2c39d2ec8bf503c9412a04f7e
In this case, containers are able to talk to each other but cannot reach the internet:
core@ip-10-33-29-37 ~ $ docker run alpine ping -c 5 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
--- 8.8.8.8 ping statistics ---
5 packets transmitted, 0 packets received, 100% packet loss
core@ip-10-33-29-37 ~ $
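To separate a Docker-side problem from a host-side one, it can help to run the same ping on the host and inside a container and compare the loss figures. The following is only a sketch; packet_loss is a hypothetical helper that extracts the percentage from a ping summary line like the one above.

```shell
#!/bin/sh
# Hypothetical helper: read ping output on stdin and print the packet-loss
# percentage from the summary line (e.g. "... 100% packet loss").
packet_loss() {
    sed -n 's/.* \([0-9][0-9.]*\)% packet loss.*/\1/p'
}
```

For example, compare "ping -c 5 8.8.8.8 | packet_loss" on the host with "docker run alpine ping -c 5 8.8.8.8 | packet_loss". If only the container shows loss, the host uplink is probably fine and the container's path out of the host (bridge, NAT, forwarding rules) is the more likely suspect.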
@lucazz from your logs it looks like your setup also involves flannel and calico. If that is the case, it's a fairly complex setup to debug (and perhaps unrelated to the original report). A reproducer on a cleaner node would be helpful, because I suspect your issue is due either to your specific configuration (e.g. security groups, iptables or calico policy) or to one of those higher-level components.
Heya @lucab,
The one thing that makes me doubt that is that this same cluster, with the same canal configs, also has a couple of AWS Ubuntu Deep Learning instances that work just fine: https://gist.github.com/lucazz/606c6e3bfe8c4bfe5475e7729214884c
Ubuntu's docker version: https://gist.github.com/lucazz/87745dc3b0451decd14df3ef98146c8d
Ubuntu's networkctl status -a: https://gist.github.com/lucazz/cc1aa38386b2b9ccf23c77082f7333d2
Ubuntu's journalctl -u systemd-networkd: https://gist.github.com/lucazz/d19e3161b7b46080a717f7499393e6ab
Issue Report
Bug
Since updating from 1688.5.3 to 1745.4.0 we experienced connectivity issues between docker containers and also from the toolbox. After rollback to 1688.5.3 the problem disappeared.
$ cat /etc/os-release (back on working
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1688.5.3
VERSION_ID=1688.5.3
BUILD_ID=2018-04-03-0547
PRETTY_NAME="Container Linux by CoreOS 1688.5.3 (Rhyolite)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
COREOS_BOARD="amd64-usr"
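When comparing nodes before and after an update or rollback, a small helper can pull just the release number out of os-release. This is a sketch; os_version is a hypothetical name, and it only assumes the VERSION_ID key shown in the output above.

```shell
#!/bin/sh
# Hypothetical helper: print VERSION_ID from an os-release style file.
# The file argument defaults to /etc/os-release.
os_version() {
    file="${1:-/etc/os-release}"
    # Drop the key and any surrounding double quotes from the value.
    sed -n 's/^VERSION_ID="\{0,1\}\([^"]*\)"\{0,1\}$/\1/p' "$file"
}
```

Running os_version on each node makes it easy to confirm which ones are on 1688.5.3 and which are on 1745.4.0.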