We have about 3 nodes in each AZ in region eu-central-1. All nodes except one can connect to each other without problems. The failing node cannot reach the pod network in other AZs, although EC2 (node-to-node) networking across AZs works from that node.
Expected Behavior
ICMP ping and TCP connections to pod IPs in another AZ should work.
Current Behavior
ICMP ping and TCP connections from the failing node to pod IPs in another AZ do not work.
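The TCP half of this check can be scripted so it is easy to repeat from every node. A minimal sketch, assuming only the Python standard library; `can_connect` is a hypothetical helper name, and it does not cover ICMP, which needs raw sockets and usually root privileges.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP handshake to host:port completes within timeout."""
    try:
        # create_connection performs the full TCP handshake (SYN, SYN-ACK, ACK).
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is an OSError
        # subclass) and unreachable networks.
        return False
```

For example, run `can_connect("10.2.109.4", 8080)` on the failing node and on a working node; port 8080 is an assumption and must be a port the target pod actually listens on.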
Possible Solution
Restarting flanneld fixed the problem.
Steps to Reproduce (for bugs)
Unknown.
Context
We run production workloads on Kubernetes with flannel across 18 production clusters of different sizes.
Your Environment
Flannel version: quay.io/coreos/flannel:v0.7.1
Backend used (e.g. vxlan or udp): vxlan
Etcd version: quay.io/coreos/etcd:v3.1.6
Kubernetes version (if used): v1.7.5+coreos.0
Operating System and version:
$ cat /etc/lsb-release
DISTRIB_ID="Container Linux by CoreOS"
DISTRIB_RELEASE=1465.7.0
DISTRIB_CODENAME="Ladybug"
DISTRIB_DESCRIPTION="Container Linux by CoreOS 1465.7.0 (Ladybug)"
Network investigation
ping node to POD IP
failing node:~ # ping 10.2.109.4
PING 10.2.109.4 (10.2.109.4) 56(84) bytes of data.
^C
working node:~ # ping 10.2.109.4
PING 10.2.109.4 (10.2.109.4) 56(84) bytes of data.
64 bytes from 10.2.109.4: icmp_seq=1 ttl=63 time=0.686 ms
^C
flannel config in etcd for the target
working node:~ # etcdctl get /coreos.com/network/subnets/10.2.109.0-24
{"PublicIP":"172.31.1.190","BackendType":"vxlan","BackendData":{"VtepMAC":"42:88:9e:f2:82:cb"}}
failing node:~ # etcdctl get /coreos.com/network/subnets/10.2.109.0-24
{"PublicIP":"172.31.1.190","BackendType":"vxlan","BackendData":{"VtepMAC":"42:88:9e:f2:82:cb"}}
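Flannel stores each node's subnet lease under a key such as `10.2.109.0-24`. The sketch below, assuming the key format seen in the output above, maps that key to a CIDR, checks that the pinged pod IP falls inside it, and compares the two lease documents after parsing. Both nodes read the same PublicIP and VtepMAC, which suggests the shared etcd data is not what differs between them. `key_to_network` and `leases_match` are hypothetical helper names.

```python
import ipaddress
import json

# Lease values copied from the `etcdctl get` output above; on a live cluster
# these strings would come from running etcdctl on each node.
WORKING = '{"PublicIP":"172.31.1.190","BackendType":"vxlan","BackendData":{"VtepMAC":"42:88:9e:f2:82:cb"}}'
FAILING = '{"PublicIP":"172.31.1.190","BackendType":"vxlan","BackendData":{"VtepMAC":"42:88:9e:f2:82:cb"}}'


def key_to_network(key: str) -> ipaddress.IPv4Network:
    """Convert a flannel subnet key such as '10.2.109.0-24' into 10.2.109.0/24."""
    subnet = key.rsplit("/", 1)[-1]  # drop the /coreos.com/network/subnets/ prefix
    addr, prefix = subnet.rsplit("-", 1)
    return ipaddress.IPv4Network(f"{addr}/{prefix}")


def leases_match(a: str, b: str) -> bool:
    """Compare two lease documents as parsed JSON, ignoring key order and spacing."""
    return json.loads(a) == json.loads(b)


key = "/coreos.com/network/subnets/10.2.109.0-24"
net = key_to_network(key)
print(net, ipaddress.IPv4Address("10.2.109.4") in net)  # 10.2.109.0/24 True
print(leases_match(WORKING, FAILING))                    # True
```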
local ARP table
flanneld logs show:
$ journalctl -u flanneld
Sep 12 08:53:20 ip-172-31-15-202.eu-central-1.compute.internal flannel-wrapper[972]: I0912 08:53:20.388592 972 network.go:243] L3 miss but route for 10.2.53.0 not found
Sep 12 08:53:21 ip-172-31-15-202.eu-central-1.compute.internal flannel-wrapper[972]: I0912 08:53:21.412813 972 network.go:243] L3 miss but route for 10.2.53.0 not found
Sep 12 08:53:26 ip-172-31-15-202.eu-central-1.compute.internal flannel-wrapper[972]: I0912 08:53:26.148887 972 network.go:243] L3 miss but route for 10.2.53.0 not found
Sep 12 08:53:27 ip-172-31-15-202.eu-central-1.compute.internal flannel-wrapper[972]: I0912 08:53:27.172774 972 network.go:243] L3 miss but route for 10.2.53.0 not found
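The repeated log line can be summarized per destination, and a toy lookup illustrates what the message describes: an L3 miss for which no matching subnet entry is found. This is a simplified model written for illustration, not flannel's actual implementation; the `routes` table holds only the one lease shown in this report, and `count_misses` and `lookup` are hypothetical names.

```python
import ipaddress
import re
from collections import Counter
from typing import Dict, Optional

# Two of the journalctl lines from this report, copied verbatim.
LOG = """\
Sep 12 08:53:20 ip-172-31-15-202.eu-central-1.compute.internal flannel-wrapper[972]: I0912 08:53:20.388592 972 network.go:243] L3 miss but route for 10.2.53.0 not found
Sep 12 08:53:21 ip-172-31-15-202.eu-central-1.compute.internal flannel-wrapper[972]: I0912 08:53:21.412813 972 network.go:243] L3 miss but route for 10.2.53.0 not found
"""

MISS_RE = re.compile(r"L3 miss but route for (\S+) not found")


def count_misses(text: str) -> Counter:
    """Count 'L3 miss but route for X not found' messages per address X."""
    return Counter(m.group(1) for m in MISS_RE.finditer(text))


def lookup(ip: str, routes: Dict[str, str]) -> Optional[str]:
    """Toy model, not flannel's code: return the VTEP MAC of the subnet containing ip."""
    addr = ipaddress.IPv4Address(ip)
    for cidr, vtep_mac in routes.items():
        if addr in ipaddress.IPv4Network(cidr):
            return vtep_mac
    # No covering subnet: in this model, this is the case the log line reports.
    return None


# Hypothetical table holding only the lease shown in this report.
routes = {"10.2.109.0/24": "42:88:9e:f2:82:cb"}

for ip, n in count_misses(LOG).items():
    print(ip, n, lookup(ip, routes))  # 10.2.53.0 2 None
```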
Link to your project (optional): https://github.com/zalando-incubator/kubernetes-on-aws/