Closed by nmeyerhans 6 years ago
The unreachability of the credentials endpoint was previously reported as #1146
@nmeyerhans do you have a preference for one of the options you've listed? I'm not familiar with the known side effects of each one (if any).
The 169.254.172.0/22 route should be added to the ecs-eth0 interface in the task. I think the expectation when adding an address with a /22 CIDR length to an interface is that the corresponding /22 route is automatically added. Indeed that's what happens when you add such an address to an interface using the ip(8) command:
admin@ip-10-0-0-60:~$ ip addr show dev vtapfoo
9: vtapfoo@eth0: <BROADCAST,MULTICAST> mtu 9001 qdisc noop state DOWN group default qlen 500
link/ether d6:38:0a:30:71:fc brd ff:ff:ff:ff:ff:ff
admin@ip-10-0-0-60:~$ ip ro
default via 10.0.0.1 dev eth0
10.0.0.0/24 dev eth0 proto kernel scope link src 10.0.0.60
169.254.172.0/22 dev ecs-bridge proto kernel scope link src 169.254.172.1 linkdown
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
admin@ip-10-0-0-60:~$ sudo ip addr add 172.18.0.0/24 dev vtapfoo
admin@ip-10-0-0-60:~$ sudo ip link set vtapfoo up
admin@ip-10-0-0-60:~$ ip ro
default via 10.0.0.1 dev eth0
10.0.0.0/24 dev eth0 proto kernel scope link src 10.0.0.60
169.254.172.0/22 dev ecs-bridge proto kernel scope link src 169.254.172.1 linkdown
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
172.18.0.0/24 dev vtapfoo proto kernel scope link src 172.18.0.0
Note the new route table entry in the last line.
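The connected route the kernel adds can be worked out from the address and prefix length alone. Here is a minimal sketch using Python's standard ipaddress module. The helper name connected_route is hypothetical, and 169.254.172.2/22 is assumed as the task's address purely for illustration. It only does the arithmetic and does not touch any interface.

```python
import ipaddress


def connected_route(addr_with_prefix: str) -> str:
    """Return the on-link prefix implied by an interface address.

    This is the prefix that showed up in the route table in the
    transcript above after `ip addr add`.
    """
    iface = ipaddress.ip_interface(addr_with_prefix)
    return str(iface.network)


# The vtapfoo example from the transcript.
print(connected_route("172.18.0.0/24"))
# Assumed task address on ecs-eth0 (illustrative only).
print(connected_route("169.254.172.2/22"))
```

The second call yields the 169.254.172.0/22 prefix discussed in this thread, which is the route one would expect to see on ecs-eth0.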
@nmeyerhans thanks for this detailed report! I'm glad that we have a root cause here. Some follow-up questions/comments:
Configure a static (never expiring) cache entry for all awsvpc tasks on the host.
Do you mean an entry in the host's namespace for each awsvpc task that's launched on the instance? Something like this (169.254.172.2 is the task's link local IPv4 address in this example):
$ sudo arp -s -i ecs-bridge 169.254.172.2 0a:58:a9:fe:ac:1a
$ arp
Address HWtype HWaddress Flags Mask Iface
169.254.172.2 ether 0a:58:a9:fe:ac:1a CM ecs-bridge
Disable rp_filter on ecs-eth0 in the task
Preventing spoofing will be one reason to not do this, yeah?
The 169.254.172.0/22 route should be added to the ecs-eth0 interface in the task.
We chose to not do that as that'd mean all containers/tasks connected to the ecs-bridge bridge will now be able to discover and communicate with each other and we wanted to avoid that as the intention of the ecs-eth0 interface within a task is to only let containers/tasks communicate with the ECS agent.
Do you mean an entry in the host's namespace for each awsvpc task that's launched on the instance? Something like this (169.254.172.2 is the task's link local IPv4 address in this example):
yes
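If the static-entry option were adopted, one way to confirm that an entry really is permanent is to look for the M flag in arp output (C marks a complete entry, M a permanent one). The following is a rough sketch that parses the output quoted earlier in this thread. The function name permanent_entries is hypothetical, and the column layout is assumed to match the net-tools arp output shown above.

```python
SAMPLE = """\
Address HWtype HWaddress Flags Mask Iface
169.254.172.2 ether 0a:58:a9:fe:ac:1a CM ecs-bridge
"""


def permanent_entries(arp_output: str, iface: str) -> dict:
    """Map IP -> MAC for entries on `iface` whose flags include M."""
    entries = {}
    for line in arp_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            # Incomplete entries have no hardware type or flags columns.
            continue
        address, _hwtype, hwaddress, flags = fields[:4]
        # The interface is always the last column; Mask is usually empty.
        if fields[-1] == iface and "M" in flags:
            entries[address] = hwaddress
    return entries


print(permanent_entries(SAMPLE, "ecs-bridge"))
```

A check like this only reports state. It says nothing about who is responsible for removing the entry when the task stops, which any static-entry approach would also need to handle.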
Disable rp_filter on ecs-eth0 in the task
Preventing spoofing will be one reason to not do this, yeah?
yes
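For anyone wanting to check what reverse path filtering is actually in effect before changing it, here is a small sketch. The kernel uses the larger of the conf/all and per-interface rp_filter values when validating sources on an interface, so both files need to be read. The function name effective_rp_filter and the proc_root parameter are hypothetical (the parameter exists only so the sketch can be pointed at a test directory), and it would have to run inside the task's network namespace to see ecs-eth0.

```python
from pathlib import Path

# Documented rp_filter values: 0 = no source validation,
# 1 = strict mode, 2 = loose mode.
RP_FILTER_MODES = {0: "off", 1: "strict", 2: "loose"}


def effective_rp_filter(iface: str, proc_root: str = "/proc/sys") -> str:
    """Return the rp_filter mode in effect for `iface`.

    The kernel applies max(conf/all, conf/<iface>), so setting only the
    per-interface value to 0 has no effect if conf/all is non-zero.
    """
    conf = Path(proc_root) / "net" / "ipv4" / "conf"
    values = [
        int((conf / name / "rp_filter").read_text())
        for name in ("all", iface)
    ]
    return RP_FILTER_MODES[max(values)]
```

Calling effective_rp_filter("ecs-eth0") from within the task namespace would report the mode being applied there. Note the consequence of the max rule for the "disable rp_filter" option: both conf/all and the interface value would have to be 0.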
The 169.254.172.0/22 route should be added to the ecs-eth0 interface in the task.
We chose to not do that as that'd mean all containers/tasks connected to the ecs-bridge bridge will now be able to discover and communicate with each other and we wanted to avoid that as the intention of the ecs-eth0 interface within a task is to only let containers/tasks communicate with the ECS agent.
I suppose it doesn't need to be a whole /22 route. A /32 should work as well. If we don't want communication to happen, though, we should probably consider ebtables or iptables instead, rather than relying on not-entirely-obvious routing behavior.
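To make the /22 versus /32 difference concrete, here is a minimal sketch that only models the route lookup. 169.254.172.1 is the ecs-bridge address from the host route table above; 169.254.172.3 is a hypothetical second task on the same bridge, and the helper name reachable_on_link is made up for this example.

```python
import ipaddress


def reachable_on_link(route: str, peer: str) -> bool:
    """True if `peer` falls inside the on-link prefix `route`."""
    return ipaddress.ip_address(peer) in ipaddress.ip_network(route)


HOST = "169.254.172.1"        # ecs-bridge address on the host
OTHER_TASK = "169.254.172.3"  # hypothetical neighbouring task

for route in ("169.254.172.0/22", "169.254.172.1/32"):
    print(route,
          "host:", reachable_on_link(route, HOST),
          "other task:", reachable_on_link(route, OTHER_TASK))
```

With the /32, only the host address matches the route, while the /22 also covers the neighbouring task. This says nothing about layer 2: tasks still share the bridge, which is why explicit filtering is the more dependable way to keep them apart.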
This has been fixed in aws/amazon-ecs-cni-plugins 2018.02.0, which is included in agent v1.17.2.
There's an issue in the network configuration applied to the task network namespace in awsvpc mode that can, in certain circumstances, result in the credentials endpoint being unreachable because of an inability of the host to resolve the task namespace's IP address to a MAC address.
Task NS has route table:
and interface ecs-eth0:
Host NS has route table:
and interface ecs-bridge:
In the case where things work, version 1, nothing in ARP caches:
In the case where things work, version 2, task and host ARP caches have entries for the other end:
In the case where things don't work:
This happens if, for some reason, the ARP cache entry for the task's bridge IP expires from the host's ARP cache but the task's cache still has an entry for the host's IP. The only way the host can learn the MAC address of the task's bridge interface is passively, based on incoming ARP queries coming from the task. The host can never successfully query for the task's MAC address.
With no way to resolve its MAC address, the host is unable to send traffic to the task, and the task's query will experience increased latency, possibly to the point of timing out. When the task's ARP cache entry times out, and it needs to resolve the host's MAC address again, the situation will recover. However, the 60-second ARP cache timeout is more than long enough for a client to consider the connectivity problem fatal.
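The timing of the failure can be illustrated with a toy model. This is a deliberately simplified sketch, not a model of the real Linux neighbour state machine: each cache entry gets a fixed lifetime, the task sends one request per second, and the host learns the task's MAC only when the task itself sends an ARP query. The function name simulate and the 30-second host lifetime are assumptions chosen for illustration; only the 60-second figure comes from the description above.

```python
def simulate(host_ttl: int, task_ttl: int, duration: int) -> list:
    """Return the seconds in [0, duration) at which a task request stalls."""
    host_expiry = 0  # when the host's entry for the task expires
    task_expiry = 0  # when the task's entry for the host expires
    stalled = []
    for now in range(duration):
        if now >= task_expiry:
            # The task has no entry for the host, so it sends an ARP
            # query. The host answers and passively learns the task's
            # MAC from that query.
            task_expiry = now + task_ttl
            host_expiry = now + host_ttl
        if now >= host_expiry:
            # The host has to reply but cannot resolve the task's MAC,
            # and its own ARP query is never answered.
            stalled.append(now)
    return stalled


stalls = simulate(host_ttl=30, task_ttl=60, duration=120)
print(f"stalled for {len(stalls)} of 120 seconds, first at t={stalls[0]}")
```

In this model the stall begins as soon as the host's entry expires and lasts until the task's own entry expires and forces a fresh ARP query. If the host's entry never expires before the task's does, the list of stalled seconds is empty, which matches the two working cases described earlier.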
This is a bug in our network configuration in awsvpc mode. We can consider a few options for fixing it: