cloudnativelabs / kube-router

Kube-router, a turnkey solution for Kubernetes networking.
https://kube-router.io
Apache License 2.0

Route table gone forever after network interface link down and up #509

Closed tanapoln closed 2 years ago

tanapoln commented 6 years ago

I just found a weird behavior of kube-router that caused an outage in my cluster.

There was an event (not confirmed by AWS) that caused a host interface link to go down and then come back up. After that event, every node in the cluster (including the master node) lost its route table, which caused all pod-to-pod communication to fail.

You can easily reproduce this by provisioning a cluster, deploying some application, and then running ifdown eth0 && ifup eth0 as root on any node.

After that, check the route table with ip route: all routes are gone. To resolve this I have to manually restart kube-router and wait a couple of minutes for the route table to come back.

Please help. Thanks

Environment:
Cloud: AWS
Kubernetes: 1.9.3
OS: Debian GNU/Linux 9 (stretch) 4.9.0-5-amd64
Docker: 17.3.2
Kube-router: 0.1.0

murali-reddy commented 6 years ago

@tanapoln kube-router does a periodic sync which brings the node back to the desired state (correct configuration of ipvs, routing, iptables, ipset, etc.). Additionally, to handle this kind of scenario, kube-router also fails its liveness checks, so I would expect the kube-router pods to have been restarted. Did you notice things were still broken after a long outage?

This is a fatal condition; I wonder how else kube-router could be made more resilient to this kind of scenario.

tanapoln commented 6 years ago

@murali-reddy The route table is still broken after an hour; a manual pod restart is required.

As for ipvs, the configuration was correct, but pods on the node could not talk to pods on other machines due to the missing route table.

roffe commented 6 years ago

What happened is that the kernel flushes ephemeral state when an interface goes down, so when it comes back up the table is blank. Kube-router syncs every 5 minutes by default; lowering the controllers' sync period to 1 minute would make recovery faster when the underlying network interface goes away.
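For reference, the sync periods are tunable via command-line flags on the kube-router container. A DaemonSet args fragment might look like this (flag names per the kube-router user guide; verify them against the release you run):

```yaml
# kube-router container args (sketch): shorten the periodic reconciliation
# so flushed routes are restored within ~1 minute instead of ~5.
- name: kube-router
  args:
    - --run-router=true
    - --run-firewall=true
    - --run-service-proxy=true
    - --routes-sync-period=1m    # default 5m
    - --ipvs-sync-period=1m      # default 5m
    - --iptables-sync-period=1m  # default 5m
```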

@murali-reddy I think there are some new functions in the netlink library that allow us to watch interfaces. We could implement an interface watcher for the dummy interface, the main interface and the bridge interface, and trigger the appropriate sync() if we detect one going away and coming back.

murali-reddy commented 6 years ago

OK, let's see how much of a pain it is :) to add that support (given the netlink issues we have seen)

serbaut commented 5 years ago

I have this issue too. CoreOS 1911.3.0, k8s 1.12.2, kube-router 0.2.1. It is easily reproduced by restarting the systemd-networkd service. Routes are flushed and never come back.

The actual root cause in production was a restart of the DHCP server: our node lost its DHCP lease, which caused a flush of the routes.

The only way I can see routes being updated is via BGP updates in watchBgpUpdates()

https://github.com/cloudnativelabs/kube-router/blob/cf9bf47d521eeab1ae41fdc7d52a89be9dbcf820/pkg/controllers/routing/network_routes_controller.go#L334

However GoBGP only notifies watchers for changed routes via GetChanges() so the periodic update of routes is not visible to kube-router:

https://github.com/cloudnativelabs/kube-router/blob/cf9bf47d521eeab1ae41fdc7d52a89be9dbcf820/vendor/github.com/osrg/gobgp/table/destination.go#L569-L573

Has this worked before? Maybe GoBGP previously notified watchers of all updates, not just changes?

ticpu commented 5 years ago

We have the same issue: Ubuntu unattended upgrades did a re-exec of systemd and restarted systemd-networkd after an update. We were left with broken kube-routers that still advertised the IPs on the dummy interface. Restarting kube-router puts the routes back, and restarting a neighbour also re-adds the route for that neighbour.

This made me wonder if we could do what BIRD does, but better. BIRD scans the routing table for changes and adds back missing routes; it would be even better to be event-based. It is possible to listen for kernel events on route changes (like ip monitor/rtmon) and react to events on routes using proto 17: if a deleted route is one GoBGP still has in its local BGP route info, we can simply add it back.

Would that be a viable option to add to GoBGP?

icefed commented 5 years ago

Any updates on this?

roffe commented 5 years ago

@icefed not at the moment. If you could offer a hand, that would be awesome.

essh commented 5 years ago

I also recently saw the same issue regarding Ubuntu unattended upgrades of systemd as referenced in https://github.com/cloudnativelabs/kube-router/issues/509#issuecomment-441092229. I had to restart kube-router to get the routes back.

mattlqx commented 5 years ago

I hate to "me too" this, but same deal as above: unattended-upgrades upgraded systemd and freaked out the clusters. The unfortunate part about this case is that there aren't really any error messages to alarm on besides pods going into crash loops.

I see references above saying the routing table gets dropped; where exactly does one see this happening? The routing table on the system looks fine, and the routes viewed from gobgp look fine. Is there a concrete way of identifying that the routes (wherever they are) are out of sync?

lomkju commented 5 years ago

@essh you can do this to solve it. https://github.com/cloudnativelabs/kube-router/issues/370#issuecomment-463967949

mattlqx commented 4 years ago

Had this event happen again today because of systemd upgrade. The best way I've come up with to auto-correct for this is to look at the routing table and determine if the advertised routes for the CNI blocks of other hosts are present. There should be one for each BGP neighbor.

gobgp neigh | grep Establ | wc -l should equal netstat -rn |grep U |egrep '^172.29.' | wc -l with the octets in the grep matching whatever your CNI superblock is for the cluster.

I'll admit this is a little naive of a check, but it should be enough for kube-router to restart itself when it doesn't have a route for every neighbor.

My full livenessProbe looks like this as it checks the health endpoint, number of routes from neighbors and verifies it has a subnet assigned from CNI:

        livenessProbe:
          exec:
            command:
              - sh
              - -c
              - >-
                reply=$(curl -s -o /dev/null -w %{http_code} http://127.0.0.1:20244/healthz);
                if [ "$reply" -lt 200 -o "$reply" -ge 400 ];
                  then exit 1;
                fi;
                if [ "$(netstat -rn |grep U |egrep '^172.29.' | wc -l)" -lt "$(gobgp neigh | grep Establ | wc -l)" ];
                  then exit 1;
                fi;
                egrep -q '172\.29' /etc/cni/net.d/10-kuberouter.conf
          initialDelaySeconds: 60
          periodSeconds: 5
          failureThreshold: 3

Rusox89 commented 3 years ago

I've made a PR (#1151) that separates the synchronization of routes on the host from the event watching in MonitorTable, so if unattended-upgrades clears the routing table, it will be repopulated within 15s by default.