Configured the k8s keepalived (kube-keepalived-vip) daemonset and my deployment/service.
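For reference, the VIP is mapped to the service roughly like this (a sketch from memory of the kube-keepalived-vip README; the ConfigMap name and flag below are assumptions, not a paste from my cluster):
# ConfigMap consumed by the kube-keepalived-vip daemonset:
# each key is a VIP, each value is namespace/serviceName
kubectl create configmap vip-configmap --from-literal=10.75.117.72=default/dde-ext-io
# the daemonset container is then started with --services-configmap=default/vip-configmap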
[root@bcmt-01-control-01 ~]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
ddeio-6b4df765b4-6wgj7 1/1 Running 0 5s 192.168.1.84 bcmt-01-worker-01
ddeio-6b4df765b4-bwk92 1/1 Running 0 5s 192.168.1.26 bcmt-01-worker-02
kube-keepalived-vip-bsmsv 1/1 Running 15 1d 172.16.1.11 bcmt-01-worker-01
kube-keepalived-vip-dkfd2 1/1 Running 16 1d 172.16.1.12 bcmt-01-worker-02
[root@bcmt-01-control-01 ~]# kubectl get svc -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
dde-ext-io NodePort 10.254.9.232 3868:3868/TCP 2m app=ddeio
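This is roughly how the service was exposed (a sketch, not the exact manifest; note that nodePort 3868 is below the default 30000-32767 NodePort range, so the apiserver's --service-node-port-range has to allow it):
# expose the deployment, then pin the nodePort to 3868
kubectl expose deployment ddeio --name=dde-ext-io --type=NodePort --port=3868 --target-port=3868
kubectl patch svc dde-ext-io -p '{"spec":{"ports":[{"port":3868,"protocol":"TCP","nodePort":3868}]}}'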
The service is available via the 10.75.117.72 VIP.
[root@bcmt-01-edge-01 ~]# netstat -anc | grep -w 3868
tcp 0 524 10.75.117.81:50842 10.75.117.72:3868 ESTABLISHED
tcp6 0 0 :::3868 :::* LISTEN
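Both workers answer on the NodePort directly as well, not only via the VIP; a quick check from the edge node (assuming nc is available there):
nc -z -w1 172.16.1.11 3868 && echo worker-01 ok
nc -z -w1 172.16.1.12 3868 && echo worker-02 ok
nc -z -w1 10.75.117.72 3868 && echo VIP ok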
Out of the two worker nodes, the VIP is plumbed on worker-01:
[root@bcmt-01-worker-01 ~]# ip a
.....
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8950 qdisc pfifo_fast state UP group default qlen 1000
link/ether fa:16:3e:02:02:93 brd ff:ff:ff:ff:ff:ff
inet 172.16.1.11/24 brd 172.16.1.255 scope global dynamic eth1
valid_lft 83687sec preferred_lft 83687sec
inet 10.75.117.72/32 scope global eth1
valid_lft forever preferred_lft forever
After this, with an existing load session in progress (traffic being pumped), I reboot worker-01. The VIP gets plumbed on the other node within 2-3 seconds, but when the client tries to connect again it cannot; the service only becomes available again after about 23 seconds, and then traffic resumes.
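The 2-3 second figure for the VIP movement comes from watching the surviving worker; a loop like this prints a timestamp the moment the address appears there (a sketch, run on worker-02):
# wait for the VIP to show up on this node's eth1, then print the time
while ! ip -4 addr show dev eth1 | grep -q '10.75.117.72'; do sleep 0.2; done; date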
Wed 2 Jan 07:15:14 UTC 2019
tcp 0 584 10.75.117.81:52702 10.75.117.72:3868 ESTABLISHED
Wed 2 Jan 07:15:15 UTC 2019
tcp 0 1109 10.75.117.81:52702 10.75.117.72:3868 FIN_WAIT1   <- this is at the time of the worker-01 reboot
Then....
Wed 2 Jan 07:15:17 UTC 2019
tcp 0 1 10.75.117.81:52700 10.75.117.72:3868 SYN_SENT
tcp 0 1109 10.75.117.81:52702 10.75.117.72:3868 FIN_WAIT1
.....
and so on
.....
Finally,
Wed 2 Jan 07:15:38 UTC 2019
tcp 0 1 10.75.117.81:52692 10.75.117.72:3868 SYN_SENT
tcp 0 1 10.75.117.81:52700 10.75.117.72:3868 SYN_SENT
tcp 0 0 10.75.117.81:52690 10.75.117.72:3868 ESTABLISHED
tcp 0 1 10.75.117.81:52696 10.75.117.72:3868 SYN_SENT
tcp 0 1 10.75.117.81:52694 10.75.117.72:3868 SYN_SENT
tcp 0 1 10.75.117.81:52698 10.75.117.72:3868 SYN_SENT
tcp 0 1109 10.75.117.81:52702 10.75.117.72:3868 FIN_WAIT1
So it took 23 seconds (07:15:15 to 07:15:38) for a connection to finally be re-established. In some cases I have seen it take even 30-40 seconds.
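The 23 seconds above is just the gap read off the netstat trace; a small loop on the client/edge node gives the same measurement without eyeballing netstat (a sketch, assumes nc):
# timestamp each connect attempt to the VIP until one succeeds
while true; do
  date +%T
  nc -z -w1 10.75.117.72 3868 && { echo connected; break; }
  sleep 1
done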
Since my service is exposed via NodePort and the IP movement also happened within 2-3 seconds, shouldn't the service be available immediately after the IP movement, given that the app pod on the surviving worker node can serve the traffic? I tried this test numerous times to see if I am missing something. Finally, I tried https://github.com/munnerz/keepalived-cloud-provider, which exposes the service as a LoadBalancer rather than a NodePort, and with it the service became reachable within 4 seconds after a node reboot. But, alas, we can use the LB approach only on bare metal and not on cloud.
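For the keepalived-cloud-provider test, the only change on the service side is the type; roughly (a sketch, the service name here is made up, and the EXTERNAL-IP is then allocated and announced by the cloud provider rather than plumbed by the daemonset):
kubectl expose deployment ddeio --name=dde-ext-io-lb --type=LoadBalancer --port=3868 --target-port=3868
kubectl get svc dde-ext-io-lb -o wide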
Can you please check this? Such a high failover time on node reboot certainly won't fit the bill.
PS: Even when exposing the service via ExternalIP, I see the same behaviour.
PS: When I kill one of the TWO app pods, rather than rebooting a node, the service is available almost instantly (since the VIP did not have to be re-plumbed).
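That pod-kill comparison boils down to something like this (a sketch, using one of the pod names from the listing above):
kubectl delete pod ddeio-6b4df765b4-6wgj7
# meanwhile, on the edge node, watch the reconnect
netstat -anc | grep -w 3868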