Closed panpan0000 closed 5 years ago
> when any k8s resources get updated (svc, pod, configmap)
It shouldn't be triggered on every svc, pod, or configmap change in Kubernetes. It should only be triggered if the keepalived configmap itself changes, or if a svc/endpoint related to a VIP in that configmap changes. Are you seeing something different?
> I'm also digging into the rationale for Cleanup() here, and also hope you gurus can share a quick hint here.
The intent was to clean up VIPs on startup to fix a duplicate-VIP issue, since keepalived didn't do it properly itself. Since then, other changes have meant that cleaning up in reload no longer makes sense. Now that we have the health check, I think we can actually remove the cleanup in reload. The health check now triggers a shutdown if the instance is not MASTER and has a VIP, and shutdown calls Cleanup() to remove the duplicate VIP.
E0516 16:03:39.525379 6 main.go:464] Health check unsuccessful: BACKUP should not contain VIP 10.0.2.17
I0516 16:03:39.887737 6 main.go:325] Received SIGTERM, shutting down
I0516 16:03:39.887786 6 main.go:343] shutting down controller queues
I0516 16:03:39.887808 6 keepalived.go:252] Cleanup: [10.0.2.17]
I0516 16:03:39.887846 6 keepalived.go:274] removing configured VIP 10.0.2.17
I0516 16:03:40.061462 6 main.go:333] Exiting with 0
Thu May 16 16:03:40 2019: Stopping
I can do a PR for this.
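To make the rule above concrete, here is a minimal Go sketch of the health-check decision as described in this comment. It is not the controller's actual code: the names vrrpState, healthCheck, and cleanup are hypothetical, and the real Cleanup() removes addresses from the NIC rather than returning a list.

```go
package main

import "fmt"

// vrrpState is a hypothetical stand-in for the keepalived instance state.
type vrrpState string

const (
	stateMaster vrrpState = "MASTER"
	stateBackup vrrpState = "BACKUP"
)

// healthCheck fails when a node that is not MASTER still holds a VIP.
// This mirrors the rule described in the comment, not the real implementation.
func healthCheck(state vrrpState, boundVIPs []string) error {
	if state != stateMaster && len(boundVIPs) > 0 {
		return fmt.Errorf("%s should not contain VIP %s", state, boundVIPs[0])
	}
	return nil
}

// cleanup is a placeholder: it only reports which VIPs would be removed.
// The real code would delete these addresses from the interface.
func cleanup(boundVIPs []string) []string {
	removed := make([]string, len(boundVIPs))
	copy(removed, boundVIPs)
	return removed
}

func main() {
	vips := []string{"10.0.2.17"}

	// A BACKUP holding a VIP fails the check; shutdown then runs cleanup.
	if err := healthCheck(stateBackup, vips); err != nil {
		fmt.Println("Health check unsuccessful:", err)
		fmt.Println("Cleanup:", cleanup(vips))
	}
}
```

With this split, a reload never touches the VIPs; cleanup only runs on the shutdown path after a failed health check.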
Hey, guys, I'm back ...
In short:
When any k8s resource is updated (svc, pod, configmap), I observe that the MASTER keepalived instance switches to BACKUP (version = 0.33), because Cleanup() is called in the reload logic on every reload.
(1) Who will suffer?
During this transition, keep-alive connections are lost. In my test, I ran
ab -c 200 -n 10000 -k http://$VIP/
(-k enables keep-alive). During the MASTER/BACKUP transition, which lasts a few seconds, the client fails with:
apr_socket_recv: Connection reset by peer (104)
(2) Suspicion:
I found that Cleanup() is called in Reload(). This seems unreasonable, because it removes the VIP from the NIC and causes a MASTER/BACKUP switch. I'm also digging into the rationale for Cleanup() here, and hope you gurus can share a quick hint.
(3) Detailed logs
The master log is as below: