VIP still assigned when etcd is down

cybertec-postgresql / vip-manager

Manages a virtual IP based on state kept in etcd or Consul

BSD 2-Clause "Simplified" License


VIP still assigned when etcd is down #205

Closed: sebastiangrill closed this issue 7 months ago

sebastiangrill commented 9 months ago

Hey,

When vip-manager loses its connection to etcd, it does not remove the VIP from the machine. In a network partition, this means two nodes could end up trying to hold the VIP at the same time.

I think this is caused by the etcd client's Watch, which, according to the documentation, keeps trying to reconnect indefinitely without ever returning an error: https://github.com/cybertec-postgresql/vip-manager/blob/master/checker/etcd_leader_checker.go#L88C19-L88C24

pashagolub commented 7 months ago

Hello. Sorry for the delayed answer.

Yes, that is the design. We do not change anything if etcd is down. If etcd is down, there are probably more serious problems with the cluster than with the VIP.

tpo commented 7 months ago

> Yes, that is the design. We do not change anything if etcd is down. If etcd is down, there are probably more serious problems with the cluster than with the VIP.

For perspective: on one of our production clusters, etcd occasionally fails to start after a reboot. I have not been able to debug it (yet). It is certainly preferable for every component of a system to be robust and resilient, rather than having the whole system fall apart when one part fails.