Closed sebastiangrill closed 7 months ago
Hello. Sorry for the delay in answering.
Yes, that is the design. We do not change anything if etcd is down. If etcd is down, there are probably more serious problems with the cluster than the VIP.
For perspective: on one of our production clusters, etcd fails to start every now and then after a reboot. I have not been able to debug it (yet). It is certainly preferable for all components of a system to be robust and resilient, as opposed to a system that falls apart when one part fails.
Hey,
when the VIP Manager loses the connection to etcd, it doesn't unassign the VIP on the machine. In case of a network partition, this means there could be two nodes that both try to assign the VIP.
I think this is due to the etcd client's Watch, which, according to the documentation, just endlessly tries to reconnect without returning an error: https://github.com/cybertec-postgresql/vip-manager/blob/master/checker/etcd_leader_checker.go#L88C19-L88C24