Status: Closed (thaala closed this 9 months ago)
We have been running into issues ever since we updated Python from 3.9 to 3.11+. vip-manager seems to move too quickly, and the server senses duplicate IP addresses on an administrative switchover. We've found that restarting the vip-manager service on the destination leader resolves the contention. The problem is somewhat intermittent, but it completely destroys the high availability of the etcd/PostgreSQL cluster, since we don't know whether a switchover will succeed during a true host failure. We've disabled Windows' built-in duplicate IP detection on both cluster nodes to see if that would resolve the issue. It became slightly better: Windows stopped showing pop-ups about duplicate IP addresses, but the switchover is still unsuccessful more than 80% of the time. Any thoughts on how to improve this system, or on replacing vip-manager with another solution?
Would disabling Gratuitous ARP in the registry settings help? (Although the pros and cons of disabling it may need to be considered separately.)
Registry key: HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
Value name: ArpRetryCount
Value type: REG_DWORD
Data: 0x0
In a Windows Server 2022 environment, the same duplicate detection problem improved with the setting described above.
Failovers and switchovers using Patroni's etcdctl fail about 80% of the time because of a duplicate IP on the network. Windows (in our case: 3 Windows Server 2022 nodes, 2 Postgres nodes) doesn't reactivate or retest the additional IP after such a conflict has been detected.
In the output of ipconfig /all, an IPv4 address normally shows the (Preferred) state. When this error occurs, it shows the (Duplicate) state instead.
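To make the (Preferred) versus (Duplicate) check concrete, here is a minimal Python sketch that reads the state suffix out of ipconfig /all text. The line layout in the regular expression is an assumption based on English-language Windows output, and the function names and sample addresses are hypothetical, chosen only for illustration.

```python
import re

# Assumed layout of an English-language `ipconfig /all` line, e.g.
#   "   IPv4 Address. . . . . . . . . . . : 10.0.0.50(Duplicate)"
# Localized Windows installations label this line differently.
_IPV4_LINE = re.compile(
    r"IPv4 Address[ .]*:\s*(\d{1,3}(?:\.\d{1,3}){3})\((\w+)\)"
)


def address_states(ipconfig_output: str) -> dict[str, str]:
    """Map each IPv4 address found in the text to its state suffix."""
    return {m.group(1): m.group(2) for m in _IPV4_LINE.finditer(ipconfig_output)}


def is_duplicate(ipconfig_output: str, vip: str) -> bool:
    """Return True if the given address is reported in the Duplicate state."""
    return address_states(ipconfig_output).get(vip) == "Duplicate"


# Hypothetical sample text; the addresses are made up for illustration.
SAMPLE = """\
   IPv4 Address. . . . . . . . . . . : 10.0.0.11(Preferred)
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   IPv4 Address. . . . . . . . . . . : 10.0.0.50(Duplicate)
"""

if __name__ == "__main__":
    print(address_states(SAMPLE))
    print(is_duplicate(SAMPLE, "10.0.0.50"))
```

On a real node the text would come from running ipconfig /all; parsing a fixed string here keeps the sketch independent of the machine it runs on.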
The failover stays stuck until the address is removed with a PowerShell command. After the address is removed, vip-manager adds it again and the failover succeeds.
We have found a workaround for the moment: a permanently running task on both Postgres servers that checks every 5 seconds for a (Duplicate) state on the desired interface and removes the address if one occurs.
Powershell: Remove-NetIPAddress -Confirm:$false -InterfaceAlias yourinterfacename -AddressState Duplicate
A better way could be to run such a verify step inside the vip-manager service a short time after it adds the IP to the interface, instead of checking every 5 seconds. If the (Duplicate) state occurs, the IP can be removed and added again until the state is (Preferred) or the number of attempts is exhausted and the operation fails permanently.
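The add, verify, and retry idea could look roughly like the following Python sketch. It is not vip-manager's actual code: the function name, its parameters, and the three callables (add, read state, remove) are hypothetical placeholders for whatever the service uses to manage the address, and the attempt count and settle delay are arbitrary defaults.

```python
import time
from typing import Callable


def add_vip_with_verify(
    add_ip: Callable[[], None],
    get_state: Callable[[], str],
    remove_ip: Callable[[], None],
    max_attempts: int = 5,
    settle_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Add the VIP, check its state shortly afterwards, and retry if needed.

    Returns True once the address reports "Preferred", or False after
    max_attempts unsuccessful tries (the permanent-failure case).
    """
    for _ in range(max_attempts):
        add_ip()
        # Wait briefly so the OS can finish its duplicate address check.
        sleep(settle_seconds)
        if get_state() == "Preferred":
            return True
        # Any other state (for example "Duplicate"): remove the address
        # so that the next attempt starts from a clean interface.
        remove_ip()
    return False
```

Passing the three operations in as callables keeps the retry logic separate from the platform-specific commands, so the same loop could wrap the Remove-NetIPAddress call from the workaround above.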
Thank you for this software. BR Thilo