acassen / keepalived

Keepalived
https://www.keepalived.org
GNU General Public License v2.0

keepalived issue with haproxy #2485

Closed. adnanhamdussalam closed this issue 2 weeks ago.

adnanhamdussalam commented 1 month ago

Hi,

Servers: test1, test2

I have already configured keepalived on two HAProxy servers, and I am able to move the VIP to the other server when the haproxy service goes down on one server (test1). Now, while the haproxy service is running on test2 and the VIP is also on test2, I start the keepalived service on test1, but its priority is lower. When I then shut down the haproxy service on test2, keepalived does not move the VIP back to test1 because of test1's lower priority.

Any idea whether this is possible, and how to do it?

pqarmitage commented 1 month ago

You will need to provide copies of your keepalived configurations, and also any track_scripts that you are using. Then we can have a look at it.

adnanhamdussalam commented 1 month ago

Please find the configuration settings below:

server 1:

[root@testbed06 postgres]# cat /etc/keepalived/keepalived.conf
global_defs {
    script_user root
    enable_script_security
}

vrrp_script chk_haproxy_down {
    script "/etc/keepalived/chk_haproxy_advanced.sh"
    interval 2
    weight -20  # Apply this weight if HAProxy is down on this node
    fall 2
    rise 2
    # If exit code is 1 (this node's HAProxy is down)
}

vrrp_script chk_both_haproxy_down {
    script "/etc/keepalived/chk_haproxy_advanced.sh"
    interval 2
    weight -50  # Apply this weight if both nodes' HAProxy services are down
    fall 2
    rise 2
    # If exit code is 2 (both nodes' HAProxy are down)
}

vrrp_instance VI_1 {
    state MASTER            # Set this node as MASTER
    interface enp1s0        # Network interface to monitor
    virtual_router_id 51    # VRRP ID (must be the same on both nodes)
    priority 101            # Priority (higher number means higher priority)
    advert_int 1            # Advertisement interval (seconds)
    authentication {
        auth_type PASS
        auth_pass 1234      # Authentication password (must match on both nodes)
    }
    virtual_ipaddress {
        10.114.16.72        # Virtual IP address (VIP)
    }
    track_script {
        chk_haproxy_down
        chk_both_haproxy_down
    }

    notify_master /etc/keepalived/start_haproxy.sh
    notify_backup /etc/keepalived/stop_haproxy.sh
    preempt
}
[root@testbed06 postgres]# cat "/etc/keepalived/chk_haproxy_advanced.sh"

#!/bin/bash

# Define the path to Keepalived's control socket or state file (if applicable)
KEEPALIVED_VRRP_INSTANCE="VI_1"

# Local HAProxy status check
if killall -0 haproxy >/dev/null 2>&1; then
    # If HAProxy is running on this node, ensure full priority
    echo "HAProxy is running on this node."
    ip vrf exec $KEEPALIVED_VRRP_INSTANCE 100  # Example: Set priority back to full
    exit 0
else
    # HAProxy is down on this node, check the other node's HAProxy status
    ssh postgres@10.114.16.64 "killall -0 haproxy >/dev/null 2>&1"
    sleep 10
    if [ $? -ne 0 ]; then
        # Both HAProxy services are down, reduce priority drastically
        echo "Both HAProxy services are down. Reducing priority drastically."
        ip vrf exec $KEEPALIVED_VRRP_INSTANCE 50  # Example: Reduce priority significantly
        exit 2
    else
        # Only this node's HAProxy is down, reduce priority moderately
        echo "HAProxy is down on this node. Reducing priority moderately."
        ip vrf exec $KEEPALIVED_VRRP_INSTANCE 80  # Example: Reduce priority moderately
        exit 1
    fi
fi

server 2:

[postgres@testbed09-1664 ~]$ cat /etc/keepalived/keepalived.conf
global_defs {
    script_user root
    enable_script_security
}

vrrp_script chk_haproxy_down {
    script "/etc/keepalived/chk_haproxy_advanced.sh"
    interval 2
    weight -20  # Apply this weight if HAProxy is down on this node
    fall 2
    rise 2
    # If exit code is 1 (this node's HAProxy is down)
}

vrrp_script chk_both_haproxy_down {
    script "/etc/keepalived/chk_haproxy_advanced.sh"
    interval 2
    weight -50  # Apply this weight if both nodes' HAProxy services are down
    fall 2
    rise 2
    # If exit code is 2 (both nodes' HAProxy are down)
}

vrrp_script chk_master_haproxy {
    script "ssh postgres@10.114.16.50 'killall -0 haproxy' || echo 1"
    interval 5
    weight 10
}

vrrp_instance VI_1 {
    state BACKUP            # Set this node as BACKUP
    interface enp1s0        # Network interface to monitor
    virtual_router_id 51    # VRRP ID (must match the MASTER node)
    priority 100            # Priority (lower than MASTER)
    advert_int 1            # Advertisement interval (seconds)
    authentication {
        auth_type PASS
        auth_pass 1234      # Authentication password (must match the MASTER)
    }
    virtual_ipaddress {
        10.114.16.72        # Same VIP as the MASTER node
    }
    track_script {
        chk_haproxy_down
        chk_both_haproxy_down
    }

    notify_master /etc/keepalived/start_haproxy.sh
    notify_backup /etc/keepalived/stop_haproxy.sh
    preempt
}
[postgres@testbed09-1664 ~]$ cat /etc/keepalived/chk_haproxy_advanced.sh

#!/bin/bash

# Define the path to Keepalived's control socket or state file (if applicable)
KEEPALIVED_VRRP_INSTANCE="VI_1"

# Local HAProxy status check
if killall -0 haproxy >/dev/null 2>&1; then
    # If HAProxy is running on this node, ensure full priority
    echo "HAProxy is running on this node."
    ip vrf exec $KEEPALIVED_VRRP_INSTANCE 100  # Example: Set priority back to full
    exit 0
else
    # HAProxy is down on this node, check the other node's HAProxy status
    ssh postgres@10.114.16.50 "killall -0 haproxy >/dev/null 2>&1"
    sleep 10
    if [ $? -ne 0 ]; then
        # Both HAProxy services are down, reduce priority drastically
        echo "Both HAProxy services are down. Reducing priority drastically."
        ip vrf exec $KEEPALIVED_VRRP_INSTANCE 50  # Example: Reduce priority significantly
        exit 2
    else
        # Only this node's HAProxy is down, reduce priority moderately
        echo "HAProxy is down on this node. Reducing priority moderately."
        ip vrf exec $KEEPALIVED_VRRP_INSTANCE 80  # Example: Reduce priority moderately
        exit 1
    fi
fi

pqarmitage commented 3 weeks ago

There appear to be a number of issues:

  1. You have added sleep 10 after the ssh postgres@ ... command in the chk_haproxy_advanced.sh script. The exit code of sleep will be 0, and so the else block will always be executed, and the result of ssh postgres@... will always be ignored.
  2. I don't know what ip vrf VI_1 100 (or 80 or 50) are expected to do, unless you have commands 100 80 and 50. Have you created a vrf named VI_1?
  3. vrrp_scripts chk_haproxy_down and chk_both_haproxy_down both call shell script /etc/keepalived/chk_haproxy_advanced.sh, and so the vrrp_scripts will either both be up or both be down.
  4. /etc/keepalived/chk_haproxy_advanced.sh can exit with exit codes of 0, 1 or 2. keepalived just checks whether the exit code is 0 or not 0, so keepalived will not be aware of any difference between an exit code of 1 or 2, although currently your script will never exit with exit code 2 (see point 1. above).
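Point 1 can be illustrated with a small sketch in which `false` stands in for the failing `ssh postgres@...` command; the function names are hypothetical. `$?` only ever holds the exit status of the most recently executed command, so the status has to be captured before anything else runs:

```shell
#!/bin/bash
# Illustration of point 1. `false` is a stand-in for the failing
# `ssh postgres@... "killall -0 haproxy"` command; the names are hypothetical.

# As in the posted script: the check's status is lost.
buggy_status() {
    false       # the peer check fails (exit status 1)
    sleep 0     # any command in between, here sleep, resets $? to 0
    echo $?     # reports sleep's status, not the check's
}

# Capturing the status straight after the check preserves it.
fixed_status() {
    false           # the peer check fails (exit status 1)
    local rc=$?     # save the status before running anything else
    sleep 0
    echo "$rc"      # reports the check's status
}

buggy_status   # prints 0
fixed_status   # prints 1
```

Testing the command directly, as in `if ! ssh ...; then`, avoids the intermediate `$?` altogether.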

I don't know why keepalived is not taking over as master on test1 when keepalived is stopped on test2, but I suggest you correct the issues identified above first, and if you are still experiencing the original problem you will need to post the full keepalived logs from both systems. Also, if you execute kill -USR1 $(cat /var/run/keepalived.pid) when test1 has not taken over as master keepalived will produce a file /tmp/keepalived.data, and it would be helpful if you posted that as well.

adnanhamdussalam commented 3 weeks ago

I have now changed the settings:

Master:

[postgres@testbed06 ~]$ cat /etc/keepalived/keepalived.conf
global_defs {
    script_user root
    enable_script_security
}

vrrp_script chk_haproxy {
    script "killall -0 haproxy"
    interval 2      # Check every 2 seconds
    weight -10      # Reduce priority by 10 if the script fails
}

vrrp_instance VI_1 {
    state MASTER            # Set this node as MASTER
    interface enp1s0        # Network interface to monitor
    virtual_router_id 51    # VRRP ID (must be the same on both nodes)
    priority 101            # Priority (higher number means higher priority)
    advert_int 1            # Advertisement interval (seconds)
    authentication {
        auth_type PASS
        auth_pass 1234      # Authentication password (must match on both nodes)
    }
    virtual_ipaddress {
        10.114.16.72        # Virtual IP address (VIP)
    }
    track_script {
        chk_haproxy
    }

    notify_master /etc/keepalived/start_haproxy.sh
    notify_backup /etc/keepalived/stop_haproxy.sh
    preempt
}

Backup:

[postgres@testbed09-1664 ~]$ cat /etc/keepalived/keepalived.conf
global_defs {
    script_user root
    enable_script_security
}

vrrp_script chk_haproxy {
    script "killall -0 haproxy"
    interval 2      # Check every 2 seconds
    weight -2       # Reduce priority by 10 if the script fails
}

vrrp_instance VI_1 {
    state BACKUP            # Set this node as BACKUP
    interface enp1s0        # Network interface to monitor
    virtual_router_id 51    # VRRP ID (must match the MASTER node)
    priority 100            # Priority (lower than MASTER)
    advert_int 1            # Advertisement interval (seconds)
    authentication {
        auth_type PASS
        auth_pass 1234      # Authentication password (must match the MASTER)
    }
    virtual_ipaddress {
        10.114.16.72        # Same VIP as the MASTER node
    }
    track_script {
        chk_haproxy
    }

    notify_master /etc/keepalived/start_haproxy.sh
    notify_backup /etc/keepalived/stop_haproxy.sh
}

When I shut down haproxy on testbed09, the service does not move to testbed06 because its priority there is 91. How can I control this issue?

Please find the logs from both servers below. I was unable to find /tmp/keepalived.

Master log output:

[postgres@testbed09-1664 ~]$ systemctl status keepalived
● keepalived.service - LVS and VRRP High Availability Monitor
     Loaded: loaded (/usr/lib/systemd/system/keepalived.service; disabled; preset: disabled)
     Active: active (running) since Tue 2024-10-29 12:31:42 EDT; 2min 3s ago
   Main PID: 2761615 (keepalived)
      Tasks: 2 (limit: 201936)
     Memory: 1.9M
        CPU: 1.006s
     CGroup: /system.slice/keepalived.service
             ├─2761615 /usr/sbin/keepalived --dont-fork -D
             └─2761616 /usr/sbin/keepalived --dont-fork -D

Oct 29 12:32:32 testbed09-1664 Keepalived_vrrp[2761616]: (VI_1) Changing effective priority from 98 to 100
Oct 29 12:32:35 testbed09-1664 Keepalived_vrrp[2761616]: (VI_1) Sending/queueing gratuitous ARPs on enp1s0 for 10.114.16.72
Oct 29 12:32:35 testbed09-1664 Keepalived_vrrp[2761616]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 29 12:32:35 testbed09-1664 Keepalived_vrrp[2761616]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 29 12:32:35 testbed09-1664 Keepalived_vrrp[2761616]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 29 12:32:35 testbed09-1664 Keepalived_vrrp[2761616]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 29 12:32:35 testbed09-1664 Keepalived_vrrp[2761616]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 29 12:33:16 testbed09-1664 Keepalived_vrrp[2761616]: Script chk_haproxy now returning 1
Oct 29 12:33:16 testbed09-1664 Keepalived_vrrp[2761616]: VRRP_Script(chk_haproxy) failed (exited with status 1)
Oct 29 12:33:16 testbed09-1664 Keepalived_vrrp[2761616]: (VI_1) Changing effective priority from 100 to 98

Backup log:

[postgres@testbed06 ~]$ systemctl status keepalived
● keepalived.service - LVS and VRRP High Availability Monitor
     Loaded: loaded (/usr/lib/systemd/system/keepalived.service; disabled; preset: disabled)
     Active: active (running) since Mon 2024-10-28 06:37:58 EDT; 1 day 5h ago
   Main PID: 352793 (keepalived)
      Tasks: 2 (limit: 98870)
     Memory: 2.0M
        CPU: 8min 26.411s
     CGroup: /system.slice/keepalived.service
             ├─352793 /usr/sbin/keepalived --dont-fork -D
             └─352794 /usr/sbin/keepalived --dont-fork -D

Oct 29 12:30:10 testbed06 Keepalived_vrrp[352794]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 29 12:30:10 testbed06 Keepalived_vrrp[352794]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 29 12:30:10 testbed06 Keepalived_vrrp[352794]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 29 12:30:10 testbed06 Keepalived_vrrp[352794]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 29 12:32:27 testbed06 Keepalived_vrrp[352794]: Script chk_haproxy now returning 1
Oct 29 12:32:27 testbed06 Keepalived_vrrp[352794]: VRRP_Script(chk_haproxy) failed (exited with status 1)
Oct 29 12:32:27 testbed06 Keepalived_vrrp[352794]: (VI_1) Changing effective priority from 101 to 91
Oct 29 12:32:30 testbed06 Keepalived_vrrp[352794]: (VI_1) Master received advert from 10.114.16.64 with higher priority 98, ours 91
Oct 29 12:32:30 testbed06 Keepalived_vrrp[352794]: (VI_1) Entering BACKUP STATE
Oct 29 12:32:30 testbed06 Keepalived_vrrp[352794]: (VI_1) removing VIPs.

pqarmitage commented 3 weeks ago

The file produced by kill -USR1 ... is /tmp/keepalived.data, not /tmp/keepalived.

I think your problem is that when a VRRP instance is in the backup state, you stop haproxy, and only start it again once the VRRP instance transitions to master state.

So when keepalived is running on both testbed06 and testbed09, the VRRP instances start in backup mode (and haproxy is not running), so testbed06 has priority 99 (101 - 2) and testbed09 has priority 98 (100 - 2). testbed06 has higher priority and so becomes VRRP master, haproxy is started, and the VRRP instance priority increases to 101.

You then stop haproxy on testbed06, and so the VRRP priority reduces to 99, but this is still higher than testbed09, and so testbed06 remains the VRRP master.
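The priority arithmetic can be sketched as two small shell functions. This is a simplified model under stated assumptions, not keepalived's implementation: a positive `weight` is added while the track script succeeds, a negative `weight` is applied while it fails, and (with preemption enabled, keepalived's default) a backup only takes over once its own priority is strictly higher than the priority the master advertises. The function names are hypothetical.

```shell
#!/bin/bash
# Simplified model of how a vrrp_script weight adjusts an instance's priority:
#   weight > 0: added while the script succeeds (exit status 0)
#   weight < 0: added (i.e. subtracted) while the script fails (non-zero)
# Not modelled: weight 0 (a failure puts the instance into FAULT state)
# and the 1-254 priority range.
effective_priority() {
    local base=$1 weight=$2 status=$3
    if (( weight > 0 && status == 0 )) || (( weight < 0 && status != 0 )); then
        echo $(( base + weight ))
    else
        echo "$base"
    fi
}

# Succeeds only if the backup's priority ($1) is strictly higher than the
# priority advertised by the current master ($2).
backup_takes_over() {
    (( $1 > $2 ))
}
```

Plugging in the weights from the configuration posted above (-10 on testbed06, -2 on testbed09) reproduces the effective priorities in the logs:

```shell
effective_priority 101 -10 1   # testbed06, haproxy stopped: prints 91
effective_priority 100 -2 1    # testbed09, haproxy stopped: prints 98
backup_takes_over 91 98 || echo "testbed06 stays BACKUP"
# If haproxy were kept running on testbed06, it would stay at 101 > 98:
backup_takes_over "$(effective_priority 101 -10 0)" 98 && echo "testbed06 takes over"
```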

I think you need to not have keepalived starting and stopping haproxy, and then it should work.

adnanhamdussalam commented 3 weeks ago

Thank you for the update. I have removed the starting and stopping of haproxy from keepalived, but I still have the same issue. When the priority is lowered on the backup, testbed06 has priority 99 (101 - 2) and testbed09 has priority 98 (100 - 2). After stopping the service on testbed06 its priority becomes 99, and testbed09 already has priority 98, so the VIP does not switch.

To the best of my knowledge, and after performing many tests, I think failover (HA) for haproxy is not possible with keepalived.

Could you kindly share your expert opinion on the above test case?

pqarmitage commented 3 weeks ago

Based on the configuration you have provided above, your statement that testbed06 has priority 99 and testbed09 has priority 98 means that haproxy is not running on either system. You need to ensure that haproxy is permanently running on both systems, i.e. enable the haproxy service using systemctl.
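On each node this amounts to `systemctl enable --now haproxy`, which starts haproxy immediately and also at boot. A small sketch that does this for both nodes over ssh follows; the helper name, the `DRY_RUN` switch and root ssh access to the nodes are assumptions for illustration.

```shell
#!/bin/bash
# Hypothetical helper: enable and start haproxy on every node given, so that
# it runs independently of the VRRP state. Assumes root ssh access to each
# node. With DRY_RUN=1 the commands are only printed, not executed.
enable_haproxy_on() {
    local node
    for node in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "ssh root@$node systemctl enable --now haproxy"
        else
            ssh "root@$node" systemctl enable --now haproxy
        fi
    done
}

# Usage (testbed06 and testbed09):
#   DRY_RUN=1 enable_haproxy_on 10.114.16.50 10.114.16.64
```

As the following comments show, haproxy will only start on the node that does not hold the VIP if its configuration does not bind to the VIP address.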

There appear to be quite a few websites that describe how to use keepalived with haproxy, such as:

https://medium.com/@kemalozz/installation-of-haproxy-and-keepalived-for-high-availability-f1d6e7b8982a
https://sysadmins.co.za/achieving-high-availability-with-haproxy-and-keepalived-building-a-redundant-load-balancer/
https://www.digitalocean.com/community/tutorials/how-to-set-up-highly-available-haproxy-servers-with-keepalived-and-reserved-ips-on-ubuntu-14-04
https://docs.vmware.com/en/vRealize-Operations/8.10/vrops-manager-load-balancing/GUID-EC001888-776B-42D5-9843-719EF08AB940.html
https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/7/html/load_balancer_administration/s2-lvs-keepalived-haproxy-vsa

and these should give you some guidance on what you need to do.

adnanhamdussalam commented 3 weeks ago

Thank you for the update. It is not possible to run haproxy on both servers, because haproxy is bound to the VIP, so the haproxy service will only start on the server where the VIP resides. When I try to start haproxy on the server where the VIP does not reside, the service simply errors out and exits.

pqarmitage commented 3 weeks ago

The sample configurations I have seen don't specify an IP address to bind to, e.g.:

frontend my_frontend
  bind *:80
  default_backend my_backend

Doing it that way, you should not have a problem with the VIP not being present.

adnanhamdussalam commented 3 weeks ago

Thank you for the update. After changing the bind to *, psql now connects to postgresql using the IP whether or not the VIP is available.

I still think HA is not possible for haproxy using keepalived.

Any ideas?

Please find the output below:

Backup testbed09:

Oct 31 06:55:49 testbed09-1664 Keepalived_vrrp[3136805]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 31 06:55:49 testbed09-1664 Keepalived_vrrp[3136805]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 31 06:55:49 testbed09-1664 Keepalived_vrrp[3136805]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 31 06:57:16 testbed09-1664 Keepalived_vrrp[3136805]: (VI_1) Master received advert from 10.114.16.50 with higher priority 102, ours 101
Oct 31 06:57:16 testbed09-1664 Keepalived_vrrp[3136805]: (VI_1) Entering BACKUP STATE
Oct 31 06:57:16 testbed09-1664 Keepalived_vrrp[3136805]: (VI_1) removing VIPs.

[postgres@testbed09-1664 keepalived]$ psql -h 10.114.16.72 -p 5000 -U postgres -d mydb
psql (16.4)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.

mydb=#

Master testbed06:

Oct 31 06:57:16 testbed06 Keepalived_vrrp[34771]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 31 06:57:21 testbed06 Keepalived_vrrp[34771]: (VI_1) Sending/queueing gratuitous ARPs on enp1s0 for 10.114.16.72
Oct 31 06:57:21 testbed06 Keepalived_vrrp[34771]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 31 06:57:21 testbed06 Keepalived_vrrp[34771]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 31 06:57:21 testbed06 Keepalived_vrrp[34771]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 31 06:57:21 testbed06 Keepalived_vrrp[34771]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 31 06:57:21 testbed06 Keepalived_vrrp[34771]: Sending gratuitous ARP on enp1s0 for 10.114.16.72

[postgres@testbed06 haproxy]$ psql -h 10.114.16.72 -p 5000 -U postgres -d mydb
psql (16.4)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.

mydb=#

pqarmitage commented 3 weeks ago

I think you probably need to post your haproxy configuration in order for us to be able to comment any further. I can then try and reproduce the problem.

It is worth trying to access postgresql from a third machine, and not testbed06 or testbed09, in the first instance. There can be complications when trying to forward connections when they originate on the same system as is doing the forwarding, although I don't know if that applies to haproxy. If you test it this way, then you can use tcpdump or wireshark to see what is happening to the packets and identify where the problem lies. You might find it works better if you use

Given the number of articles on the web that describe how to use haproxy and keepalived together, I think it is very unlikely that HA is not possible for haproxy using keepalived.

pqarmitage commented 3 weeks ago

I've just seen that I didn't finish the second paragraph. I intended to suggest that you add use_vmac to the vrrp_instance block. That would have the advantage that the MAC address associated with the VIP does not change when the backup takes over as master. However, it does mean that the backup instance would not be able to communicate with the VIP on the other system, since the advertised MAC address for the VIP would be locally configured on the backup.

adnanhamdussalam commented 3 weeks ago

Thank you for the update.

I will try it from my third system and share the results shortly. Please find my current settings below.

MASTER:

[postgres@testbed06 ~]$ cat /etc/keepalived/keepalived.conf
global_defs {
    script_user root
    enable_script_security
}
vrrp_script chk_haproxy {
    script "killall -0 haproxy"   # widely used idiom
    interval 2                    # check every 2 seconds
    weight 2                      # add 2 points of prio if OK
}
vrrp_instance VI_1 {
    interface enp1s0
    state MASTER
    priority 100
    virtual_router_id 51
    authentication {
        auth_type PASS
        auth_pass 1234
    }
    virtual_ipaddress {
        10.114.16.72/24
    }
    unicast_src_ip 10.114.16.50   # This node
    unicast_peer {
        10.114.16.64              # Other nodes
    }
    track_script {
        chk_haproxy
    }
    notify_master /etc/keepalived/start_haproxy.sh
}

BACKUP:

[postgres@testbed09-1664 ~]$ cat /etc/keepalived/keepalived.conf
global_defs {
    script_user root
    enable_script_security
}
vrrp_script chk_haproxy {
    script "killall -0 haproxy"   # widely used idiom
    interval 2                    # check every 2 seconds
    weight 2                      # add 2 points of prio if OK
}
vrrp_instance VI_1 {
    interface enp1s0
    state BACKUP
    priority 99
    virtual_router_id 51
    authentication {
        auth_type PASS
        auth_pass 1234
    }
    virtual_ipaddress {
        10.114.16.72/24
    }
    unicast_src_ip 10.114.16.64   # This node
    unicast_peer {
        10.114.16.50              # Other nodes
    }
    track_script {
        chk_haproxy
    }
    notify_master /etc/keepalived/start_haproxy.sh
}
[postgres@testbed09-1664 ~]$

adnanhamdussalam commented 3 weeks ago

I have tried from the third system and it is accessible, but I am still facing the priority issue: I stopped the haproxy service on the master, but the VIP did not switch. Please find the details below:

Master output:

[postgres@testbed06 ~]$ systemctl status keepalived
● keepalived.service - LVS and VRRP High Availability Monitor
     Loaded: loaded (/usr/lib/systemd/system/keepalived.service; disabled; preset: disabled)
     Active: active (running) since Fri 2024-11-01 06:58:00 EDT; 8min ago
   Main PID: 178716 (keepalived)
      Tasks: 2 (limit: 98870)
     Memory: 1.9M
        CPU: 2.373s
     CGroup: /system.slice/keepalived.service
             ├─178716 /usr/sbin/keepalived --dont-fork -D
             └─178717 /usr/sbin/keepalived --dont-fork -D

Nov 01 06:58:04 testbed06 Keepalived_vrrp[178717]: (VI_1) Changing effective priority from 100 to 102
Nov 01 06:58:09 testbed06 Keepalived_vrrp[178717]: (VI_1) Sending/queueing gratuitous ARPs on enp1s0 for 10.114.16.72
Nov 01 06:58:09 testbed06 Keepalived_vrrp[178717]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Nov 01 06:58:09 testbed06 Keepalived_vrrp[178717]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Nov 01 06:58:09 testbed06 Keepalived_vrrp[178717]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Nov 01 06:58:09 testbed06 Keepalived_vrrp[178717]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Nov 01 06:58:09 testbed06 Keepalived_vrrp[178717]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Nov 01 07:00:12 testbed06 Keepalived_vrrp[178717]: Script chk_haproxy now returning 1
Nov 01 07:00:12 testbed06 Keepalived_vrrp[178717]: VRRP_Script(chk_haproxy) failed (exited with status 1)
Nov 01 07:00:12 testbed06 Keepalived_vrrp[178717]: (VI_1) Changing effective priority from 102 to 100

BACKUP output:

[postgres@testbed09-1664 ~]$ systemctl status keepalived
● keepalived.service - LVS and VRRP High Availability Monitor
     Loaded: loaded (/usr/lib/systemd/system/keepalived.service; disabled; preset: disabled)
     Active: active (running) since Fri 2024-11-01 06:43:39 EDT; 23min ago
   Main PID: 3396580 (keepalived)
      Tasks: 2 (limit: 201936)
     Memory: 1.9M
        CPU: 254ms
     CGroup: /system.slice/keepalived.service
             ├─3396580 /usr/sbin/keepalived --dont-fork -D
             └─3396582 /usr/sbin/keepalived --dont-fork -D

Nov 01 06:57:18 testbed09-1664 Keepalived_vrrp[3396582]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Nov 01 06:57:23 testbed09-1664 Keepalived_vrrp[3396582]: (VI_1) Sending/queueing gratuitous ARPs on enp1s0 for 10.114.16.72
Nov 01 06:57:23 testbed09-1664 Keepalived_vrrp[3396582]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Nov 01 06:57:23 testbed09-1664 Keepalived_vrrp[3396582]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Nov 01 06:57:23 testbed09-1664 Keepalived_vrrp[3396582]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Nov 01 06:57:23 testbed09-1664 Keepalived_vrrp[3396582]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Nov 01 06:57:23 testbed09-1664 Keepalived_vrrp[3396582]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Nov 01 06:58:04 testbed09-1664 Keepalived_vrrp[3396582]: (VI_1) Master received advert from 10.114.16.50 with higher priority 100, ours 99
Nov 01 06:58:04 testbed09-1664 Keepalived_vrrp[3396582]: (VI_1) Entering BACKUP STATE
Nov 01 06:58:04 testbed09-1664 Keepalived_vrrp[3396582]: (VI_1) removing VIPs.

[postgres@testbed09-1664 ~]$ systemctl status haproxy
● haproxy.service - HAProxy Load Balancer
     Loaded: loaded (/usr/lib/systemd/system/haproxy.service; disabled; preset: disabled)
     Active: active (running) since Fri 2024-11-01 06:40:12 EDT; 27min ago
    Process: 3396049 ExecStartPre=/usr/sbin/haproxy -f $CONFIG -c -q $OPTIONS (code=exited, status=0/SUCCESS)
   Main PID: 3396052 (haproxy)
     Status: "Ready."
      Tasks: 17 (limit: 201936)
     Memory: 20.1M
        CPU: 121ms
     CGroup: /system.slice/haproxy.service
             ├─3396052 /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid
             └─3396054 /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid

Nov 01 06:40:12 testbed09-1664 systemd[1]: Starting HAProxy Load Balancer...
Nov 01 06:40:12 testbed09-1664 haproxy[3396052]: [NOTICE] (3396052) : New worker (3396054) forked
Nov 01 06:40:12 testbed09-1664 haproxy[3396052]: [NOTICE] (3396052) : Loading success.
Nov 01 06:40:12 testbed09-1664 systemd[1]: Started HAProxy Load Balancer.

pqarmitage commented 3 weeks ago

What the logs show is that at 06:58:04 testbed09 had priority 99 and at that time testbed06 had priority 100. At the same time, testbed06 increased its priority from 100 to 102 (so presumably the track_script detected that haproxy had started). At 07:00:12 chk_haproxy started returning 1 on testbed06, and so it reduced its priority to 100. This was still higher than the priority on testbed09 (99), and so testbed06 remained as master.

It would appear that on testbed09 for some reason the track_script chk_haproxy is not correctly seeing that haproxy is running and is therefore returning a non-zero exit code, causing the priority to remain at 99. I suggest you post the full keepalived logs on both systems from the time that keepalived started up so that we can see what is actually happening. Just seeing the last few lines of the log entries from the output of systemctl status keepalived is really not sufficient.
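One way to see what keepalived is seeing is to run the same check by hand on testbed09, as root (the configured `script_user`), and print its exit status; as noted earlier, keepalived only distinguishes zero from non-zero. A minimal sketch; the helper name is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: run a check command, discard its output, and report
# its exit status (0 = success, anything else = failure).
run_check() {
    "$@" >/dev/null 2>&1
    local rc=$?          # status of the check itself
    echo "check '$*' exited with status $rc"
    return "$rc"
}

# Usage on testbed09, as root:
#   run_check killall -0 haproxy
```

If the shell reports status 127, the command itself could not be found (for example, if `killall` is not installed), which is a different problem from haproxy not running.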

adnanhamdussalam commented 2 weeks ago

Thank you for the update.

With the settings below it is now working fine and the VIP switches successfully. However, I am now facing another issue: from the third system (testbed13) I can connect using all three IPs (both node IPs and the VIP). I think this is due to bind *:5000.

How can I control this? I do not want the client application to connect via any IP other than the VIP.

Please find the test output and keepalived settings below:

[postgres@testbed13 ~]$ psql -h 10.114.16.64 -p 5000
psql (16.4)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.

postgres=# SELECT inet_server_addr() AS hostname;
   hostname
--------------
 10.114.16.68
(1 row)

postgres=# \q
[postgres@testbed13 ~]$ psql -h 10.114.16.50 -p 5000
psql (16.4)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.

postgres=# SELECT inet_server_addr() AS hostname;
   hostname
--------------
 10.114.16.70
(1 row)

postgres=# \q
[postgres@testbed13 ~]$ psql -h 10.114.16.72 -p 5000
psql (16.4)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.

postgres=# SELECT inet_server_addr() AS hostname;
   hostname
--------------
 10.114.16.70
(1 row)

postgres=# \q
[postgres@testbed13 ~]$

Keepalived settings:

MASTER:

[root@testbed06 postgres]# cat /etc/keepalived/keepalived.conf
global_defs {
    router_id testbed06
    script_user root
    enable_script_security
}
vrrp_script chk_haproxy {
    script "killall -0 haproxy"   # widely used idiom
    interval 2                    # check every 2 seconds
    weight 2                      # add 2 points of prio if OK
}
vrrp_instance VI_1 {
    interface enp1s0
    state MASTER
    priority 100
    virtual_router_id 51
    authentication {
        auth_type PASS
        auth_pass 1234
    }
    virtual_ipaddress {
        10.114.16.72/24
    }
    unicast_src_ip 10.114.16.50   # This node
    unicast_peer {
        10.114.16.64              # Other nodes
    }
    track_script {
        chk_haproxy
    }
    notify_master /etc/keepalived/start_haproxy.sh
}

[root@testbed06 postgres]# cat /etc/keepalived/start_haproxy.sh
#!/bin/bash
# Script to start HAProxy when Keepalived transitions to MASTER state
systemctl start haproxy

BACKUP:

[root@testbed09-1664 ~]# cat /etc/keepalived/keepalived.conf
global_defs {
    router_id testbed09-1664
    script_user root
    enable_script_security
}
vrrp_script chk_haproxy {
    script "killall -0 haproxy"   # widely used idiom
    interval 2                    # check every 2 seconds
    weight 2                      # add 2 points of prio if OK
}
vrrp_instance VI_1 {
    interface enp1s0
    state BACKUP
    priority 99
    virtual_router_id 51
    authentication {
        auth_type PASS
        auth_pass 1234
    }
    virtual_ipaddress {
        10.114.16.72/24
    }
    unicast_src_ip 10.114.16.64   # This node
    unicast_peer {
        10.114.16.50              # Other nodes
    }
    track_script {
        chk_haproxy
    }
    notify_master /etc/keepalived/start_haproxy.sh
}

[root@testbed09-1664 ~]# cat /etc/keepalived/start_haproxy.sh
#!/bin/bash
# Script to start HAProxy when Keepalived transitions to MASTER state
systemctl start haproxy

pqarmitage commented 2 weeks ago

I'm glad that you have now got haproxy and keepalived working together.

> How can I control this as I do not want the client application to connect via any other IP than VIP only.

> [postgres@testbed13 ~]$ psql -h 10.114.16.64 -p 5000

You are connecting to 10.114.16.64 which is the VIP.

> postgres=# SELECT inet_server_addr() AS hostname;
>    hostname
> --------------
>  10.114.16.70

The postgresql documentation states that inet_server_addr returns the IP address on which the server accepted the current connection, so that is the address on the postgresql server. The address that the postgresql server is seeing the connection coming from is the address that haproxy uses when it makes the connection from testbed06/09 to the postgresql server. The only place to configure that would be in the haproxy configuration.
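Both addresses can be seen from testbed13 by asking PostgreSQL for `inet_server_addr()` (the address on which the server accepted the connection) and `inet_client_addr()` (the address the connection arrived from, which here should be the address haproxy used). A sketch, assuming it is run as the postgres OS user as in the sessions above so that psql's default user and database apply; the helper name is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: connect through each given address on the given port
# and print "<address> -> <server addr>|<client addr>" as seen by PostgreSQL.
# -A (unaligned) and -t (tuples only) make psql print just the values,
# separated by "|".
probe_addresses() {
    local port=$1 host
    shift
    for host in "$@"; do
        printf '%s -> %s\n' "$host" \
            "$(psql -h "$host" -p "$port" -At \
                -c 'SELECT inet_server_addr(), inet_client_addr()')"
    done
}

# Usage from testbed13 (VIP, then the two physical addresses):
#   probe_addresses 5000 10.114.16.72 10.114.16.50 10.114.16.64
```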

adnanhamdussalam commented 2 weeks ago

Hi, the VIP is 10.114.16.72, testbed06's physical IP is 10.114.16.50, and testbed09's physical IP is 10.114.16.64. I want clients to connect only via the VIP, not via the physical IPs. When I configure bind 10.114.16.72:5000, only the VIP accepts connections, but as you said, the haproxy service must be running on both servers, so we changed it to bind *:5000. Is it possible to ensure that the third server can connect only via the VIP and not via the physical IPs?