acassen / keepalived

Keepalived
https://www.keepalived.org
GNU General Public License v2.0

keepalived issue with haproxy #2485

Open adnanhamdussalam opened 1 week ago

adnanhamdussalam commented 1 week ago

Hi,

Servers: test1, test2

I have configured keepalived against two HAProxy servers, and I am able to move the VIP to the other server when the haproxy service goes down on one server (test1). Now suppose the haproxy service is running on server test2 and the VIP is also on server test2, and I then start the keepalived service on test1, which has the lower priority. When I shut down the haproxy service on server test2, keepalived does not move the VIP back to server test1, because of the lower priority on test1.

Any idea or possibility to do it?

pqarmitage commented 1 week ago

You will need to provide copies of your keepalived configurations, and also any track_scripts that you are using. Then we can have a look at it.

adnanhamdussalam commented 1 week ago

PFB the configuration settings:

server 1:

[root@testbed06 postgres]# cat /etc/keepalived/keepalived.conf
global_defs {
    script_user root
    enable_script_security
}

vrrp_script chk_haproxy_down {
    script "/etc/keepalived/chk_haproxy_advanced.sh"
    interval 2
    weight -20        # Apply this weight if HAProxy is down on this node
    fall 2
    rise 2
    # If exit code is 1 (this node's HAProxy is down)
}

vrrp_script chk_both_haproxy_down {
    script "/etc/keepalived/chk_haproxy_advanced.sh"
    interval 2
    weight -50        # Apply this weight if both nodes' HAProxy services are down
    fall 2
    rise 2
    # If exit code is 2 (both nodes' HAProxy are down)
}

vrrp_instance VI_1 {
    state MASTER                  # Set this node as MASTER
    interface enp1s0              # Network interface to monitor
    virtual_router_id 51          # VRRP ID (must be the same on both nodes)
    priority 101                  # Priority (higher number means higher priority)
    advert_int 1                  # Advertisement interval (seconds)
    authentication {
        auth_type PASS
        auth_pass 1234            # Authentication password (must match on both nodes)
    }
    virtual_ipaddress {
        10.114.16.72              # Virtual IP address (VIP)
    }
    track_script {
        chk_haproxy_down
        chk_both_haproxy_down
    }

    notify_master /etc/keepalived/start_haproxy.sh
    notify_backup /etc/keepalived/stop_haproxy.sh
    preempt
}

[root@testbed06 postgres]# cat "/etc/keepalived/chk_haproxy_advanced.sh"

#!/bin/bash

# Define the path to Keepalived's control socket or state file (if applicable)
KEEPALIVED_VRRP_INSTANCE="VI_1"

# Local HAProxy status check
if killall -0 haproxy >/dev/null 2>&1; then
    # If HAProxy is running on this node, ensure full priority
    echo "HAProxy is running on this node."
    ip vrf exec $KEEPALIVED_VRRP_INSTANCE 100  # Example: Set priority back to full
    exit 0
else
    # HAProxy is down on this node, check the other node's HAProxy status
    ssh postgres@10.114.16.64 "killall -0 haproxy >/dev/null 2>&1"
    sleep 10
    if [ $? -ne 0 ]; then
        # Both HAProxy services are down, reduce priority drastically
        echo "Both HAProxy services are down. Reducing priority drastically."
        ip vrf exec $KEEPALIVED_VRRP_INSTANCE 50  # Example: Reduce priority significantly
        exit 2
    else
        # Only this node's HAProxy is down, reduce priority moderately
        echo "HAProxy is down on this node. Reducing priority moderately."
        ip vrf exec $KEEPALIVED_VRRP_INSTANCE 80  # Example: Reduce priority moderately
        exit 1
    fi
fi

server 2:

[postgres@testbed09-1664 ~]$ cat /etc/keepalived/keepalived.conf
global_defs {
    script_user root
    enable_script_security
}

vrrp_script chk_haproxy_down {
    script "/etc/keepalived/chk_haproxy_advanced.sh"
    interval 2
    weight -20        # Apply this weight if HAProxy is down on this node
    fall 2
    rise 2
    # If exit code is 1 (this node's HAProxy is down)
}

vrrp_script chk_both_haproxy_down {
    script "/etc/keepalived/chk_haproxy_advanced.sh"
    interval 2
    weight -50        # Apply this weight if both nodes' HAProxy services are down
    fall 2
    rise 2
    # If exit code is 2 (both nodes' HAProxy are down)
}

vrrp_script chk_master_haproxy {
    script "ssh postgres@10.114.16.50 'killall -0 haproxy' || echo 1"
    interval 5
    weight 10
}

vrrp_instance VI_1 {
    state BACKUP                  # Set this node as BACKUP
    interface enp1s0              # Network interface to monitor
    virtual_router_id 51          # VRRP ID (must match the MASTER node)
    priority 100                  # Priority (lower than MASTER)
    advert_int 1                  # Advertisement interval (seconds)
    authentication {
        auth_type PASS
        auth_pass 1234            # Authentication password (must match the MASTER)
    }
    virtual_ipaddress {
        10.114.16.72              # Same VIP as the MASTER node
    }
    track_script {
        chk_haproxy_down
        chk_both_haproxy_down
    }

    notify_master /etc/keepalived/start_haproxy.sh
    notify_backup /etc/keepalived/stop_haproxy.sh
    preempt
}

[postgres@testbed09-1664 ~]$ cat /etc/keepalived/chk_haproxy_advanced.sh

#!/bin/bash

# Define the path to Keepalived's control socket or state file (if applicable)
KEEPALIVED_VRRP_INSTANCE="VI_1"

# Local HAProxy status check
if killall -0 haproxy >/dev/null 2>&1; then
    # If HAProxy is running on this node, ensure full priority
    echo "HAProxy is running on this node."
    ip vrf exec $KEEPALIVED_VRRP_INSTANCE 100  # Example: Set priority back to full
    exit 0
else
    # HAProxy is down on this node, check the other node's HAProxy status
    ssh postgres@10.114.16.50 "killall -0 haproxy >/dev/null 2>&1"
    sleep 10
    if [ $? -ne 0 ]; then
        # Both HAProxy services are down, reduce priority drastically
        echo "Both HAProxy services are down. Reducing priority drastically."
        ip vrf exec $KEEPALIVED_VRRP_INSTANCE 50  # Example: Reduce priority significantly
        exit 2
    else
        # Only this node's HAProxy is down, reduce priority moderately
        echo "HAProxy is down on this node. Reducing priority moderately."
        ip vrf exec $KEEPALIVED_VRRP_INSTANCE 80  # Example: Reduce priority moderately
        exit 1
    fi
fi

pqarmitage commented 5 days ago

There appear to be a number of issues:

  1. You have added sleep 10 after the ssh postgres@ ... command in the chk_haproxy_advanced.sh script. The exit code of sleep will be 0, and so the else block will always be executed, and the result of ssh postgres@... will always be ignored (see the sketch after this list).
  2. I don't know what ip vrf exec VI_1 100 (or 80 or 50) is expected to do, unless you have commands named 100, 80 and 50. Have you created a vrf named VI_1?
  3. vrrp_scripts chk_haproxy_down and chk_both_haproxy_down both call shell script /etc/keepalived/chk_haproxy_advanced.sh, and so the vrrp_scripts will either both be up or both be down.
  4. /etc/keepalived/chk_haproxy_advanced.sh can exit with exit codes of 0, 1 or 2. keepalived just checks whether the exit code is 0 or not 0, so keepalived will not be aware of any difference between an exit code of 1 or 2, although currently your script will never exit with exit code 2 (see point 1. above).
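
As a minimal sketch for point 1 only (reusing the commands and peer address from the script above, and keeping the sleep), the ssh result could be saved before sleeping so that $? is not overwritten:

#!/bin/bash

if killall -0 haproxy >/dev/null 2>&1; then
    exit 0                      # local HAProxy is running
fi

# Local HAProxy is down: check the peer and save the result *before* sleeping
ssh postgres@10.114.16.64 "killall -0 haproxy >/dev/null 2>&1"
peer_status=$?
sleep 10

if [ $peer_status -ne 0 ]; then
    echo "Both HAProxy services are down."
else
    echo "HAProxy is down on this node only."
fi
exit 1                          # keepalived only distinguishes zero from non-zero (point 4)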

I don't know why keepalived is not taking over as master on test1 when keepalived is stopped on test2, but I suggest you correct the issues identified above first, and if you are still experiencing the original problem you will need to post the full keepalived logs from both systems. Also, if you execute kill -USR1 $(cat /var/run/keepalived.pid) when test1 has not taken over as master keepalived will produce a file /tmp/keepalived.data, and it would be helpful if you posted that as well.

adnanhamdussalam commented 2 days ago

I have changed the settings now:

Master:

[postgres@testbed06 ~]$ cat /etc/keepalived/keepalived.conf
global_defs {
    script_user root
    enable_script_security
}

vrrp_script chk_haproxy {
    script "killall -0 haproxy"
    interval 2       # Check every 2 seconds
    weight -10       # Reduce priority by 10 if the script fails
}

vrrp_instance VI_1 {
    state MASTER                  # Set this node as MASTER
    interface enp1s0              # Network interface to monitor
    virtual_router_id 51          # VRRP ID (must be the same on both nodes)
    priority 101                  # Priority (higher number means higher priority)
    advert_int 1                  # Advertisement interval (seconds)
    authentication {
        auth_type PASS
        auth_pass 1234            # Authentication password (must match on both nodes)
    }
    virtual_ipaddress {
        10.114.16.72              # Virtual IP address (VIP)
    }
    track_script {
        chk_haproxy
    }

    notify_master /etc/keepalived/start_haproxy.sh
    notify_backup /etc/keepalived/stop_haproxy.sh
    preempt
}

Backup:

[postgres@testbed09-1664 ~]$ cat /etc/keepalived/keepalived.conf
global_defs {
    script_user root
    enable_script_security
}

vrrp_script chk_haproxy {
    script "killall -0 haproxy"
    interval 2       # Check every 2 seconds
    weight -2        # Reduce priority by 10 if the script fails
}

vrrp_instance VI_1 {
    state BACKUP                  # Set this node as BACKUP
    interface enp1s0              # Network interface to monitor
    virtual_router_id 51          # VRRP ID (must match the MASTER node)
    priority 100                  # Priority (lower than MASTER)
    advert_int 1                  # Advertisement interval (seconds)
    authentication {
        auth_type PASS
        auth_pass 1234            # Authentication password (must match the MASTER)
    }
    virtual_ipaddress {
        10.114.16.72              # Same VIP as the MASTER node
    }
    track_script {
        chk_haproxy
    }

    notify_master /etc/keepalived/start_haproxy.sh
    notify_backup /etc/keepalived/stop_haproxy.sh
}

When I shut down haproxy on testbed09, the VIP does not move to testbed06 because the priority on testbed06 is 91. How can I control this issue?

PFB the logs of both servers; I am unable to find /tmp/keepalived.

master log output :

[postgres@testbed09-1664 ~]$ systemctl status keepalived
● keepalived.service - LVS and VRRP High Availability Monitor
     Loaded: loaded (/usr/lib/systemd/system/keepalived.service; disabled; preset: disabled)
     Active: active (running) since Tue 2024-10-29 12:31:42 EDT; 2min 3s ago
   Main PID: 2761615 (keepalived)
      Tasks: 2 (limit: 201936)
     Memory: 1.9M
        CPU: 1.006s
     CGroup: /system.slice/keepalived.service
             ├─2761615 /usr/sbin/keepalived --dont-fork -D
             └─2761616 /usr/sbin/keepalived --dont-fork -D

Oct 29 12:32:32 testbed09-1664 Keepalived_vrrp[2761616]: (VI_1) Changing effective priority from 98 to 100
Oct 29 12:32:35 testbed09-1664 Keepalived_vrrp[2761616]: (VI_1) Sending/queueing gratuitous ARPs on enp1s0 for 10.114.16.72
Oct 29 12:32:35 testbed09-1664 Keepalived_vrrp[2761616]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 29 12:32:35 testbed09-1664 Keepalived_vrrp[2761616]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 29 12:32:35 testbed09-1664 Keepalived_vrrp[2761616]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 29 12:32:35 testbed09-1664 Keepalived_vrrp[2761616]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 29 12:32:35 testbed09-1664 Keepalived_vrrp[2761616]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 29 12:33:16 testbed09-1664 Keepalived_vrrp[2761616]: Script chk_haproxy now returning 1
Oct 29 12:33:16 testbed09-1664 Keepalived_vrrp[2761616]: VRRP_Script(chk_haproxy) failed (exited with status 1)
Oct 29 12:33:16 testbed09-1664 Keepalived_vrrp[2761616]: (VI_1) Changing effective priority from 100 to 98

backup log :

[postgres@testbed06 ~]$ systemctl status keepalived
● keepalived.service - LVS and VRRP High Availability Monitor
     Loaded: loaded (/usr/lib/systemd/system/keepalived.service; disabled; preset: disabled)
     Active: active (running) since Mon 2024-10-28 06:37:58 EDT; 1 day 5h ago
   Main PID: 352793 (keepalived)
      Tasks: 2 (limit: 98870)
     Memory: 2.0M
        CPU: 8min 26.411s
     CGroup: /system.slice/keepalived.service
             ├─352793 /usr/sbin/keepalived --dont-fork -D
             └─352794 /usr/sbin/keepalived --dont-fork -D

Oct 29 12:30:10 testbed06 Keepalived_vrrp[352794]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 29 12:30:10 testbed06 Keepalived_vrrp[352794]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 29 12:30:10 testbed06 Keepalived_vrrp[352794]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 29 12:30:10 testbed06 Keepalived_vrrp[352794]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 29 12:32:27 testbed06 Keepalived_vrrp[352794]: Script chk_haproxy now returning 1
Oct 29 12:32:27 testbed06 Keepalived_vrrp[352794]: VRRP_Script(chk_haproxy) failed (exited with status 1)
Oct 29 12:32:27 testbed06 Keepalived_vrrp[352794]: (VI_1) Changing effective priority from 101 to 91
Oct 29 12:32:30 testbed06 Keepalived_vrrp[352794]: (VI_1) Master received advert from 10.114.16.64 with higher priority 98, ours 91
Oct 29 12:32:30 testbed06 Keepalived_vrrp[352794]: (VI_1) Entering BACKUP STATE
Oct 29 12:32:30 testbed06 Keepalived_vrrp[352794]: (VI_1) removing VIPs.

pqarmitage commented 2 days ago

The file produced by kill -USR1 ... is /tmp/keepalived.data, not /tmp/keepalived.

I think your problem is that when a VRRP instance is in the backup state, you stop ha_proxy, and only start it again once the VRRP instance transitions to master state.

So when keepalived is running on both testbed06 and testbed09, the VRRP instances start in backup mode (and haproxy is not running), so testbed06 has priority 99 (101 - 2) and testbed09 has priority 98 (100 - 2). testbed06 has higher priority and so becomes VRRP master, haproxy is started, and the VRRP instance priority increases to 101.

You then stop haproxy on testbed06, and so the VRRP priority reduces to 99, but this is still higher than testbed09, and so testbed06 remains the VRRP master.

I think you need to not have keepalived starting and stopping haproxy, and then it should work.
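
For illustration, a sketch of what the MASTER side from the configuration above could look like with the notify lines removed, so that keepalived never starts or stops haproxy itself (values copied from the posted config):

vrrp_script chk_haproxy {
    script "killall -0 haproxy"
    interval 2
    weight -10
}

vrrp_instance VI_1 {
    state MASTER
    interface enp1s0
    virtual_router_id 51
    priority 101
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1234
    }
    virtual_ipaddress {
        10.114.16.72
    }
    track_script {
        chk_haproxy
    }
    # no notify_master/notify_backup: haproxy is managed outside keepalived
}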

adnanhamdussalam commented 1 day ago

Thank you for the update. I have removed the starting and stopping of haproxy from keepalived, but I still have the same issue: when the priority drops on the backup, testbed06 has priority 99 (101 - 2) and testbed09 has priority 98 (100 - 2), so after stopping the service on testbed06 its priority is 99 while testbed09 is already at 98, and the VIP does not switch.

To the best of my knowledge, and after performing many tests, I think failover (HA) for haproxy is not possible with keepalived.

Could you kindly share your expert opinion on the above test case?

pqarmitage commented 1 day ago

Based on the configuration you have provided above, your statement that testbed06 has priority 99 and testbed09 has priority 98 means that haproxy is not running on either system. You need to ensure that haproxy is permanently running on both systems, i.e. enable the haproxy service using systemctl.
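
For example, using standard systemd commands on both nodes (assuming the service unit is named haproxy):

systemctl enable --now haproxy    # start haproxy and keep it enabled across reboots
systemctl status haproxy          # confirm it stays running regardless of VRRP state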

There appear to be quite a few websites that describe how to use keepalived with haproxy, such as:

https://medium.com/@kemalozz/installation-of-haproxy-and-keepalived-for-high-availability-f1d6e7b8982a
https://sysadmins.co.za/achieving-high-availability-with-haproxy-and-keepalived-building-a-redundant-load-balancer/
https://www.digitalocean.com/community/tutorials/how-to-set-up-highly-available-haproxy-servers-with-keepalived-and-reserved-ips-on-ubuntu-14-04
https://docs.vmware.com/en/vRealize-Operations/8.10/vrops-manager-load-balancing/GUID-EC001888-776B-42D5-9843-719EF08AB940.html
https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/7/html/load_balancer_administration/s2-lvs-keepalived-haproxy-vsa

and these should give you some guidance on what you need to do.

adnanhamdussalam commented 1 day ago

Thank you for the update. It is not possible to run haproxy on both servers, because haproxy binds to the VIP, so the haproxy service only comes up on the server where the VIP resides. When I try to start the haproxy service on the server where the VIP does not reside, the service simply errors out and exits.

pqarmitage commented 1 day ago

The sample configurations I have seen don't specify an IP address to bind to, e.g.:

frontend my_frontend
  bind *:80
  default_backend my_backend

Doing it that way, you should not have a problem with the VIP not being present.
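
As a hypothetical illustration for the PostgreSQL case in this thread (the frontend/backend names, the port 5000 listener, and the backend server addresses and ports below are assumptions, not taken from a posted haproxy configuration):

frontend pgsql_frontend
  bind *:5000
  mode tcp
  default_backend pgsql_backend

backend pgsql_backend
  mode tcp
  server pg1 10.114.16.50:5432 check
  server pg2 10.114.16.64:5432 check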

adnanhamdussalam commented 16 hours ago

Thank you for the update. After changing the bind to *, psql can now connect to PostgreSQL via the IP whether or not the VIP is present.

I still think HA is not possible for haproxy using keepalived.

Any idea?

PFB the output:

Backup testbed09:

Oct 31 06:55:49 testbed09-1664 Keepalived_vrrp[3136805]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 31 06:55:49 testbed09-1664 Keepalived_vrrp[3136805]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 31 06:55:49 testbed09-1664 Keepalived_vrrp[3136805]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 31 06:57:16 testbed09-1664 Keepalived_vrrp[3136805]: (VI_1) Master received advert from 10.114.16.50 with higher priority 102, ours 101
Oct 31 06:57:16 testbed09-1664 Keepalived_vrrp[3136805]: (VI_1) Entering BACKUP STATE
Oct 31 06:57:16 testbed09-1664 Keepalived_vrrp[3136805]: (VI_1) removing VIPs.

[postgres@testbed09-1664 keepalived]$ psql -h 10.114.16.72 -p 5000 -U postgres -d mydb
psql (16.4)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.

mydb=#

Master testbed06:

Oct 31 06:57:16 testbed06 Keepalived_vrrp[34771]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 31 06:57:21 testbed06 Keepalived_vrrp[34771]: (VI_1) Sending/queueing gratuitous ARPs on enp1s0 for 10.114.16.72
Oct 31 06:57:21 testbed06 Keepalived_vrrp[34771]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 31 06:57:21 testbed06 Keepalived_vrrp[34771]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 31 06:57:21 testbed06 Keepalived_vrrp[34771]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 31 06:57:21 testbed06 Keepalived_vrrp[34771]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 31 06:57:21 testbed06 Keepalived_vrrp[34771]: Sending gratuitous ARP on enp1s0 for 10.114.16.72

[postgres@testbed06 haproxy]$ psql -h 10.114.16.72 -p 5000 -U postgres -d mydb
psql (16.4)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.

mydb=#

pqarmitage commented 14 hours ago

I think you probably need to post your haproxy configuration in order for us to be able to comment any further. I can then try and reproduce the problem.

It is worth trying to access postgresql from a third machine, and not testbed06 or testbed09, in the first instance. There can be complications when trying to forward connections when they originate on the same system as is doing the forwarding, although I don't know if that applies to haproxy. If you test it this way, then you can use tcpdump or wireshark to see what is happening to the packets and identify where the problem lies. You might find it works better if you use

Given the number of articles on the web that describe how to use haproxy and keepalived together, I think it is very unlikely that HA is not possible for haproxy using keepalived.
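
For the tcpdump suggestion two paragraphs above, a possible capture on the interface, VIP, and port that appear earlier in this thread (the exact filter is only a suggestion):

# VRRP advertisements (IP protocol 112) plus traffic to/from the VIP on port 5000
tcpdump -ni enp1s0 'ip proto 112 or (host 10.114.16.72 and port 5000)'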

pqarmitage commented 13 hours ago

I've just seen that I didn't finish the second paragraph. I intended to suggest that you add use_vmac to the vrrp_instance block. That would have the advantage that the MAC address associated with the VIP does not change when the backup takes over as master. However, it does mean that the backup instance would not be able to communicate with the VIP on the other system, since the advertised MAC address for the VIP would be locally configured on the backup.
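
As a sketch only, use_vmac would go inside the vrrp_instance block from the configurations above, for example:

vrrp_instance VI_1 {
    state MASTER
    interface enp1s0
    use_vmac                      # VIP is held on a macvlan with the VRRP virtual MAC,
                                  # so the MAC seen by clients does not change on failover
    virtual_router_id 51
    priority 101
    advert_int 1
    virtual_ipaddress {
        10.114.16.72
    }
    track_script {
        chk_haproxy
    }
}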