Open adnanhamdussalam opened 1 week ago
You will need to provide copies of your keepalived configurations, and also any track_scripts that you are using. Then we can have a look at it.
PFB the configuration settings:
server 1:
[root@testbed06 postgres]# cat /etc/keepalived/keepalived.conf
global_defs {
    script_user root
    enable_script_security
}
vrrp_script chk_haproxy_down {
    script "/etc/keepalived/chk_haproxy_advanced.sh"
    interval 2
    weight -20    # Apply this weight if HAProxy is down on this node
    fall 2
    rise 2
}
vrrp_script chk_both_haproxy_down {
    script "/etc/keepalived/chk_haproxy_advanced.sh"
    interval 2
    weight -50    # Apply this weight if both nodes' HAProxy services are down
    fall 2
    rise 2
}
vrrp_instance VI_1 {
    state MASTER              # Set this node as MASTER
    interface enp1s0          # Network interface to monitor
    virtual_router_id 51      # VRRP ID (must be the same on both nodes)
    priority 101              # Priority (higher number means higher priority)
    advert_int 1              # Advertisement interval (seconds)
    authentication {
        auth_type PASS
        auth_pass 1234        # Authentication password (must match on both nodes)
    }
    virtual_ipaddress {
        10.114.16.72          # Virtual IP address (VIP)
    }
    track_script {
        chk_haproxy_down
        chk_both_haproxy_down
    }
    notify_master /etc/keepalived/start_haproxy.sh
    notify_backup /etc/keepalived/stop_haproxy.sh
    preempt
}
[root@testbed06 postgres]# cat "/etc/keepalived/chk_haproxy_advanced.sh"
KEEPALIVED_VRRP_INSTANCE="VI_1"
if killall -0 haproxy >/dev/null 2>&1; then
echo "HAProxy is running on this node."
ip vrf exec $KEEPALIVED_VRRP_INSTANCE 100 # Example: Set priority back to full
exit 0
else
ssh postgres@10.114.16.64 "killall -0 haproxy >/dev/null 2>&1"
sleep 10
if [ $? -ne 0 ]; then
# Both HAProxy services are down, reduce priority drastically
echo "Both HAProxy services are down. Reducing priority drastically."
ip vrf exec $KEEPALIVED_VRRP_INSTANCE 50 # Example: Reduce priority significantly
exit 2
else
# Only this node's HAProxy is down, reduce priority moderately
echo "HAProxy is down on this node. Reducing priority moderately."
ip vrf exec $KEEPALIVED_VRRP_INSTANCE 80 # Example: Reduce priority moderately
exit 1
fi
fi
server 2:
[postgres@testbed09-1664 ~]$ cat /etc/keepalived/keepalived.conf
global_defs {
    script_user root
    enable_script_security
}
vrrp_script chk_haproxy_down {
    script "/etc/keepalived/chk_haproxy_advanced.sh"
    interval 2
    weight -20    # Apply this weight if HAProxy is down on this node
    fall 2
    rise 2
}
vrrp_script chk_both_haproxy_down {
    script "/etc/keepalived/chk_haproxy_advanced.sh"
    interval 2
    weight -50    # Apply this weight if both nodes' HAProxy services are down
    fall 2
    rise 2
}
vrrp_instance VI_1 {
    state BACKUP              # Set this node as BACKUP
    interface enp1s0          # Network interface to monitor
    virtual_router_id 51      # VRRP ID (must match the MASTER node)
    priority 100              # Priority (lower than MASTER)
    advert_int 1              # Advertisement interval (seconds)
    authentication {
        auth_type PASS
        auth_pass 1234        # Authentication password (must match the MASTER)
    }
    virtual_ipaddress {
        10.114.16.72          # Same VIP as the MASTER node
    }
    track_script {
        chk_haproxy_down
    }
    notify_master /etc/keepalived/start_haproxy.sh
    notify_backup /etc/keepalived/stop_haproxy.sh
    preempt
}
[postgres@testbed09-1664 ~]$ cat /etc/keepalived/chk_haproxy_advanced.sh
KEEPALIVED_VRRP_INSTANCE="VI_1"
if killall -0 haproxy >/dev/null 2>&1; then
echo "HAProxy is running on this node."
ip vrf exec $KEEPALIVED_VRRP_INSTANCE 100 # Example: Set priority back to full
exit 0
else
ssh postgres@10.114.16.50 "killall -0 haproxy >/dev/null 2>&1"
sleep 10
if [ $? -ne 0 ]; then
echo "Both HAProxy services are down. Reducing priority drastically."
ip vrf exec $KEEPALIVED_VRRP_INSTANCE 50 # Example: Reduce priority significantly
exit 2
else
# Only this node's HAProxy is down, reduce priority moderately
echo "HAProxy is down on this node. Reducing priority moderately."
ip vrf exec $KEEPALIVED_VRRP_INSTANCE 80 # Example: Reduce priority moderately
exit 1
fi
fi
There appear to be a number of issues:
- There is a sleep 10 after the ssh postgres@ ... command in the chk_haproxy_advanced.sh script. The exit code of sleep will be 0, and so the else block will always be executed, and the result of ssh postgres@... will always be ignored.
- It is unclear what ip vrf VI_1 100 (or 80 or 50) are expected to do, unless you have commands 100, 80 and 50. Have you created a vrf named VI_1?
I don't know why keepalived is not taking over as master on test1 when keepalived is stopped on test2, but I suggest you correct the issues identified above first, and if you are still experiencing the original problem you will need to post the full keepalived logs from both systems. Also, if you execute kill -USR1 $(cat /var/run/keepalived.pid) when test1 has not taken over as master, keepalived will produce a file /tmp/keepalived.data, and it would be helpful if you posted that as well.
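The point about the exit code of sleep hiding the result of ssh can be reproduced in isolation. The following is a minimal sketch, not the poster's script: remote_check is a hypothetical stand-in for the ssh call and always fails, and sleep 0 replaces sleep 10 so that it runs instantly.

```shell
#!/bin/bash
# remote_check is a hypothetical stand-in for the ssh call in the posted
# script; it fails on purpose so the difference is visible.
remote_check() { return 1; }

# Mirrors the posted script: $? is read after sleep, so it holds sleep's status.
status_read_after_sleep() {
    remote_check
    sleep 0
    echo $?
}

# Saves the status immediately, before another command can overwrite it.
status_captured_first() {
    remote_check
    rc=$?
    sleep 0
    echo "$rc"
}

echo "read after sleep: $(status_read_after_sleep)"
echo "captured first:   $(status_captured_first)"
```

The first function reports 0 even though the check failed; the second reports the failure. Saving $? into a variable straight after the command under test is the usual way to keep it across later commands.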
I have changed the settings now:
Master:
[postgres@testbed06 ~]$ cat /etc/keepalived/keepalived.conf
global_defs {
    script_user root
    enable_script_security
}
vrrp_script chk_haproxy {
    script "killall -0 haproxy"
    interval 2                # Check every 2 seconds
    weight -10                # Reduce priority by 10 if the script fails
}
vrrp_instance VI_1 {
    state MASTER              # Set this node as MASTER
    interface enp1s0          # Network interface to monitor
    virtual_router_id 51      # VRRP ID (must be the same on both nodes)
    priority 101              # Priority (higher number means higher priority)
    advert_int 1              # Advertisement interval (seconds)
    authentication {
        auth_type PASS
        auth_pass 1234        # Authentication password (must match on both nodes)
    }
    virtual_ipaddress {
        10.114.16.72          # Virtual IP address (VIP)
    }
    track_script {
        chk_haproxy
    }
    notify_master /etc/keepalived/start_haproxy.sh
    notify_backup /etc/keepalived/stop_haproxy.sh
    preempt
}
Backup:
[postgres@testbed09-1664 ~]$ cat /etc/keepalived/keepalived.conf
global_defs {
    script_user root
    enable_script_security
}
vrrp_script chk_haproxy {
    script "killall -0 haproxy"
    interval 2                # Check every 2 seconds
    weight -2                 # Reduce priority by 10 if the script fails
}
vrrp_instance VI_1 {
    state BACKUP              # Set this node as BACKUP
    interface enp1s0          # Network interface to monitor
    virtual_router_id 51      # VRRP ID (must match the MASTER node)
    priority 100              # Priority (lower than MASTER)
    advert_int 1              # Advertisement interval (seconds)
    authentication {
        auth_type PASS
        auth_pass 1234        # Authentication password (must match the MASTER)
    }
    virtual_ipaddress {
        10.114.16.72          # Same VIP as the MASTER node
    }
    track_script {
        chk_haproxy
    }
    notify_master /etc/keepalived/start_haproxy.sh
    notify_backup /etc/keepalived/stop_haproxy.sh
}
When I shut down haproxy on testbed09, the VIP does not move to testbed06 because the priority on testbed06 is 91. How can I control this?
Please find below the logs from both servers. I am unable to find /tmp/keepalived.
master log output :
[postgres@testbed09-1664 ~]$ systemctl status keepalived
● keepalived.service - LVS and VRRP High Availability Monitor
     Loaded: loaded (/usr/lib/systemd/system/keepalived.service; disabled; preset: disabled)
     Active: active (running) since Tue 2024-10-29 12:31:42 EDT; 2min 3s ago
   Main PID: 2761615 (keepalived)
      Tasks: 2 (limit: 201936)
     Memory: 1.9M
        CPU: 1.006s
     CGroup: /system.slice/keepalived.service
             ├─2761615 /usr/sbin/keepalived --dont-fork -D
             └─2761616 /usr/sbin/keepalived --dont-fork -D
Oct 29 12:32:32 testbed09-1664 Keepalived_vrrp[2761616]: (VI_1) Changing effective priority from 98 to 100
Oct 29 12:32:35 testbed09-1664 Keepalived_vrrp[2761616]: (VI_1) Sending/queueing gratuitous ARPs on enp1s0 for 10.114.16.72
Oct 29 12:32:35 testbed09-1664 Keepalived_vrrp[2761616]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 29 12:32:35 testbed09-1664 Keepalived_vrrp[2761616]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 29 12:32:35 testbed09-1664 Keepalived_vrrp[2761616]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 29 12:32:35 testbed09-1664 Keepalived_vrrp[2761616]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 29 12:32:35 testbed09-1664 Keepalived_vrrp[2761616]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 29 12:33:16 testbed09-1664 Keepalived_vrrp[2761616]: Script chk_haproxy now returning 1
Oct 29 12:33:16 testbed09-1664 Keepalived_vrrp[2761616]: VRRP_Script(chk_haproxy) failed (exited with status 1)
Oct 29 12:33:16 testbed09-1664 Keepalived_vrrp[2761616]: (VI_1) Changing effective priority from 100 to 98
backup log :
[postgres@testbed06 ~]$ systemctl status keepalived
● keepalived.service - LVS and VRRP High Availability Monitor
     Loaded: loaded (/usr/lib/systemd/system/keepalived.service; disabled; preset: disabled)
     Active: active (running) since Mon 2024-10-28 06:37:58 EDT; 1 day 5h ago
   Main PID: 352793 (keepalived)
      Tasks: 2 (limit: 98870)
     Memory: 2.0M
        CPU: 8min 26.411s
     CGroup: /system.slice/keepalived.service
             ├─352793 /usr/sbin/keepalived --dont-fork -D
             └─352794 /usr/sbin/keepalived --dont-fork -D
Oct 29 12:30:10 testbed06 Keepalived_vrrp[352794]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 29 12:30:10 testbed06 Keepalived_vrrp[352794]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 29 12:30:10 testbed06 Keepalived_vrrp[352794]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 29 12:30:10 testbed06 Keepalived_vrrp[352794]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 29 12:32:27 testbed06 Keepalived_vrrp[352794]: Script chk_haproxy now returning 1
Oct 29 12:32:27 testbed06 Keepalived_vrrp[352794]: VRRP_Script(chk_haproxy) failed (exited with status 1)
Oct 29 12:32:27 testbed06 Keepalived_vrrp[352794]: (VI_1) Changing effective priority from 101 to 91
Oct 29 12:32:30 testbed06 Keepalived_vrrp[352794]: (VI_1) Master received advert from 10.114.16.64 with higher priority 98, ours 91
Oct 29 12:32:30 testbed06 Keepalived_vrrp[352794]: (VI_1) Entering BACKUP STATE
Oct 29 12:32:30 testbed06 Keepalived_vrrp[352794]: (VI_1) removing VIPs.
The file produced by kill -USR1 ... is /tmp/keepalived.data, not /tmp/keepalived.
I think your problem is that when a VRRP instance is in the backup state, you stop haproxy, and only start it again once the VRRP instance transitions to master state.
So when keepalived is running on both testbed06 and testbed09, the VRRP instances start in backup mode (and haproxy is not running), so testbed06 has priority 99 (101 - 2) and testbed09 has priority 98 (100 - 2). testbed06 has higher priority and so becomes VRRP master, haproxy is started, and the VRRP instance priority increases to 101.
You then stop haproxy on testbed06, and so the VRRP priority reduces to 99, but this is still higher than testbed09, and so testbed06 remains the VRRP master.
I think you need to not have keepalived starting and stopping haproxy, and then it should work.
Thank you for the update. I have removed the starting and stopping of haproxy from keepalived, but I still have the same issue, because the priority is already lowered on the backup keepalived: as it stands, testbed06 has priority 99 (101 - 2) and testbed09 has priority 98 (100 - 2). After stopping the service on testbed06 its priority becomes 99, and testbed09 already has priority 98, so the VIP does not switch.
As far as I can tell after performing many tests, failover (HA) for haproxy is not possible with keepalived.
Could you kindly share your expert opinion on the above test case?
Based on the configuration you have provided above, your statement that testbed06 has priority 99 and testbed09 has priority 98 means that haproxy is not running on either system. You need to ensure that haproxy is permanently running on both systems, i.e. enable the haproxy service using systemctl.
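To make the effect of haproxy running or not running on each node concrete, here is a rough sketch of the priority arithmetic. It assumes the values from the second configuration posted in this thread (priority 101 with weight -10 on testbed06, priority 100 with weight -2 on testbed09), that preemption is in effect, and that a negative weight is simply added to the base priority while the check fails. It is a simplified model for reasoning, not keepalived's implementation, and the function names are made up for illustration.

```shell
#!/bin/bash
# Simplified model of a vrrp_script with a negative weight: the weight is
# added to the base priority only while the check is failing.
effective_priority() {   # usage: effective_priority BASE WEIGHT HAPROXY_UP(1|0)
    if [ "$3" -eq 1 ]; then echo "$1"; else echo $(( $1 + $2 )); fi
}

# Which node would hold the VIP for a given combination of haproxy states.
# The priorities and weights are taken from the configs posted above.
vip_holder() {           # usage: vip_holder UP_ON_06 UP_ON_09
    p06=$(effective_priority 101 -10 "$1")
    p09=$(effective_priority 100 -2 "$2")
    if [ "$p06" -gt "$p09" ]; then echo "testbed06 ($p06 vs $p09)"
    else echo "testbed09 ($p09 vs $p06)"; fi
}

# Walk through all four combinations of haproxy being up (1) or down (0).
for state in "1 1" "0 1" "1 0" "0 0"; do
    set -- $state
    echo "haproxy up on 06=$1 09=$2 -> VIP on $(vip_holder "$1" "$2")"
done
```

Under these assumptions the VIP follows whichever node still has haproxy running (100 vs 91 when it is down only on testbed06, 101 vs 98 when it is down only on testbed09). The VIP only ends up on a node without haproxy when it is down on both nodes at once.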
There appear to be quite a few websites that describe how to use keepalived with haproxy, such as:
https://medium.com/@kemalozz/installation-of-haproxy-and-keepalived-for-high-availability-f1d6e7b8982a
https://sysadmins.co.za/achieving-high-availability-with-haproxy-and-keepalived-building-a-redundant-load-balancer/
https://www.digitalocean.com/community/tutorials/how-to-set-up-highly-available-haproxy-servers-with-keepalived-and-reserved-ips-on-ubuntu-14-04
https://docs.vmware.com/en/vRealize-Operations/8.10/vrops-manager-load-balancing/GUID-EC001888-776B-42D5-9843-719EF08AB940.html
https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/7/html/load_balancer_administration/s2-lvs-keepalived-haproxy-vsa
and these should give you some guidance on what you need to do.
Thank you for the update. It is not possible to run haproxy on both servers, because haproxy binds to the VIP and the haproxy service will only come up on the server where the VIP resides. When I try to start the haproxy service on the server where the VIP does not reside, the service simply errors out and exits.
The sample configurations I have seen don't specify an IP address to bind to, e.g.:
frontend my_frontend
    bind *:80
    default_backend my_backend
Doing it that way, you should not have a problem with the VIP not being present.
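One quick way to see whether an haproxy configuration still binds to a specific address is to list its bind lines. This is only a sketch: list_specific_binds is a hypothetical helper, it handles the simple IPv4-style address:port form only, and the path to the configuration file is whatever applies on your system.

```shell
#!/bin/bash
# Print the address:port of every "bind" line that names a specific address,
# i.e. anything other than "*:port" or ":port". IPv4-style binds only.
list_specific_binds() {   # usage: list_specific_binds /path/to/haproxy.cfg
    awk '$1 == "bind" && $2 !~ /^[*]?:/ { print $2 }' "$1"
}

# Example use when a config path is given on the command line.
if [ -n "$1" ]; then
    list_specific_binds "$1"
fi
```

If this prints nothing, every bind line uses a wildcard address, so haproxy should be able to start whether or not the VIP is currently assigned to the node.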
Thank you for the update. After changing the bind to *, psql now connects to postgresql using the IP whether or not the VIP is available on the node.
I still think HA is not possible for haproxy using keepalived.
Any idea ?
PFB the output :
Backup testbed09:
Oct 31 06:55:49 testbed09-1664 Keepalived_vrrp[3136805]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 31 06:55:49 testbed09-1664 Keepalived_vrrp[3136805]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 31 06:55:49 testbed09-1664 Keepalived_vrrp[3136805]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 31 06:57:16 testbed09-1664 Keepalived_vrrp[3136805]: (VI_1) Master received advert from 10.114.16.50 with higher priority 102, ours 101
Oct 31 06:57:16 testbed09-1664 Keepalived_vrrp[3136805]: (VI_1) Entering BACKUP STATE
Oct 31 06:57:16 testbed09-1664 Keepalived_vrrp[3136805]: (VI_1) removing VIPs.
[postgres@testbed09-1664 keepalived]$ psql -h 10.114.16.72 -p 5000 -U postgres -d mydb
psql (16.4)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.
mydb=#
Master testbed06:
Oct 31 06:57:16 testbed06 Keepalived_vrrp[34771]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 31 06:57:21 testbed06 Keepalived_vrrp[34771]: (VI_1) Sending/queueing gratuitous ARPs on enp1s0 for 10.114.16.72
Oct 31 06:57:21 testbed06 Keepalived_vrrp[34771]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 31 06:57:21 testbed06 Keepalived_vrrp[34771]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 31 06:57:21 testbed06 Keepalived_vrrp[34771]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 31 06:57:21 testbed06 Keepalived_vrrp[34771]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 31 06:57:21 testbed06 Keepalived_vrrp[34771]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
[postgres@testbed06 haproxy]$ psql -h 10.114.16.72 -p 5000 -U postgres -d mydb
psql (16.4)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.
mydb=#
I think you probably need to post your haproxy configuration in order for us to be able to comment any further. I can then try and reproduce the problem.
It is worth trying to access postgresql from a third machine, and not testbed06 or testbed09, in the first instance. There can be complications when trying to forward connections when they originate on the same system as is doing the forwarding, although I don't know if that applies to haproxy. If you test it this way, then you can use tcpdump or wireshark to see what is happening to the packets and identify where the problem lies. You might find it works better if you use
Given the number of articles on the web that describe how to use haproxy and keepalived together, I think it is very unlikely that HA is not possible for haproxy using keepalived.
I've just seen that I didn't finish the second paragraph. I intended to suggest that you add use_vmac to the vrrp_instance block. That would have the advantage that the MAC address associated with the VIP does not change when the backup takes over as master. However, it does mean that the backup instance would not be able to communicate with the VIP on the other system, since the advertised MAC address for the VIP would be locally configured on the backup.
Hi,
Servers: test1, test2
I have already configured keepalived for two haproxy servers, and I am able to move the VIP to the other server when the haproxy service goes down on one server (test1). Now, with the haproxy service running on test2 and the VIP also on test2, I start the keepalived service on test1, but its priority is low. When I then shut down the haproxy service on test2, keepalived does not move the VIP back to test1 because of the low priority on test1.
Any idea or possibility to do it?