Closed: Zarkaouette closed this issue 2 years ago
The logs from your backup indicate that you have not provided your complete configuration: the entries refer to line numbers that don't exist in the configuration you posted. If you are using ip rules, then you are using policy routing, and that could well relate to the cause of your problem. Indeed, the log messages state that priorities are missing from the ip rule statements, so keepalived has used a default priority and warned that this may not be what you want.
You are using VMACs and also unicast peers; this doesn't really make sense. If you use VMACs, you should just let keepalived multicast. I would also remove the vmac_xmit_base.
If you can reach the VIP from the host on which it is configured but not from other hosts on the same LAN, that suggests it is probably a firewall issue, either on the host where you are running keepalived or on the hosts from which you are trying to connect. When I have these problems I tend to use tcpdump or Wireshark to identify where the packets are being blocked. I also find it can be useful to add firewall rules without any action, e.g.
iptables -I INPUT -s YYY.YYY.YYY.YYY -d XXX.XXX.XXX.10
and then you can inspect the counters to see if matching packets are traversing the rule.
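A minimal sketch of that counting technique, using documentation addresses (192.0.2.x) in place of the obfuscated placeholders above; run as root:

```shell
# Insert a rule with NO -j action: it matches but never drops/accepts,
# so it only counts traffic (addresses here are illustrative placeholders).
iptables -I INPUT -s 192.0.2.5 -d 192.0.2.10

# Inspect per-rule packet/byte counters; if the pkts column stays at 0
# while you generate traffic, the packets are blocked before reaching this host.
iptables -L INPUT -v -n -x

# Remove the counting rule when done.
iptables -D INPUT -s 192.0.2.5 -d 192.0.2.10
```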
First of all I would try removing nginx from the setup and just try pinging the VIP. If that still doesn't work then I suggest you try without running keepalived and manually setting up the ip rules/routes/addresses and get that working. Once that is working then update the keepalived configuration to make sure that it is setting up the same ip rules/routes/addresses as the manual configuration you created.
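The manual step above could be sketched as follows (assuming eth0 and the VIP 100.98.227.10 that appears later in this thread; the /24 prefix should be adjusted to whatever your keepalived configuration uses, and the commands need root):

```shell
# Add the VIP by hand on the candidate MASTER, bypassing keepalived entirely.
ip addr add 100.98.227.10/24 dev eth0

# From another VM on the same LAN, check reachability:
#   ping -c 3 100.98.227.10

# Remove it again before re-enabling keepalived.
ip addr del 100.98.227.10/24 dev eth0
```

If the manually added VIP is reachable from the LAN, the problem lies in the keepalived configuration; if it is not, keepalived is not at fault.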
If you post your manually created ip rules/routes/addresses that work, we can assist in translating those into the keepalived configuration.
Alternatively you could post the output of:
ip rule
ip route
For each table listed in the ip rule output:
ip route list table nnn
And finally:
ip address
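The per-table dump can be done mechanically: the word after "lookup" in each `ip rule` line names a table. A small sketch of the extraction step (the live commands are shown commented out, since the tables vary per system):

```shell
# Sample `ip rule` output; the word after "lookup" is the table to dump.
sample='0:      from all lookup local
32766:  from all lookup main
32767:  from all lookup default'
printf '%s\n' "$sample" | sed -n 's/.*lookup //p' | sort -u

# On a live system:
# for t in $(ip rule | sed -n 's/.*lookup //p' | sort -u); do
#     echo "== table $t =="
#     ip route list table "$t"
# done
# ip address
```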
If you obfuscate any addresses etc., you will need to change them to valid addresses (e.g. 10.1.2.3), maintaining the appropriate subnet structure so that we can understand what is happening.
You should not need to add any iptables entries, or change any sysctl values, unless your existing iptables configuration is stopping what you are doing from working, or your existing sysctl values have been changed from defaults.
I am closing this issue since it is not a keepalived issue but rather a network configuration issue, but you will still be able to update the issue and we can respond further.
Hello @pqarmitage !
First of all, I would like to thank you a lot for your prompt and very helpful answer! It's a real relief to have you!
Then, I admit that my knowledge of network debugging and, most importantly, of keepalived is very limited. Let me answer you point by point. Also, for security reasons I will use a different network than the one actually in use, but keep everything else the same.
1°) Sorry, it looks like I added some extra config on the BACKUP node. Now they are the same, and the BACKUP starts with this output:
Jun 23 01:53:14 vm-express-route-prod-2 Keepalived_vrrp[885]: (VI_01) WARNING - equal priority advert received from remote host with our IP address.
Jun 23 01:53:15 vm-express-route-prod-2 Keepalived_vrrp[885]: (VI_01) WARNING - equal priority advert received from remote host with our IP address.
Jun 23 01:53:16 vm-express-route-prod-2 Keepalived_vrrp[885]: (VI_01) WARNING - equal priority advert received from remote host with our IP address.
Jun 23 01:53:17 vm-express-route-prod-2 Keepalived_vrrp[885]: (VI_01) WARNING - equal priority advert received from remote host with our IP address.
Jun 23 01:53:18 vm-express-route-prod-2 Keepalived_vrrp[885]: (VI_01) WARNING - equal priority advert received from remote host with our IP address.
Jun 23 01:53:19 vm-express-route-prod-2 Keepalived_vrrp[885]: (VI_01) WARNING - equal priority advert received from remote host with our IP address.
Jun 23 01:53:20 vm-express-route-prod-2 Keepalived_vrrp[885]: (VI_01) WARNING - equal priority advert received from remote host with our IP address.
Jun 23 01:53:21 vm-express-route-prod-2 Keepalived_vrrp[885]: (VI_01) WARNING - equal priority advert received from remote host with our IP address.
Jun 23 01:53:22 vm-express-route-prod-2 Keepalived_vrrp[885]: (VI_01) Master received advert from 100.98.227.4 with higher priority 151, ours 150
Jun 23 01:53:22 vm-express-route-prod-2 Keepalived_vrrp[885]: (VI_01) Entering BACKUP STATE
2°) OK, I deleted vmac_xmit_base from the conf. I also deleted the unicast setting.
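For reference, a minimal multicast-only instance along those lines might look like this (a sketch, not the actual file; the VRID 151, priorities 151/150 and VIP 100.98.227.10 are taken from the logs and routing tables elsewhere in this thread):

```
vrrp_instance VI_01 {
    state MASTER              # BACKUP on the other node
    interface eth0
    virtual_router_id 151
    priority 151              # 150 on the BACKUP node
    advert_int 1
    # No use_vmac/vmac_xmit_base and no unicast_peer block:
    # keepalived then multicasts adverts to 224.0.0.18 by default.
    virtual_ipaddress {
        100.98.227.10
    }
}
```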
3°) About this:
If that still doesn't work then I suggest you try without running keepalived and manually setting up the ip rules/routes/addresses and get that working
I'm not sure what you mean. My 2 VMs can communicate perfectly well (ping/curl on any port), as they are on the same network.
VM A (MASTER) has IP 100.98.227.4
VM B (BACKUP) has IP 100.98.227.5
On VM A (MASTER), I run the following:
tcpdump -i eth0 host 100.98.227.5
On VM B (BACKUP), I run a simple ping command against the VIP address. I get the following result:
01:58:34.782022 IP vm-express-route-prod-1 > 100.98.227.5: VRRPv2, Advertisement, vrid 151, prio 151, authtype none, intvl 1s, length 20
01:58:35.782277 IP vm-express-route-prod-1 > 100.98.227.5: VRRPv2, Advertisement, vrid 151, prio 151, authtype none, intvl 1s, length 20
01:58:36.782426 IP vm-express-route-prod-1 > 100.98.227.5: VRRPv2, Advertisement, vrid 151, prio 151, authtype none, intvl 1s, length 20
01:58:37.782600 IP vm-express-route-prod-1 > 100.98.227.5: VRRPv2, Advertisement, vrid 151, prio 151, authtype none, intvl 1s, length 20
01:58:38.782732 IP vm-express-route-prod-1 > 100.98.227.5: VRRPv2, Advertisement, vrid 151, prio 151, authtype none, intvl 1s, length 20
01:58:39.783006 IP vm-express-route-prod-1 > 100.98.227.5: VRRPv2, Advertisement, vrid 151, prio 151, authtype none, intvl 1s, length 20
01:58:40.783153 IP vm-express-route-prod-1 > 100.98.227.5: VRRPv2, Advertisement, vrid 151, prio 151, authtype none, intvl 1s, length 20
01:58:41.783325 IP vm-express-route-prod-1 > 100.98.227.5: VRRPv2, Advertisement, vrid 151, prio 151, authtype none, intvl 1s, length 20
01:58:41.997319 ARP, Request who-has 100.98.227.5 tell vm-express-route-prod-1, length 28
01:58:41.998136 ARP, Reply 100.98.227.5 is-at 12:34:56:78:9a:bc (oui Unknown), length 28
01:58:42.783564 IP vm-express-route-prod-1 > 100.98.227.5: VRRPv2, Advertisement, vrid 151, prio 151, authtype none, intvl 1s, length 20
01:58:43.783751 IP vm-express-route-prod-1 > 100.98.227.5: VRRPv2, Advertisement, vrid 151, prio 151, authtype none, intvl 1s, length 20
4°) On both VMs, I have only the 3 default rules:
0: from all lookup local
32766: from all lookup main
32767: from all lookup default
For both of them, the tables referenced by these rules are as follows. Local table for MASTER (it's the same for BACKUP, except for the src IP):
broadcast 100.98.227.0 dev eth0 proto kernel scope link src 100.98.227.4
local 100.98.227.4 dev eth0 proto kernel scope host src 100.98.227.4
local 100.98.227.10 dev eth0 proto kernel scope host src 100.98.227.4
broadcast 100.98.227.255 dev eth0 proto kernel scope link src 100.98.227.4
broadcast 127.0.0.0 dev lo proto kernel scope link src 127.0.0.1
local 127.0.0.0/8 dev lo proto kernel scope host src 127.0.0.1
local 127.0.0.1 dev lo proto kernel scope host src 127.0.0.1
broadcast 127.255.255.255 dev lo proto kernel scope link src 127.0.0.1
Main table for MASTER (again, similar for BACKUP):
default via 100.98.227.1 dev eth0 proto dhcp src 100.98.227.4 metric 100
100.98.227.0/24 dev eth0 proto kernel scope link src 100.98.227.4
168.63.129.16 via 100.98.227.1 dev eth0 proto dhcp src 100.98.227.4 metric 100
169.254.169.254 via 100.98.227.1 dev eth0 proto dhcp src 100.98.227.4 metric 100
The ip route list table default command returns:
Error: ipv4: FIB table does not exist.
Dump terminated
I really hope this helps, because I have no idea what is going on here. Thanks a lot in advance for your kindness, help and patience :) Have a great day
Hello @pqarmitage
Just to let you know, this issue seems to be related to Azure itself, as Azure blocks all ARP requests (for security reasons). As it is PROD and needs to be reliable anyway, we went with the Azure Load Balancer instead.
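For anyone landing here later: since the Azure virtual network does not honour gratuitous ARP, one commonly used alternative to a load balancer is to move the secondary IP at the Azure level from a keepalived notify_master script, e.g. via the Azure CLI. A rough sketch (the resource group, NIC and ipconfig names below are made-up placeholders, not values from this thread):

```shell
#!/bin/sh
# Hypothetical notify_master script: reassign the VIP to this node's NIC
# through the Azure control plane instead of relying on ARP.
# RESOURCE_GROUP, NIC_NAME and IPCONFIG_NAME are placeholders.
az network nic ip-config update \
    --resource-group "$RESOURCE_GROUP" \
    --nic-name "$NIC_NAME" \
    --name "$IPCONFIG_NAME" \
    --private-ip-address 100.98.227.10
```

Note that the Azure API call takes some seconds, so failover is slower than native VRRP.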
Have a great day, and thanks a lot anyway for your very valuable help :)
Describe what you need help/support for: I'm using VMs in Azure and running a reverse proxy (nginx) on 2 instances. As it is PROD, I need this RP functionality to be highly available, and so decided to go with keepalived. After installing/configuring in a MASTER/BACKUP fashion, I'm able to reach the VIP only locally on the MASTER node. If I manually fail over so the BACKUP node becomes MASTER, then again I can reach the service on the VIP locally on that node, but not from any other machine located on the same LAN.
Details of what you would like to do with keepalived: When I start the keepalived process and the VIP is assigned to a node, it should be accessible from any other VM/device within the same network.
Keepalived version: v2.0.19 (10/19/2019)
Details of any containerisation or hosted service (e.g. AWS): Both NGINX (RP) and keepalived are running directly on the host; no containerisation is used, no hosted service.
Configuration file: MASTER configuration:
The BACKUP node has the exact same configuration except for the "state" and "priority"
Notify and track scripts Content of the "check_nginx":
Logs on the MASTER
Logs on the BACKUP
So far it's ok.
From MASTER :
From another machine in the same LAN (here from BACKUP) :
As already described, if I stop NGINX on the MASTER, then BACKUP node becomes MASTER. Then, I can observe the same thing : I'm able to curl from the new MASTER on the VIP but not from any other nodes.
I tried playing around with iptables and many other options (net.ipv4.conf.all.arp_ignore, net.ipv4.conf.all.arp_announce, virtual_routes, etc.) but nothing seems to work. Does anyone have any ideas?