@hengqiali I assume it is an editing mistake but the MAC addresses you have given for the four interfaces enp4s0f0, enp4s0f1 on server_A and enp4s0f0 and enp4s0f1 on server_B are the same, and also have only 5 octets.
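As a quick sanity check on pasted addresses, a minimal Python sketch along these lines could confirm that a MAC address has exactly six colon-separated octets. The helper name is_valid_mac and the sample values are illustrative assumptions, not something taken from keepalived.

```python
import re

# Six groups of two hex digits separated by colons, e.g. 40:9e:a4:8b:2c:05
_MAC_RE = re.compile(r"^[0-9A-Fa-f]{2}(:[0-9A-Fa-f]{2}){5}$")


def is_valid_mac(mac: str) -> bool:
    """Return True if mac has exactly six colon-separated hex octets."""
    return bool(_MAC_RE.match(mac.strip()))


# Illustrative inputs: one well-formed address and one with only five octets.
print(is_valid_mac("40:9e:a4:8b:2c:05"))
print(is_valid_mac("40:9e:a4:8b:2c"))
```

An address with only five octets fails the check, which is the kind of truncation described above.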
The lowest level address that keepalived specifies when sending a VRRP packet is the IP address, either unicast or multicast. It is the kernel that adds the MAC addresses to packets. It would be interesting to see the output of ip neigh show on server_A, since that may explain why the destination MAC address is being set as it is.
You state that the VIP is 38.145.72.224/32. If it really is a /32 subnet, i.e. no other device exists on that subnet, how are the routers supposed to discover the MAC address to use to forward packets to the VIP? More specifically, what subnet do you have configured that server_A, server_B and your TOR routers are all in that will allow packets to be forwarded to 38.145.72.224?
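To make the /32 point concrete, here is a small sketch using Python's standard ipaddress module. The peer address 10.105.1.10 is used purely as a sample address on a different subnet; it is an assumption for illustration, not a statement about the actual topology.

```python
import ipaddress

vip = ipaddress.ip_interface("38.145.72.224/32")

# A /32 network contains exactly one address: the VIP itself.
print(vip.network.num_addresses)

# Sample peer address (assumption for illustration): it is not inside the
# VIP's /32, so nothing can treat the VIP as on-link via that prefix alone.
peer = ipaddress.ip_address("10.105.1.10")
print(peer in vip.network)
```

With a /32, no other host shares the prefix, so reachability of the VIP has to come from a route or from a wider subnet that the servers and routers share.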
@hengqiali I assume it is an editing mistake but the MAC addresses you have given for the four interfaces enp4s0f0, enp4s0f1 on server_A and enp4s0f0 and enp4s0f1 on server_B are the same, and also have only 5 octets.
You're right! It is an editing mistake, sorry for the noise! I have corrected it.
The lowest level address that keepalived specifies when sending a VRRP packet is the IP address, either unicast or multicast. It is the kernel that adds the MAC addresses to packets.
This is very important information, thanks for highlighting it!
It would be interesting to see the output of ip neigh show on server_A, since that may explain why the destination MAC address is being set as it is.
server_A: $ ip neigh show (I have filtered some unrelated entries)
10.105.1.10 dev enp4s0f0 lladdr 40:9e:a4:8b:2c:05 REACHABLE
10.105.1.12 dev enp4s0f1 lladdr e4:f2:7c:1f:2f:e5 REACHABLE
fe80::429e:a4ff:fe8b:2c05 dev enp4s0f0 lladdr 40:9e:a4:8b:2c:05 router STALE
fe80::e6f2:7cff:fe1f:2fe5 dev enp4s0f1 lladdr e4:f2:7c:1f:2f:e5 router STALE
It seems that there are no cached entries for 38.145.72.195/32?
server_B: $ ip neigh show (I have filtered some unrelated entries)
10.105.1.30 dev enp4s0f0 lladdr 40:9e:a4:8b:2c:07 REACHABLE
10.105.1.32 dev enp4s0f1 lladdr e4:f2:7c:1f:2f:e7 REACHABLE
fe80::429e:a4ff:fe8b:2c07 dev enp4s0f0 lladdr 40:9e:a4:8b:2c:07 router STALE
fe80::e6f2:7cff:fe1f:2fe7 dev enp4s0f1 lladdr e4:f2:7c:1f:2f:e7 router STALE
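For anyone who wants to check such output programmatically, the following is a rough sketch that parses the server_A entries quoted above into a lookup table. The names Neighbor and parse_ip_neigh are hypothetical helpers, and the parser only handles lines of the shape shown here; it is not a general parser for every ip neigh output format.

```python
from typing import Dict, NamedTuple


class Neighbor(NamedTuple):
    dev: str
    lladdr: str
    state: str


def parse_ip_neigh(output: str) -> Dict[str, Neighbor]:
    """Map each neighbor IP to its device, link-layer address and state.

    Only handles lines shaped like
    '<ip> dev <ifname> lladdr <mac> [router] <STATE>'; other lines are skipped.
    """
    table: Dict[str, Neighbor] = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 6 or fields[1] != "dev" or fields[3] != "lladdr":
            continue
        table[fields[0]] = Neighbor(dev=fields[2], lladdr=fields[4], state=fields[-1])
    return table


# The server_A entries as posted in this thread.
SERVER_A = """\
10.105.1.10 dev enp4s0f0 lladdr 40:9e:a4:8b:2c:05 REACHABLE
10.105.1.12 dev enp4s0f1 lladdr e4:f2:7c:1f:2f:e5 REACHABLE
fe80::429e:a4ff:fe8b:2c05 dev enp4s0f0 lladdr 40:9e:a4:8b:2c:05 router STALE
fe80::e6f2:7cff:fe1f:2fe5 dev enp4s0f1 lladdr e4:f2:7c:1f:2f:e5 router STALE
"""

neigh = parse_ip_neigh(SERVER_A)
print(neigh["10.105.1.10"])

# Neither of the 38.145.72.x addresses mentioned in this thread has an entry.
missing = [ip for ip in ("38.145.72.224", "38.145.72.195") if ip not in neigh]
print(missing)
```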
You state that the VIP is 38.145.72.224/32. If it really is a /32 subnet, i.e. no other device exists on that subnet, how are the routers supposed to discover the MAC address to use to forward packets to the VIP? More specifically, what subnet do you have configured that server_A, server_B and your TOR routers are all in that will allow packets to be forwarded to 38.145.72.224?
This VIP is just a fake IP for now. What confuses me is why VRRP cannot migrate this VIP between the em5 interfaces. If the issue I posted can be solved, we will then obtain the VIP from a reserved valid network segment and announce it in the network via BGP, similar to how em5's private IP is announced as a public IP via BGP.
It seems that there are no cached entries for 38.145.72.195/32?
This is not a keepalived matter and is something that you will need to resolve in respect of your network configuration. I am therefore now closing this issue.
OK, thanks for your reply. One more thing I want to confirm: do VRRP packets generated by keepalived go through the whole kernel network stack and consult the kernel routing tables? Thank you in advance.
Yes, the vrrp packets go through the full IP kernel stack, and so will follow the kernel routing. We also, in some circumstances, use nftables/iptables to control packets, so it is fully using that part of the stack too. If the VRRP packets are being sent via multicast (which is not the case in your configuration), then the packets are marked as DONTROUTE, in order to comply with the VRRP RFC.
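For readers unfamiliar with DONTROUTE, the sketch below shows the general sockets mechanism in Python. It is not keepalived's code: it uses an ordinary UDP socket so that it runs without privileges (VRRP itself is IP protocol 112 and needs a raw socket), and whether keepalived applies the behaviour as a socket option or as a per-send flag is not covered here.

```python
import socket

# Plain UDP socket so the sketch runs unprivileged; this is an assumption
# made for illustration and is not how VRRP packets are actually sent.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# SO_DONTROUTE asks the kernel to send only to directly connected
# destinations, bypassing gateway routes.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_DONTROUTE, 1)
dontroute_enabled = sock.getsockopt(socket.SOL_SOCKET, socket.SO_DONTROUTE) != 0
sock.close()

print(dontroute_enabled)
```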
Describe the bug
When configuring two servers (server_A and server_B) in a master-slave setup using keepalived and the VRRP protocol to provide high availability, an issue arose where VRRP heartbeat packets were not properly propagating between the two servers, resulting in a split-brain scenario. Specifically, both servers' em5 interfaces ended up acquiring the same virtual IP address, 38.145.72.224/32.
Network topology and basic information of the servers are as follows
Two physical servers: server_A and server_B.
server_A configuration:
server_B configuration:
The topology is identical to server_A, with the only difference being the specific IP addresses:
Expected behavior
Server_A and server_B are able to successfully exchange VRRP heartbeat packets with each other, sharing a virtual IP address (38.145.72.224). For example, in the VRRP packets advertised by server_A, the source MAC address corresponds to the MAC address of server_A’s enp4s0f0/enp4s0f1 interfaces, and the destination MAC address is the MAC address of the corresponding TOR_A or TOR_B interfaces.
Current abnormal behavior
Currently, server_A and server_B are unable to properly propagate VRRP heartbeat packets between each other, resulting in a split-brain scenario. Specifically, both server_A and server_B's em5 interfaces have acquired the virtual IP (VIP) 38.145.72.224/32. The following is keepalived's configuration:
Using the tcpdump tool on server_A, I captured the following VRRP packets and observed that the source MAC address and destination MAC address in the VRRP packets both use the em5 MAC address. This is definitely incorrect! However, I’m unsure whether this issue is caused by a misconfiguration or some other underlying reason...
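One way to check a captured frame for this symptom is to decode the 14-byte Ethernet header and compare the two addresses. The sketch below does that on a made-up frame; the function names and the MAC value 02:00:00:00:00:01 are hypothetical and are not taken from the capture.

```python
import struct


def mac_to_str(raw: bytes) -> str:
    """Format six raw bytes as a colon-separated MAC string."""
    return ":".join(f"{b:02x}" for b in raw)


def parse_ethernet_header(frame: bytes):
    """Return (dst_mac, src_mac, ethertype) from the first 14 bytes of a frame."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return mac_to_str(dst), mac_to_str(src), ethertype


# Hypothetical frame: destination and source set to the same made-up MAC,
# followed by the IPv4 EtherType (0x0800).
same_mac = bytes.fromhex("020000000001")
frame = same_mac + same_mac + bytes.fromhex("0800")

dst, src, ethertype = parse_ethernet_header(frame)
print(dst, src, hex(ethertype))
print("source equals destination:", dst == src)
```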
Keepalived version v2.2.4
Did keepalived coredump? No.