acassen / keepalived

Keepalived
https://www.keepalived.org
GNU General Public License v2.0

VRRP packets cannot be advertised on dummy interface #2508

Closed: hengqiali closed this issue 2 days ago

hengqiali commented 3 days ago

Describe the bug

I configured two servers (server_A and server_B) as a master/backup pair using keepalived and VRRP to provide high availability. The VRRP heartbeat packets are not propagating between the two servers, which results in a split-brain scenario: both servers' em5 interfaces end up holding the same virtual IP address, 38.145.72.224/32.

The network topology and basic server information are as follows:

Two physical servers: server_A and server_B.

server_A configuration:

# bird sets up an ECMP default route across the 2 physical NICs,
# with the source IP set to the public IP
default proto bird src 38.145.72.193 metric 32
        nexthop via 10.105.1.10 dev enp4s0f0 weight 1
        nexthop via 10.105.1.12 dev enp4s0f1 weight 1
10.105.1.10/31 dev enp4s0f0 proto kernel scope link src 10.105.1.11
10.105.1.12/31 dev enp4s0f1 proto kernel scope link src 10.105.1.13
# bird.conf
router id 172.18.xxx.yyy;
ipv4 table master4;

define LOCAL_AS = 4290105101;
define TOR_A_AS = 4259105001;
define TOR_B_AS = 4259205001;
define LOCAL_NIC01 = 10.105.1.11;
define LOCAL_NIC02 = 10.105.1.13;
define TOR_A_IP = 10.105.1.10;
define TOR_B_IP = 10.105.1.12;
define LOCAL_NET = [ 38.145.72.193/32 ];
function is_self_net() {
  return net ~ LOCAL_NET;
}

filter export_to_tor {
    if net ~ LOCAL_NET then accept;
    else reject;
}

# The direct protocol automatically generates device routes to
# all network interfaces. Can exist in as many instances as you wish
# if you want to populate multiple routing tables with device routes.
protocol direct {
    ipv4;
    interface "em5","em6","em7","em8","em9","em10",-"*";    # Restrict network interfaces it works with
}

protocol bfd {
    # export filter export_to_tor;
    interface "*" {
        interval 333 ms;
    };
}

# This pseudo-protocol performs synchronization between BIRD's routing
# tables and the kernel. If your kernel supports multiple routing tables
# (as Linux 2.2.x does), you can run multiple instances of the kernel
# protocol and synchronize different kernel tables with different BIRD tables.
protocol kernel {
    scan time 1;
    merge paths yes limit 4;
    ipv4 {
        import none;
        export filter {
            if proto = "direct1" then reject;
            krt_prefsrc = 38.145.72.193;
            accept;
        };
    };
}

protocol device {
    scan time 1;        # Scan interfaces every 1 seconds
}

protocol bgp bgp_A {
    bfd;
    description "TOR A";
    local as LOCAL_AS;
    neighbor TOR_A_IP as TOR_A_AS; # neighbor address and AS
    source address LOCAL_NIC01;    # What local address we use for the TCP connection
    ipv4 {
        import all;
        export filter export_to_tor;
        next hop self;           # Disable next hop processing and always advertise local address as nexthop
    };
}

protocol bgp bgp_B {
    bfd;
    description "TOR B";
    local as LOCAL_AS;
    neighbor TOR_B_IP as TOR_B_AS; # neighbor address and AS
    source address LOCAL_NIC02;    # What local address we use for the TCP connection
    default bgp_med 0;          # MED value we use for comparison when none is defined
    default bgp_local_pref 0;   # The same for local preference
    path metric 1;              # Prefer routes with shorter paths (like Cisco does)
    ipv4 {
        import all;
        export filter export_to_tor;
        next hop self;           # Disable next hop processing and always advertise local address as nexthop
    };
}
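As written, the export_to_tor filter only passes prefixes listed in LOCAL_NET. The Python sketch below models that check to show that a VIP such as 38.145.72.224/32 would not be announced to the TORs unless it is also added to LOCAL_NET. It assumes only the plain `address/len` prefix-set form used here, which in BIRD matches exactly that prefix; it is an illustration, not BIRD's implementation.

```python
import ipaddress

# Prefix set copied from bird.conf: define LOCAL_NET = [ 38.145.72.193/32 ];
LOCAL_NET = [ipaddress.ip_network("38.145.72.193/32")]


def export_to_tor(net: str) -> bool:
    """Rough model of `filter export_to_tor`: accept only prefixes in LOCAL_NET.

    Assumption: only plain `address/len` entries are modelled (exact-prefix
    match); BIRD's `+`, `-` and `{a,b}` length ranges are not handled.
    """
    return ipaddress.ip_network(net) in LOCAL_NET


print(export_to_tor("38.145.72.193/32"))  # True: em5's address is exported
print(export_to_tor("38.145.72.224/32"))  # False: the VIP would be rejected
```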

server_B configuration:

The topology is identical to server_A's; the only difference is the specific IP addresses.

Expected behavior

server_A and server_B should be able to exchange VRRP heartbeat packets with each other and share the virtual IP address 38.145.72.224. For example, in the VRRP packets advertised by server_A, the source MAC address should be that of server_A's enp4s0f0/enp4s0f1 interface, and the destination MAC address should be that of the corresponding TOR_A or TOR_B interface.

Current abnormal behavior

Currently, server_A and server_B cannot exchange VRRP heartbeat packets, which results in a split-brain scenario: both servers' em5 interfaces have acquired the virtual IP (VIP) 38.145.72.224/32. The keepalived configurations are as follows:

# server_A's keepalived configuration
# cat /etc/keepalived/keepalived.conf
vrrp_instance VI_1 {
    state MASTER
    interface em5
    virtual_router_id 51
    priority 100
    advert_int 1

    unicast_peer {
        38.145.72.195
    }

    authentication {
        auth_type PASS
        auth_pass 1234
    }

    virtual_ipaddress {
       38.145.72.224 dev em5
    }
}
# server_B's keepalived configuration
# cat /etc/keepalived/keepalived.conf
vrrp_instance VI_2 {
    state BACKUP
    interface em5
    virtual_router_id 51
    priority 90
    advert_int 1

    unicast_peer {
        38.145.72.193
    }

    authentication {
        auth_type PASS
        auth_pass 1234
    }

    virtual_ipaddress {
       38.145.72.224 dev em5
    }
}
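The advertisement that VI_1 describes is small enough to reconstruct by hand. The Python sketch below assembles a VRRPv2 advertisement payload (the bytes after the IP header) following the RFC 3768 field layout: version/type, VRID, priority, address count, auth type, advertisement interval, checksum, the virtual addresses, then 8 bytes of authentication data. Auth type 1 is what tcpdump prints as "authtype simple". This is an illustration using server_A's values, not keepalived's own code.

```python
import ipaddress
import struct


def inet_checksum(data: bytes) -> int:
    """Internet checksum (RFC 1071): one's complement of the one's complement sum."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


def vrrp2_advert(vrid: int, priority: int, advert_int: int,
                 vips: list[str], auth_pass: str = "") -> bytes:
    """Build a VRRPv2 advertisement payload (everything after the IP header)."""
    auth_type = 1 if auth_pass else 0               # 1 = simple text password
    auth_data = auth_pass.encode()[:8].ljust(8, b"\x00")
    addrs = b"".join(ipaddress.IPv4Address(ip).packed for ip in vips)
    header = struct.pack("!BBBBBBH",
                         0x21,                      # version 2, type 1 (Advertisement)
                         vrid, priority, len(vips),
                         auth_type, advert_int,
                         0)                         # checksum placeholder
    packet = header + addrs + auth_data
    csum = inet_checksum(packet)                    # computed with checksum field = 0
    return packet[:6] + struct.pack("!H", csum) + packet[8:]


# Values taken from server_A's vrrp_instance VI_1
pkt = vrrp2_advert(vrid=51, priority=100, advert_int=1,
                   vips=["38.145.72.224"], auth_pass="1234")
print(len(pkt))  # 20
```

The 20-byte result (8-byte header, one 4-byte address, 8 bytes of auth data) lines up with the "length 20" that tcpdump reports for these advertisements.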

Using tcpdump on server_A, I captured the following VRRP packets and observed that both the source and destination MAC addresses are em5's MAC address. This is definitely incorrect! However, I'm unsure whether it is caused by a misconfiguration or by some other underlying reason.

# ip a show em5
8: em5: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 7e:d8:16:80:f9:60 brd ff:ff:ff:ff:ff:ff
    inet 38.145.72.193/32 scope global em5
       valid_lft forever preferred_lft forever
    inet 38.145.72.224/32 scope global em5
       valid_lft forever preferred_lft forever
    inet6 fe80::7cd8:16ff:fe80:f960/64 scope link
       valid_lft forever preferred_lft forever
# tcpdump -i em5 -nn -vvv -e vrrp
tcpdump: listening on em5, link-type EN10MB (Ethernet), snapshot length 262144 bytes
05:33:33.653426 7e:d8:16:80:f9:60 > 7e:d8:16:80:f9:60, ethertype IPv4 (0x0800), length 54: (tos 0xc0, ttl 255, id 65, offset 0, flags [none], proto VRRP (112), length 40)
    38.145.72.193 > 38.145.72.195: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20, addrs: 38.145.72.224 auth "1234"
05:33:34.653524 7e:d8:16:80:f9:60 > 7e:d8:16:80:f9:60, ethertype IPv4 (0x0800), length 54: (tos 0xc0, ttl 255, id 66, offset 0, flags [none], proto VRRP (112), length 40)
    38.145.72.193 > 38.145.72.195: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20, addrs: 38.145.72.224 auth "1234"
05:33:35.653646 7e:d8:16:80:f9:60 > 7e:d8:16:80:f9:60, ethertype IPv4 (0x0800), length 54: (tos 0xc0, ttl 255, id 67, offset 0, flags [none], proto VRRP (112), length 40)
    38.145.72.193 > 38.145.72.195: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20, addrs: 38.145.72.224 auth "1234"
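When scanning a longer capture for this symptom, a small filter over the link-level header helps. The Python sketch below assumes the `tcpdump -e` line format shown above (timestamp, source MAC, `>`, destination MAC followed by a comma) and reports frames whose destination MAC equals their source MAC; `self_addressed_frames` is a hypothetical helper, not part of tcpdump.

```python
import re

# Link-level prefix of a `tcpdump -e` line: "<time> <src mac> > <dst mac>,"
FRAME_RE = re.compile(
    r"^\S+\s+(?P<src>(?:[0-9a-f]{2}:){5}[0-9a-f]{2})\s+>\s+"
    r"(?P<dst>(?:[0-9a-f]{2}:){5}[0-9a-f]{2}),",
    re.IGNORECASE,
)


def self_addressed_frames(lines):
    """Yield (src, dst) for frames whose destination MAC equals the source MAC."""
    for line in lines:
        m = FRAME_RE.match(line)
        if m and m.group("src") == m.group("dst"):
            yield m.group("src"), m.group("dst")


capture = [
    "05:33:33.653426 7e:d8:16:80:f9:60 > 7e:d8:16:80:f9:60, ethertype IPv4 (0x0800), length 54: ...",
    "    38.145.72.193 > 38.145.72.195: VRRPv2, Advertisement, vrid 51, prio 100, ...",
]
print(list(self_addressed_frames(capture)))
# [('7e:d8:16:80:f9:60', '7e:d8:16:80:f9:60')]
```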

Keepalived version v2.2.4

Did keepalived coredump? No.

pqarmitage commented 3 days ago

@hengqiali I assume it is an editing mistake but the MAC addresses you have given for the four interfaces enp4s0f0, enp4s0f1 on server_A and enp4s0f0 and enp4s0f1 on server_B are the same, and also have only 5 octets.

The lowest level address that keepalived specifies when sending a VRRP packet is the IP address, either unicast or multicast. It is the kernel that adds the MAC addresses to packets. It would be interesting to see the output of ip neigh show on server_A, since that may explain why the destination MAC address is being set as it is.

You state that the VIP is 38.145.72.224/32. If it really is a /32 subnet, i.e. no other device exists on that subnet, how are the routers supposed to discover the MAC address to use to forward packets to the VIP? More specifically, what subnet do you have configured that server_A, server_B and your TOR routers are all in that will allow packets to be forwarded to 38.145.72.224?
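To make the /32 point concrete: a /32 prefix contains exactly one address, so neither address on em5 puts server_B's 38.145.72.195 on-link. The Python sketch below checks this with the standard `ipaddress` module; `on_link` is a hypothetical helper, and the /24 in the last call is a made-up prefix used only for contrast, not part of this setup.

```python
import ipaddress

# Addresses shown by `ip a show em5` on server_A (both are /32 host prefixes)
EM5_ADDRS = ["38.145.72.193/32", "38.145.72.224/32"]


def on_link(peer: str, iface_addrs: list[str]) -> bool:
    """True if `peer` lies inside the connected prefix of any given address."""
    ip = ipaddress.ip_address(peer)
    return any(ip in ipaddress.ip_interface(a).network for a in iface_addrs)


for addr in EM5_ADDRS:
    net = ipaddress.ip_interface(addr).network
    print(net, "holds", net.num_addresses, "address(es)")

print(on_link("38.145.72.195", EM5_ADDRS))             # False: peer is not on-link via em5
print(on_link("38.145.72.195", ["38.145.72.193/24"]))  # True: hypothetical /24, for contrast only
```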

hengqiali commented 3 days ago

> @hengqiali I assume it is an editing mistake but the MAC addresses you have given for the four interfaces enp4s0f0, enp4s0f1 on server_A and enp4s0f0 and enp4s0f1 on server_B are the same, and also have only 5 octets.

You're right! It was an editing mistake; sorry for the noise. I have corrected it.

> The lowest level address that keepalived specifies when sending a VRRP packet is the IP address, either unicast or multicast. It is the kernel that adds the MAC addresses to packets.

This is very important information; thanks for highlighting it!

> It would be interesting to see the output of ip neigh show on server_A, since that may explain why the destination MAC address is being set as it is.

server_A (I have filtered out some unrelated entries):
$ ip neigh show
10.105.1.10 dev enp4s0f0 lladdr 40:9e:a4:8b:2c:05 REACHABLE
10.105.1.12 dev enp4s0f1 lladdr e4:f2:7c:1f:2f:e5 REACHABLE
fe80::429e:a4ff:fe8b:2c05 dev enp4s0f0 lladdr 40:9e:a4:8b:2c:05 router STALE
fe80::e6f2:7cff:fe1f:2fe5 dev enp4s0f1 lladdr e4:f2:7c:1f:2f:e5 router STALE

It seems that there are no cached entries for 38.145.72.195/32?

server_B (I have filtered out some unrelated entries):
$ ip neigh show
10.105.1.30 dev enp4s0f0 lladdr 40:9e:a4:8b:2c:07 REACHABLE
10.105.1.32 dev enp4s0f1 lladdr e4:f2:7c:1f:2f:e7 REACHABLE
fe80::429e:a4ff:fe8b:2c07 dev enp4s0f0 lladdr 40:9e:a4:8b:2c:07 router STALE
fe80::e6f2:7cff:fe1f:2fe7 dev enp4s0f1 lladdr e4:f2:7c:1f:2f:e7 router STALE
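The server_A neighbour listing can be checked mechanically. The Python sketch below assumes the one-entry-per-line format of `ip neigh show` (`parse_ip_neigh` is a hypothetical helper, not part of iproute2) and confirms that server_A only has entries for its TOR next hops, with none for 38.145.72.195. That by itself is not surprising: when a destination is reached through a gateway, the neighbour entry the kernel uses is the gateway's, not the final destination's.

```python
def parse_ip_neigh(text: str) -> dict[str, str]:
    """Map neighbour IP -> lladdr from `ip neigh show` output."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if "lladdr" in fields:
            table[fields[0]] = fields[fields.index("lladdr") + 1]
    return table


server_a = """\
10.105.1.10 dev enp4s0f0 lladdr 40:9e:a4:8b:2c:05 REACHABLE
10.105.1.12 dev enp4s0f1 lladdr e4:f2:7c:1f:2f:e5 REACHABLE
fe80::429e:a4ff:fe8b:2c05 dev enp4s0f0 lladdr 40:9e:a4:8b:2c:05 router STALE
fe80::e6f2:7cff:fe1f:2fe5 dev enp4s0f1 lladdr e4:f2:7c:1f:2f:e5 router STALE
"""
neigh = parse_ip_neigh(server_a)
print(neigh.get("10.105.1.10"))    # 40:9e:a4:8b:2c:05
print(neigh.get("38.145.72.195"))  # None: no entry for the VRRP peer
```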

> You state that the VIP is 38.145.72.224/32. If it really is a /32 subnet, i.e. no other device exists on that subnet, how are the routers supposed to discover the MAC address to use to forward packets to the VIP? More specifically, what subnet do you have configured that server_A, server_B and your TOR routers are all in that will allow packets to be forwarded to 38.145.72.224?

This VIP is just a placeholder IP for now. What confuses me is why VRRP cannot migrate this VIP between the em5 interfaces. If the issue I posted can be solved, we will obtain the VIP from a reserved, valid network segment and announce it into the network via BGP, similar to how em5's IP is announced as a public IP via BGP.

pqarmitage commented 2 days ago

> It seems that there are no cached entries for 38.145.72.195/32?

This is not a keepalived matter and is something that you will need to resolve in respect of your network configuration. I am therefore now closing this issue.

hengqiali commented 2 days ago

> > It seems that there are no cached entries for 38.145.72.195/32?

> This is not a keepalived matter and is something that you will need to resolve in respect of your network configuration. I am therefore now closing this issue.

OK, thanks for your reply. One more thing I would like to confirm: do the VRRP packets generated by keepalived go through the whole kernel network stack and follow the kernel routing table? Thank you in advance.

pqarmitage commented 2 days ago

Yes, the VRRP packets go through the full IP kernel stack, and so will follow the kernel routing. In some circumstances we also use nftables/iptables to control packets, so that part of the stack is fully used too. If the VRRP packets are sent via multicast (which is not the case in your configuration), they are marked as DONTROUTE in order to comply with the VRRP RFC.
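Since the advertisements follow kernel routing, it helps to see which route a plain lookup selects for the peer. The Python sketch below performs a longest-prefix match over server_A's main-table IPv4 routes from the issue. It deliberately ignores the local table, policy rules, metrics and any device binding on the sending socket, so it only shows what a simple destination-based lookup would choose: 38.145.72.195 falls through to the ECMP default route via the TORs. The `ROUTES` list and `lookup` helper are hypothetical illustrations, not kernel code.

```python
import ipaddress

# server_A's main-table IPv4 routes, as shown in the issue: (prefix, how it is reached)
ROUTES = [
    ("0.0.0.0/0", "ECMP via 10.105.1.10 (enp4s0f0) / 10.105.1.12 (enp4s0f1)"),
    ("10.105.1.10/31", "dev enp4s0f0 scope link"),
    ("10.105.1.12/31", "dev enp4s0f1 scope link"),
]


def lookup(dst: str) -> tuple[str, str]:
    """Longest-prefix match: return the most specific route that contains dst."""
    ip = ipaddress.ip_address(dst)
    matches = [(ipaddress.ip_network(prefix), via) for prefix, via in ROUTES
               if ip in ipaddress.ip_network(prefix)]
    net, via = max(matches, key=lambda m: m[0].prefixlen)  # default route always matches
    return str(net), via


print(lookup("38.145.72.195")[0])  # 0.0.0.0/0
print(lookup("10.105.1.11")[0])    # 10.105.1.10/31
```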