acassen / keepalived

Keepalived
https://www.keepalived.org
GNU General Public License v2.0

Keepalived GARP wrong MAC address from the bond #2298

Closed geotransformer closed 1 year ago

geotransformer commented 1 year ago

Describe the bug

After a bond failover, keepalived broadcasts GARP messages carrying the wrong MAC address for the bond.

Investigation showed that keepalived can miss some netlink events, namely the bond failover and the resulting MAC address update.

To Reproduce

  1. Optionally, reduce the netlink monitor receive buffers (this makes missed events more likely):

     global_defs {
         vrrp_netlink_monitor_rcv_bufs 1
     }

  2. Trigger a failover of the bond interface.
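For step 2, the failover can be triggered without touching cables by selecting the other slave through the bonding sysfs interface, which the kernel's bonding documentation describes for active-backup mode. This is a minimal sketch, assuming root, a bond in active-backup mode, and the standard `/sys/class/net/<bond>/bonding/slaves` and `active_slave` files; the function name `bond_failover` and the `SYSFS` override (there only so the function can be tested against a fake directory tree) are hypothetical.

```shell
#!/bin/sh
# bond_failover (hypothetical helper): make the first enslaved interface that
# is not currently active the new active slave, by writing its name to the
# bonding sysfs active_slave file (active-backup mode, needs root).
# SYSFS defaults to /sys; overriding it is only meant for testing.
bond_failover() {
    dir="${SYSFS:-/sys}/class/net/$1/bonding"
    current=$(cat "$dir/active_slave") || return 2
    for slave in $(cat "$dir/slaves"); do
        if [ "$slave" != "$current" ]; then
            # Writing a slave name here asks the driver to switch to it.
            echo "$slave" > "$dir/active_slave" || return 2
            echo "$current -> $slave"
            return 0
        fi
    done
    echo "no other slave to fail over to" >&2
    return 1
}

# Example (as root): bond_failover bd1
```

With fail_over_mac active, switching the active slave this way should move the bond's MAC address just as a link failure would, although it does not exercise the MII link-down path.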

node-1 ~$ cat /proc/net/bonding/bd1
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: fault-tolerance (active-backup) (fail_over_mac active)
Primary Slave: None
Currently Active Slave: enp94s0f0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0

Slave Interface: enp216s0f1
MII Status: up
Speed: 40000 Mbps
Duplex: full
Link Failure Count: 2
Permanent HW addr: 3c:fd:fe:df:ad:c9
Slave queue ID: 0

Slave Interface: enp94s0f0
MII Status: up
Speed: 40000 Mbps
Duplex: full
Link Failure Count: 2
Permanent HW addr: 3c:fd:fe:df:ae:78
Slave queue ID: 0
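With `fail_over_mac active`, the bonding driver gives the bond the MAC address of whichever slave is currently active, so after a failover the GARP messages should carry that slave's address. This is a small sketch, assuming the `/proc/net/bonding/<bond>` layout shown above, that prints the permanent hardware address of the currently active slave (in the output above, 3c:fd:fe:df:ae:78 for enp94s0f0). It assumes the slave's current MAC equals its permanent one, which holds unless the slave's address was changed by hand; the function name is hypothetical.

```shell
#!/bin/sh
# active_slave_perm_mac (hypothetical helper): read /proc/net/bonding/<bond>
# text from the named file(s) or stdin and print the "Permanent HW addr"
# of the slave listed as "Currently Active Slave".
active_slave_perm_mac() {
    awk -F': ' '
        # Strip trailing blanks so interface names compare cleanly.
        { sub(/[ \t]+$/, "") }
        /^Currently Active Slave:/ { active = $2 }
        /^Slave Interface:/        { slave = $2 }
        /^Permanent HW addr:/      { if (slave == active) print $2 }
    ' "$@"
}

# Example: active_slave_perm_mac /proc/net/bonding/bd1
```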

Expected behavior

After a bond failover, keepalived should send GARP messages with the bond's new MAC address.

Keepalived version

1.3.9 (system service)
2.0.20 (keepalived in k8s pod)


Details of any containerisation or hosted service (e.g. AWS)

Found the issue both with keepalived running in a container and with keepalived running as a host service.

Configuration file:

global_defs {
    vrrp_netlink_monitor_rcv_bufs 64000
}
vrrp_script check_apiserver {
    script "/etc/keepalived/check_script.sh"
    interval 1        # check every second
    weight -55        # lower priority by 55 while the script is failing
    rise 3            # required number of successes for OK transition
    fall 2            # required number of failures for KO transition
    init_fail         # assume script initially is in failed state
}
#vrrp instance config
vrrp_instance VI_1 {
    state MASTER
    interface vlan100
    virtual_router_id 102
    priority 100
    advert_int 1
    garp_master_refresh 30
    garp_master_refresh_repeat 1
    authentication {
        auth_type PASS
        auth_pass xxxxxxxx
    }

    unicast_src_ip 10.192.1.12
    unicast_peer {
        10.192.1.13
        10.192.1.14
    }

    virtual_ipaddress {
        10.192.1.11/24 dev vlan100
    }

    track_script {
        check_apiserver
    }
}
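Since keepalived can only track a script that a `vrrp_script` block defines, a quick consistency check on the configuration can catch misspelled names in `track_script`. This is a rough sketch with awk; the function name is hypothetical, and the line-based parsing (one tracked name per line, `#` or `!` comments, no nested blocks) is a simplifying assumption, not keepalived's own parser.

```shell
#!/bin/sh
# undefined_track_scripts (hypothetical helper): print each name referenced in
# a track_script block that no vrrp_script block defines.
# Rough line-based parsing: one tracked script name per line.
undefined_track_scripts() {
    awk '
        # Strip "#" and "!" comments before looking at fields.
        { sub(/[#!].*/, "") }
        $1 == "vrrp_script"              { defined[$2] = 1; next }
        $1 == "track_script"             { in_track = 1; next }
        in_track && $1 == "}"            { in_track = 0; next }
        in_track && NF > 0 && $1 != "{"  { used[$1] = 1 }
        END { for (name in used) if (!(name in defined)) print name }
    ' "$@"
}

# Example: undefined_track_scripts /etc/keepalived/keepalived.conf
```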

    bonds:
        bd1:
            dhcp4: false
            dhcp6: false
            interfaces:
            - enp216s0f1
            - enp94s0f0
            optional: true
            parameters:
                fail-over-mac-policy: active
                mii-monitor-interval: '100'
                mode: active-backup
    vlans:
        vlan100:
            addresses:
            - 10.192.1.12/24
            dhcp4: false
            dhcp6: false
            id: 107
            link: bd1


nser77 commented 1 year ago

Hi, the problem could be connected to netlink and MII.

From your bonding status output:

[...]
MII Status: up
MII Polling Interval (ms): 100
[...]

By default, keepalived uses netlink to receive interface status updates; in your case, it seems you need to rely on MII polling for those interfaces instead.

To change this behavior, take a look at this section of man keepalived.conf:

[...]

Linkbeat interfaces
       The linkbeat_interfaces block allows specifying which interfaces should
       use polling via MII, Ethtool or ioctl status rather than rely on
       netlink status updates. This allows more granular control of global
       definition linkbeat_use_polling.

       This option is preferred over the deprecated use of
       linkbeat_use_polling in a vrrp_instance block, since the latter only
       allows using linkbeat on the interface of the vrrp_instance itself,
       whereas track_interface and virtual_ipaddresses and virtual_iproutes
       may require monitoring other interfaces, which may need to use
       linkbeat polling.

       The default polling type to use is MII, unless that isn't supported in
       which case ETHTOOL is used, and if that isn't supported then ioctl
       polling. The preferred type of polling to use can be specified with MII
       or ETHTOOL or IOCTL after the interface name, but if that type isn't
       supported, a supported type will be used.

       The syntax for linkbeat_interfaces is:
           linkbeat_interfaces {
               eth2
               enp2s0 ETHTOOL
           }

[...]

Finally, others may ask you to provide more information, such as:

  1. keepalived logs.
  2. The output of keepalived -D -d.
  3. A tcpdump capture of the wrong behavior.
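For item 3, capturing with tcpdump's `-e` option prints the Ethernet source address of each frame, which is the address that is wrong in this report. This is a sketch of a filter over that output, assuming tcpdump's default timestamped lines (so the source MAC is the second field) and gratuitous ARPs sent as requests whose sender and target IP are both the VIP; the function name is hypothetical.

```shell
#!/bin/sh
# garp_source_macs (hypothetical helper): read "tcpdump -n -e" text on stdin
# and print the Ethernet source MAC of each ARP request of the form
# "who-has VIP ... tell VIP" (a gratuitous ARP for the VIP).
garp_source_macs() {
    awk -v vip="$1" '
        {
            who = ""; tell = ""
            for (i = 1; i < NF; i++) {
                if ($i == "who-has") who = $(i + 1)
                if ($i == "tell") { tell = $(i + 1); sub(/,$/, "", tell) }
            }
            # With default timestamps, field 1 is the time, field 2 the source MAC.
            if (who == vip && tell == vip) print $2
        }
    '
}

# Usage (as root), while failing over the bond from another shell:
#   tcpdump -l -n -e -i vlan100 arp | garp_source_macs 10.192.1.11
```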
geotransformer commented 1 year ago

We do have MII monitoring enabled: MII Polling Interval (ms): 100


~$ sudo cat /proc/net/bonding/bd1
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: fault-tolerance (active-backup) (fail_over_mac active)
Primary Slave: None
Currently Active Slave: enp216s0f1
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0

Slave Interface: enp216s0f1
MII Status: up
Speed: 40000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 40:a6:b7:48:5a:71
Slave queue ID: 0

Slave Interface: enp94s0f0
MII Status: up
Speed: 40000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 40:a6:b7:4d:da:c8
Slave queue ID: 0
pqarmitage commented 1 year ago

Notes for setting up a test environment:

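# active-backup bond with MII monitoring; the bond takes the active slave's MAC
ip link add bd1 type bond mode active-backup miimon 100 fail_over_mac active
# take each NIC down, enslave it to bd1, bring it back up
# ($c is unquoted on purpose so "master bd1" splits into two arguments)
for i in 1 2; do for c in down "master bd1" up; do ip link set enp0s20f0u$i $c; done; done
ip link set bd1 up
# VLAN on top of the bond, carrying the VRRP address
ip link add link bd1 vlan100 type vlan id 100
ip link set vlan100 up
ip addr add 10.192.1.13/24 brd + dev vlan100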

I have tested this using a 6.2 kernel, and also on CentOS 7 since you appear to be using RHEL 7/CentOS 7, and I cannot reproduce the problem unless the netlink receive buffers are set to only 1. When the current active slave goes down, keepalived updates the MAC addresses for both the bd1 and vlan100 interfaces, and the GARP messages that are subsequently sent use the new MAC address of the bd1 interface.
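To confirm the kernel-side state on a live system after a failover, the bond's current MAC, its active slave's MAC and the VLAN's MAC can be read from sysfs. This is a minimal sketch using the standard `/sys/class/net/<if>/address` and `/sys/class/net/<bond>/bonding/active_slave` files; the function name `mac_report` and the `SYSFS` override (only there for testing against a copy of the tree) are hypothetical.

```shell
#!/bin/sh
# mac_report (hypothetical helper): print the MACs of a bond, its active
# slave and a VLAN on top of it; return non-zero if they differ.
# SYSFS defaults to /sys; overriding it is only meant for testing.
mac_report() {
    bond="$1" vlan="$2"
    net="${SYSFS:-/sys}/class/net"
    active=$(cat "$net/$bond/bonding/active_slave") || return 2
    bond_mac=$(cat "$net/$bond/address") || return 2
    slave_mac=$(cat "$net/$active/address") || return 2
    vlan_mac=$(cat "$net/$vlan/address") || return 2
    echo "$bond=$bond_mac active($active)=$slave_mac $vlan=$vlan_mac"
    # With fail_over_mac active the bond should match its active slave; the
    # VLAN is expected to match too when it inherited its MAC from the bond.
    [ "$bond_mac" = "$slave_mac" ] && [ "$vlan_mac" = "$bond_mac" ]
}

# Example: mac_report bd1 vlan100
```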

If I set

global_defs {
    vrrp_netlink_monitor_rcv_bufs 1
}

then I get log messages:

Netlink: Receive buffer overrun on monitor socket - (No buffer space available)
   - increase the relevant netlink_rcv_bufs global parameter and/or set force

However, when I tried it, the MAC address in the GARP messages was still updated.

This is not a bug in keepalived but rather a configuration error, since insufficient netlink receive buffers are being allocated. Increasing the netlink receive buffers stops the problem occurring.
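The value given to vrrp_netlink_monitor_rcv_bufs ends up as a socket receive buffer request. Per socket(7), the kernel doubles an SO_RCVBUF value and caps the request at net.core.rmem_max, while SO_RCVBUFFORCE (presumably what the log message's "set force" refers to) lets a CAP_NET_ADMIN process exceed that cap. This is a rough calculator under those rules that ignores the kernel's small minimum buffer size; the function name is hypothetical.

```shell
#!/bin/sh
# effective_rcvbuf (hypothetical helper): estimate the receive buffer the
# kernel grants for a requested size, following socket(7):
#   SO_RCVBUF:      min(requested, rmem_max) * 2
#   SO_RCVBUFFORCE: requested * 2 (rmem_max cap not applied)
# Arguments: requested bytes, force (yes/no), optional rmem_max.
effective_rcvbuf() {
    requested="$1"
    force="${2:-no}"
    rmem_max="${3:-$(cat /proc/sys/net/core/rmem_max)}"
    if [ "$force" = yes ] || [ "$requested" -le "$rmem_max" ]; then
        echo $((requested * 2))
    else
        echo $((rmem_max * 2))
    fi
}

# Example: effective_rcvbuf 64000
```

For example, `effective_rcvbuf 64000 no 212992` prints 128000: the reporter's 64000 bytes are below an rmem_max of 212992 (a common default), so only the doubling applies.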

I tested the above with keepalived version 2.2.7. Version 1.3.9 is extremely old and version 2.0.20 is quite old. If the problem persists, I suggest you update keepalived to v2.2.7 as a first step to resolving the issue.