acassen / keepalived

Keepalived
https://www.keepalived.org
GNU General Public License v2.0

Keepalived crashing when being used over IPoIB with ipv6 config #2100

Closed itailev closed 2 years ago

itailev commented 2 years ago

Describe the bug

Keepalived crashes on a system with an IPoIB networking stack whenever an IPv6 address entry is used in its config file. The crash happens while sending/queueing Unsolicited Neighbour Adverts for the IPv6 address.

  1. The issue was first seen in an OpenStack cloud, where keepalived is used for the HA virtual router, its config is created automatically by OpenStack components, and the IPoIB interfaces are created in a dedicated namespace. According to the logs, the crash in this case is due to "buffer overflow detected".
  2. The issue also reproduces on a standalone CentOS 8 machine on which the IPoIB interface is configured manually. This time a core dump is created for keepalived.

When the IPv6 address entry is removed from the config file, keepalived does not crash.

To Reproduce

Create the interfaces:

ip link add link ib2 name ib2.801f type ipoib pkey 0x801f
ip addr add 169.254.195.40/18 brd 169.254.255.255 dev ib2.801f
ip addr add 169.254.0.217/24 dev ib2.801f
ip link add link ib2 name ib2.801e type ipoib pkey 0x801e
ip -6 addr add fe80::200:16ff:fe73:fe80/64 dev ib2.801e
ip addr add 10.10.10.1 dev ib2.801e

Prepare /etc/keepalived/keepalived.conf file:

global_defs {
    notification_email_from neutron@openstack.local
    router_id neutron
}
vrrp_instance VR_217 {
    state BACKUP
    interface ib2.801f
    virtual_router_id 217
    priority 50
    garp_master_delay 60
    nopreempt
    advert_int 2
    track_interface {
        ib2.801f
    }
    virtual_ipaddress {
        169.254.0.217/24 dev ib2.801f
    }
    virtual_ipaddress_excluded {
        10.10.10.10/24 dev ib2.801e no_track
        fe80::200:16ff:fe73:fe80/64 dev ib2.801e scope link no_track
    }
}

Restart the keepalived service.

Expected behavior

keepalived should not crash.

Keepalived version

Keepalived v2.1.5 (07/13,2020)

Copyright(C) 2001-2020 Alexandre Cassen, <acassen@gmail.com>

Built with kernel headers for Linux 4.18.0
Running on Linux 4.18.0-358.el8.x86_64 #1 SMP Mon Jan 10 13:11:20 UTC 2022

Distro (please complete the following information):

Details of any containerisation or hosted service (e.g. AWS)

In OpenStack it is running in a container; however, the issue also happens on a bare-metal server.

Interfaces configuration:

12: ib2.801f@ib2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc fq_codel state UP group default qlen 256
    link/infiniband 00:00:15:ec:fe:80:00:00:00:00:00:00:1c:34:da:03:00:4d:76:b6 brd 00:ff:ff:ff:ff:12:40:1b:80:1f:00:00:00:00:00:00:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65520 
    ipoib pkey 0x801f mode datagram umcast 0000 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 
    inet 169.254.195.40/18 brd 169.254.255.255 scope global ib2.801f
       valid_lft forever preferred_lft forever
    inet 169.254.0.217/24 scope global ib2.801f
       valid_lft forever preferred_lft forever
    inet6 fe80::1e34:da03:4d:76b6/64 scope link 
       valid_lft forever preferred_lft forever
13: ib2.801e@ib2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc fq_codel state UP group default qlen 256
    link/infiniband 00:00:15:ed:fe:80:00:00:00:00:00:00:1c:34:da:03:00:4d:76:b6 brd 00:ff:ff:ff:ff:12:40:1b:80:1e:00:00:00:00:00:00:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65520 
    ipoib pkey 0x801e mode datagram umcast 0000 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 
    inet 10.10.10.1/32 scope global ib2.801e
       valid_lft forever preferred_lft forever
    inet 10.10.10.10/24 scope global ib2.801e
       valid_lft forever preferred_lft forever
    inet6 fe80::200:16ff:fe73:fe80/64 scope link nodad 
       valid_lft forever preferred_lft forever

System Log entries

OpenStack:

2022-01-19T14:34:36.651672862+02:00 stderr F Wed Jan 19 12:34:36 2022: Starting VRRP child process, pid=787439
2022-01-19T14:34:36.652023666+02:00 stderr F Wed Jan 19 12:34:36 2022: Registering Kernel netlink reflector
2022-01-19T14:34:36.652052355+02:00 stderr F Wed Jan 19 12:34:36 2022: Registering Kernel netlink command channel
2022-01-19T14:34:36.652229459+02:00 stderr F Wed Jan 19 12:34:36 2022: Opening file '/var/lib/neutron/ha_confs/95f78563-cbd6-44f9-9778-dad2cdeda16a/keepalived.conf'.
2022-01-19T14:34:36.652414725+02:00 stderr F Wed Jan 19 12:34:36 2022: (/var/lib/neutron/ha_confs/95f78563-cbd6-44f9-9778-dad2cdeda16a/keepalived.conf: Line 21) Cannot specify scope for IPv6 addresses (fe80::200:19ff:fe39:fe80/64) - ignoring scope
2022-01-19T14:34:36.652455017+02:00 stderr F Wed Jan 19 12:34:36 2022: (VR_170) Ignoring track_interface ha-2cf61505-6a since own interface
2022-01-19T14:34:36.652473611+02:00 stderr F Wed Jan 19 12:34:36 2022: Assigned address 169.254.192.64 for interface ha-2cf61505-6a
2022-01-19T14:34:36.652636252+02:00 stderr F Wed Jan 19 12:34:36 2022: Registering gratuitous ARP shared channel
2022-01-19T14:34:36.652652715+02:00 stderr F Wed Jan 19 12:34:36 2022: Registering gratuitous NDISC shared channel
2022-01-19T14:34:36.652661216+02:00 stderr F Wed Jan 19 12:34:36 2022: (VR_170) removing VIPs.
2022-01-19T14:34:36.652737224+02:00 stderr F Wed Jan 19 12:34:36 2022: (VR_170) removing E-VIPs.
2022-01-19T14:34:36.653112228+02:00 stderr F Wed Jan 19 12:34:36 2022: (VR_170) Entering BACKUP STATE (init)
2022-01-19T14:34:36.653124612+02:00 stderr F Wed Jan 19 12:34:36 2022: VRRP sockpool: [ifindex( 31), family(IPv4), proto(112), fd(12,13)]
2022-01-19T14:34:43.458197116+02:00 stderr F Wed Jan 19 12:34:43 2022: (VR_170) Receive advertisement timeout
2022-01-19T14:34:43.458429670+02:00 stderr F Wed Jan 19 12:34:43 2022: (VR_170) Entering MASTER STATE
2022-01-19T14:34:43.458440722+02:00 stderr F Wed Jan 19 12:34:43 2022: (VR_170) setting VIPs.
2022-01-19T14:34:43.458904997+02:00 stderr F Wed Jan 19 12:34:43 2022: (VR_170) setting E-VIPs.
2022-01-19T14:34:43.458991975+02:00 stderr F Wed Jan 19 12:34:43 2022: (VR_170) Sending/queueing gratuitous ARPs on ha-2cf61505-6a for 169.254.0.170
2022-01-19T14:34:43.459011300+02:00 stderr F Wed Jan 19 12:34:43 2022: Sending gratuitous ARP on ha-2cf61505-6a for 169.254.0.170
2022-01-19T14:34:43.459177250+02:00 stderr F Wed Jan 19 12:34:43 2022: (VR_170) Sending/queueing gratuitous ARPs on qr-cd37f0e5-4c for 11.11.11.1
2022-01-19T14:34:43.459199075+02:00 stderr F Wed Jan 19 12:34:43 2022: Sending gratuitous ARP on qr-cd37f0e5-4c for 11.11.11.1
2022-01-19T14:34:43.459274329+02:00 stderr F Wed Jan 19 12:34:43 2022: (VR_170) Sending/queueing Unsolicited Neighbour Adverts on qr-cd37f0e5-4c for fe80::200:19ff:fe39:fe80
2022-01-19T14:34:43.459421080+02:00 stderr F *** buffer overflow detected ***: /usr/sbin/keepalived terminated
2022-01-19T14:34:43.811687544+02:00 stderr F Wed Jan 19 12:34:43 2022: pid 787439 exited due to signal 6 (Aborted)
2022-01-19T14:34:43.811768205+02:00 stderr F Wed Jan 19 12:34:43 2022: VRRP child process(787439) died: Respawning
2022-01-19T14:34:43.811777309+02:00 stderr F Wed Jan 19 12:34:43 2022: Restart of VRRP process delayed 60 seconds to limit respawn rate

CentOS-Stream 8

Feb 14 12:19:03 host-11-11-11-22 Keepalived[31884]: Starting VRRP child process, pid=31902
Feb 14 12:19:03 host-11-11-11-22 Keepalived_vrrp[31902]: Registering Kernel netlink reflector
Feb 14 12:19:03 host-11-11-11-22 Keepalived_vrrp[31902]: Registering Kernel netlink command channel
Feb 14 12:19:03 host-11-11-11-22 Keepalived_vrrp[31902]: Opening file '/etc/keepalived/keepalived.conf'.
Feb 14 12:19:03 host-11-11-11-22 Keepalived_vrrp[31902]: (/etc/keepalived/keepalived.conf: Line 20) Cannot specify scope for IPv6 addresses (fe80::200:16ff:fe73:fe80/64) - ignoring scope
Feb 14 12:19:03 host-11-11-11-22 Keepalived_vrrp[31902]: (VR_217) Ignoring track_interface ib2.801f since own interface
Feb 14 12:19:03 host-11-11-11-22 Keepalived_vrrp[31902]: Assigned address 169.254.195.40 for interface ib2.801f
Feb 14 12:19:03 host-11-11-11-22 Keepalived_vrrp[31902]: Assigned address fe80::1e34:da03:4d:76b6 for interface ib2.801f
Feb 14 12:19:03 host-11-11-11-22 Keepalived_vrrp[31902]: Registering gratuitous ARP shared channel
Feb 14 12:19:03 host-11-11-11-22 Keepalived_vrrp[31902]: Registering gratuitous NDISC shared channel
Feb 14 12:19:03 host-11-11-11-22 Keepalived_vrrp[31902]: (VR_217) removing VIPs.
Feb 14 12:19:03 host-11-11-11-22 Keepalived_vrrp[31902]: (VR_217) removing E-VIPs.
Feb 14 12:19:03 host-11-11-11-22 Keepalived_vrrp[31902]: (VR_217) Entering BACKUP STATE (init)
Feb 14 12:19:03 host-11-11-11-22 Keepalived_vrrp[31902]: VRRP sockpool: [ifindex( 12), family(IPv4), proto(112), fd(12,13)]
Feb 14 12:19:10 host-11-11-11-22 Keepalived_vrrp[31902]: (VR_217) Receive advertisement timeout
Feb 14 12:19:10 host-11-11-11-22 Keepalived_vrrp[31902]: (VR_217) Entering MASTER STATE
Feb 14 12:19:10 host-11-11-11-22 Keepalived_vrrp[31902]: (VR_217) setting VIPs.
Feb 14 12:19:10 host-11-11-11-22 Keepalived_vrrp[31902]: (VR_217) setting E-VIPs.
Feb 14 12:19:10 host-11-11-11-22 Keepalived_vrrp[31902]: (VR_217) Sending/queueing gratuitous ARPs on ib2.801f for 169.254.0.217
Feb 14 12:19:10 host-11-11-11-22 Keepalived_vrrp[31902]: Sending gratuitous ARP on ib2.801f for 169.254.0.217
Feb 14 12:19:10 host-11-11-11-22 Keepalived_vrrp[31902]: (VR_217) Sending/queueing Unsolicited Neighbour Adverts on ib2.801e for fe80::200:16ff:fe73:fe80
Feb 14 12:19:10 host-11-11-11-22 systemd[1]: Started Process Core Dump (PID 31903/UID 0).
Feb 14 12:19:10 host-11-11-11-22 systemd-coredump[31904]: Resource limits disable core dumping for process 31902 (keepalived).
Feb 14 12:19:10 host-11-11-11-22 systemd-coredump[31904]: Process 31902 (keepalived) of user 0 dumped core.
Feb 14 12:19:10 host-11-11-11-22 systemd[1]: systemd-coredump@26-31903-0.service: Succeeded.
Feb 14 12:19:10 host-11-11-11-22 Keepalived[31884]: pid 31902 exited due to signal 6 (Aborted)
Feb 14 12:19:10 host-11-11-11-22 Keepalived[31884]: VRRP child process(31902) died: Respawning
Feb 14 12:19:10 host-11-11-11-22 Keepalived[31884]: Restart of VRRP process delayed 2 seconds to limit respawn rate
Feb 14 12:19:12 host-11-11-11-22 Keepalived[31884]: Starting VRRP child process, pid=31910
Feb 14 12:19:12 host-11-11-11-22 Keepalived_vrrp[31910]: Registering Kernel netlink reflector
Feb 14 12:19:12 host-11-11-11-22 Keepalived_vrrp[31910]: Registering Kernel netlink command channel
Feb 14 12:19:12 host-11-11-11-22 Keepalived_vrrp[31910]: Opening file '/etc/keepalived/keepalived.conf'.
Feb 14 12:19:12 host-11-11-11-22 Keepalived_vrrp[31910]: (/etc/keepalived/keepalived.conf: Line 20) Cannot specify scope for IPv6 addresses (fe80::200:16ff:fe73:fe80/64) - ignoring scope
Feb 14 12:19:12 host-11-11-11-22 Keepalived_vrrp[31910]: (VR_217) Ignoring track_interface ib2.801f since own interface
Feb 14 12:19:12 host-11-11-11-22 Keepalived_vrrp[31910]: Assigned address 169.254.195.40 for interface ib2.801f
Feb 14 12:19:12 host-11-11-11-22 Keepalived_vrrp[31910]: Assigned address fe80::1e34:da03:4d:76b6 for interface ib2.801f
Feb 14 12:19:12 host-11-11-11-22 Keepalived_vrrp[31910]: Registering gratuitous ARP shared channel
Feb 14 12:19:12 host-11-11-11-22 Keepalived_vrrp[31910]: Registering gratuitous NDISC shared channel
Feb 14 12:19:12 host-11-11-11-22 Keepalived_vrrp[31910]: (VR_217) removing VIPs.
Feb 14 12:19:12 host-11-11-11-22 Keepalived_vrrp[31910]: (VR_217) removing E-VIPs.
Feb 14 12:19:12 host-11-11-11-22 Keepalived_vrrp[31910]: (VR_217) Entering BACKUP STATE (init)
Feb 14 12:19:12 host-11-11-11-22 Keepalived_vrrp[31910]: VRRP sockpool: [ifindex( 12), family(IPv4), proto(112), fd(12,13)]

Did keepalived coredump?

Yes. core_backtrace collected by abrt-ccpp.service:

{   "signal": 6
,   "executable": "/usr/sbin/keepalived"
,   "only_crash_thread": true
,   "stacktrace":
      [ {   "crash_thread": true
        ,   "frames":
              [ {   "address": 140021711800911
                ,   "build_id": "31172047330f2a16352e53539fef25e33f53f091"
                ,   "build_id_offset": 322127
                ,   "function_name": "raise"
                ,   "file_name": "/usr/lib64/libc-2.28.so"
                }
              , {   "address": 140021711617662
                ,   "build_id": "31172047330f2a16352e53539fef25e33f53f091"
                ,   "build_id_offset": 138878
                ,   "function_name": "abort"
                ,   "file_name": "/usr/lib64/libc-2.28.so"
                }
              , {   "address": 140021712072791
                ,   "build_id": "31172047330f2a16352e53539fef25e33f53f091"
                ,   "build_id_offset": 594007
                ,   "function_name": "__libc_message"
                ,   "file_name": "/usr/lib64/libc-2.28.so"
                }
              , {   "address": 140021712785941
                ,   "build_id": "31172047330f2a16352e53539fef25e33f53f091"
                ,   "build_id_offset": 1307157
                ,   "function_name": "__GI___fortify_fail_abort"
                ,   "file_name": "/usr/lib64/libc-2.28.so"
                }
              , {   "address": 140021712785991
                ,   "build_id": "31172047330f2a16352e53539fef25e33f53f091"
                ,   "build_id_offset": 1307207
                ,   "function_name": ".annobin___GI___fortify_fail.end"
                ,   "file_name": "/usr/lib64/libc-2.28.so"
                }
              , {   "address": 140021712777958
                ,   "build_id": "31172047330f2a16352e53539fef25e33f53f091"
                ,   "build_id_offset": 1299174
                ,   "function_name": ".annobin___GI___chk_fail.end"
                ,   "file_name": "/usr/lib64/libc-2.28.so"
                }
              , {   "address": 94187811277540
                ,   "build_id": "7177358dd326f4f3566ca454c68c69f5bd4924be"
                ,   "build_id_offset": 309988
                ,   "function_name": "ndisc_send_unsolicited_na_immediate"
                ,   "file_name": "/usr/sbin/keepalived"
                }
              , {   "address": 94187811209332
                ,   "build_id": "7177358dd326f4f3566ca454c68c69f5bd4924be"
                ,   "build_id_offset": 241780
                ,   "function_name": "vrrp_send_update"
                ,   "file_name": "/usr/sbin/keepalived"
                }
              , {   "address": 94187811212006
                ,   "build_id": "7177358dd326f4f3566ca454c68c69f5bd4924be"
                ,   "build_id_offset": 244454
                ,   "function_name": "vrrp_send_link_update.part.9"
                ,   "file_name": "/usr/sbin/keepalived"
                }
              , {   "address": 94187811219299
                ,   "build_id": "7177358dd326f4f3566ca454c68c69f5bd4924be"
                ,   "build_id_offset": 251747
                ,   "function_name": "vrrp_state_master_tx"
                ,   "file_name": "/usr/sbin/keepalived"
                }
              , {   "address": 94187811246540
                ,   "build_id": "7177358dd326f4f3566ca454c68c69f5bd4924be"
                ,   "build_id_offset": 278988
                ,   "function_name": "vrrp_read_dispatcher_thread"
                ,   "file_name": "/usr/sbin/keepalived"
                }
              , {   "address": 94187811388254
                ,   "build_id": "7177358dd326f4f3566ca454c68c69f5bd4924be"
                ,   "build_id_offset": 420702
                ,   "function_name": "process_threads"
                ,   "file_name": "/usr/sbin/keepalived"
                }
              , {   "address": 94187811178736
                ,   "build_id": "7177358dd326f4f3566ca454c68c69f5bd4924be"
                ,   "build_id_offset": 211184
                ,   "function_name": "start_vrrp_child"
                ,   "file_name": "/usr/sbin/keepalived"
                }
              , {   "address": 94187811388254
                ,   "build_id": "7177358dd326f4f3566ca454c68c69f5bd4924be"
                ,   "build_id_offset": 420702
                ,   "function_name": "process_threads"
                ,   "file_name": "/usr/sbin/keepalived"
                }
              , {   "address": 94187811019359
                ,   "build_id": "7177358dd326f4f3566ca454c68c69f5bd4924be"
                ,   "build_id_offset": 51807
                ,   "function_name": "keepalived_main"
                ,   "file_name": "/usr/sbin/keepalived"
                } ]
        } ]
}


pqarmitage commented 2 years ago

@itailev Many thanks for reporting this. One of the problems with Infiniband is that I have no hardware to test it with, and I am not aware of any virtualised Infiniband drivers that I can test it with. The current support for Infiniband was done with a lot of help from Sadanand Warrier at Intel, but it was only done for IPv4, since that was all he was using.

The last keepalived function in the stack trace above is ndisc_send_unsolicited_na_immediate(). There is a comment in the source code of that function:

/* This needs updating to support IPv6 over Infiniband
         * (see vrrp_arp.c) */

so now seems the time to do it. :)
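For context, the `__chk_fail` and `__fortify_fail` frames in the stack trace are glibc's fortify check aborting the process when a copy would overrun its destination. The thread does not show the offending line, so the following is only a sketch under an assumption: that a 20-octet IPoIB link-layer address (as shown in the `link/infiniband` lines above) was being copied into a buffer sized for a 6-octet Ethernet MAC. The helper `copy_hwaddr` is hypothetical and is not keepalived code.

```c
#include <stddef.h>
#include <string.h>

#define ETH_HWADDR_LEN 6    /* Ethernet MAC address length in octets */
#define IB_HWADDR_LEN  20   /* IPoIB link-layer address length in octets */

/* Hypothetical helper, not keepalived code: copy a hardware address
 * into a destination buffer only if it fits. Returns 0 on success,
 * -1 if an argument is NULL or the address would overflow the buffer. */
static int copy_hwaddr(unsigned char *dst, size_t dst_size,
                       const unsigned char *src, size_t src_len)
{
    if (dst == NULL || src == NULL || src_len > dst_size)
        return -1;
    memcpy(dst, src, src_len);
    return 0;
}
```

With a check like this, an IPoIB address offered to an Ethernet-sized buffer is rejected by the caller instead of tripping the fortify abort.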

I will have a look later to see if I can work out what needs doing, and provide a patch in this issue report, which I would be grateful if you could test.

In the meantime, on the CentOS 8 system, could you execute coredumpctl debug (assuming that the coredump is the latest one on your system), enter bt at the (gdb) prompt, and post the output here? That generally gives more information than the abrt stacktrace. It should install a number of debuginfo rpm packages, including keepalived-debuginfo; if it doesn't install the keepalived-debuginfo package, could you install it manually and run the coredumpctl debug/bt commands again?

I think we should be able to get this resolved quite quickly, but more details would be helpful.

pqarmitage commented 2 years ago

@itailev What would be really helpful would be a network capture of a valid IPv6 neighbour advertisement over Infiniband (perhaps in a .pcap file). It would make it more likely that what I produce will work.

pqarmitage commented 2 years ago

@itailev It would be really helpful if you could check something for me - is it possible to create a macvlan interface (vmac in Keepalived parlance) on an Infiniband interface? If it is possible, does it make sense to do so (so far as I can see from the kernel sources the MAC address of the macvlan interface will be 6 octets)?

The reason for asking this is that keepalived either needs to report a configuration error if a config attempts to configure a macvlan on an Infiniband interface, or it needs to handle it properly (which it doesn't do at the moment).

Unfortunately I don't think I can test this sort of thing because I can't see any way to create a dummy Infiniband interface.
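The "report a configuration error" option could be sketched as below. This is a minimal illustration rather than keepalived's implementation: `hw_type_supports_vmac` and `read_hw_type` are hypothetical names, and reading the hardware type from `/sys/class/net/<ifname>/type` is just one assumed way of obtaining the `ARPHRD_*` value on Linux.

```c
#include <stdbool.h>
#include <stdio.h>
#include <net/if_arp.h>   /* ARPHRD_ETHER, ARPHRD_INFINIBAND */

/* Hypothetical policy check: allow a macvlan (vmac) only on
 * Ethernet-type parents, whose hardware address is 6 octets. */
static bool hw_type_supports_vmac(int arphrd_type)
{
    return arphrd_type == ARPHRD_ETHER;
}

/* Read the ARPHRD_* hardware type of an interface from sysfs.
 * Returns -1 if the interface or the sysfs file is unavailable. */
static int read_hw_type(const char *ifname)
{
    char path[128];
    int type = -1;
    FILE *fp;

    if (snprintf(path, sizeof path, "/sys/class/net/%s/type", ifname)
        >= (int)sizeof path)
        return -1;
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    if (fscanf(fp, "%d", &type) != 1)
        type = -1;
    fclose(fp);
    return type;
}
```

A config parser could call `hw_type_supports_vmac(read_hw_type("ib2.801f"))` and log an error instead of trying to create the macvlan when it returns false.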

itailev commented 2 years ago

@pqarmitage thanks for picking it up. I had an issue with the setup, will rebuild it, collect your info and check

itailev commented 2 years ago

will try to do the macvlan check, however in any case, OpenStack is not using macvlan when it sets the IB interfaces...

pqarmitage commented 2 years ago

@itailev Many thanks for doing the check re macvlan. I understand that you are not using macvlans, but I want to make sure that while fixing some of the Infiniband code in keepalived we also fix any other potential issues that we are aware of that someone else might come across.

When testing whether macvlans can be configured, could you try both:

  1. ip link add link ib2.801f macvlan0 type macvlan mode private
  2. Add an extra line, use_vmac vmac_over_ib, to the vrrp_instance block of your keepalived configuration, and check the keepalived logs for any error (and, while keepalived is running, check whether an interface vmac_over_ib has been created). If vmac_over_ib has been created, the output of ip -d link show vmac_over_ib; ip -d link show ib2.801f would be interesting.

pqarmitage commented 2 years ago

@itailev Attached is the patch that should stop the crash you have been experiencing, and should also create NA messages correctly over Infiniband: 010-na_over_infiniband.patch.txt. This patch applies to keepalived v2.2.7. If you want to remain with keepalived v2.1.5, use this patch instead: 010-na_over_infiniband.patch.215.txt, which incorporates commit 1b3f08a and one or two other trivial changes to make it compile.

I really have no idea whether the NA messages over Infiniband are correctly formatted, so it would be extremely helpful if you could check. If the format of the message is wrong, attaching a correctly formatted NA over Infiniband message (e.g. wireshark output or a .pcap file) would be really helpful.
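As a rough guide to what a correctly sized NA should look like, the sketch below computes the expected message length. It is not taken from the patch. It assumes that neighbour discovery options are measured in 8-octet units, and that the link-layer address option for IPoIB carries 2 reserved octets alongside the 20-octet address (my reading of RFC 4391). All names are illustrative.

```c
#include <stddef.h>

#define ND_OPT_HDR_LEN 2    /* option type + option length octets */
#define NA_FIXED_LEN   24   /* ICMPv6 header (4) + flags/reserved (4) + target (16) */

/* Reserved octets carried with the link-layer address: assumed 2 for
 * a 20-octet IPoIB address (RFC 4391), 0 for a 6-octet Ethernet MAC. */
static size_t lladdr_opt_pad(size_t hwaddr_len)
{
    return hwaddr_len == 20 ? 2 : 0;
}

/* Size in octets of the target link-layer address option, rounded up
 * to the 8-octet units in which ND options are measured. */
static size_t lladdr_opt_size(size_t hwaddr_len)
{
    size_t raw = ND_OPT_HDR_LEN + lladdr_opt_pad(hwaddr_len) + hwaddr_len;
    return (raw + 7) / 8 * 8;
}

/* Total ICMPv6 length of an NA carrying one link-layer address option. */
static size_t na_message_size(size_t hwaddr_len)
{
    return NA_FIXED_LEN + lladdr_opt_size(hwaddr_len);
}
```

Under these assumptions an Ethernet NA is 32 octets and an IPoIB NA is 48 octets, which gives a quick sanity check against the ICMP6 length that tcpdump reports.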

itailev commented 2 years ago

@pqarmitage thanks for the quick patch! seems to work. not sure about the full functionality, however no crash and I see the NA messages:

Feb 17 11:18:20 host-11-11-11-41 Keepalived[37713]: Starting Keepalived v2.1.5 (07/13,2020), git commit v2.1.5+
Feb 17 11:18:20 host-11-11-11-41 Keepalived[37713]: Running on Linux 4.18.0-358.el8.x86_64 #1 SMP Mon Jan 10 13:11:20 UTC 2022 (built for Linux 4.18.0)
Feb 17 11:18:20 host-11-11-11-41 Keepalived[37713]: Command line: '/usr/local/sbin/keepalived' '-D'
Feb 17 11:18:20 host-11-11-11-41 Keepalived[37713]: Opening file '/etc/keepalived/keepalived.conf'.
Feb 17 11:18:20 host-11-11-11-41 Keepalived[37714]: NOTICE: setting config option max_auto_priority should result in better keepalived performance
Feb 17 11:18:20 host-11-11-11-41 Keepalived[37714]: Starting VRRP child process, pid=37715
Feb 17 11:18:20 host-11-11-11-41 systemd[1]: Started LVS and VRRP High Availability Monitor.
Feb 17 11:18:20 host-11-11-11-41 Keepalived_vrrp[37715]: Registering Kernel netlink reflector
Feb 17 11:18:20 host-11-11-11-41 Keepalived_vrrp[37715]: Registering Kernel netlink command channel
Feb 17 11:18:20 host-11-11-11-41 Keepalived_vrrp[37715]: Opening file '/etc/keepalived/keepalived.conf'.
Feb 17 11:18:20 host-11-11-11-41 Keepalived_vrrp[37715]: (/etc/keepalived/keepalived.conf: Line 21) Cannot specify scope for IPv6 addresses (fe80::200:16ff:fe73:fe80/64) - ignoring scope
Feb 17 11:18:20 host-11-11-11-41 Keepalived_vrrp[37715]: (VR_217) Ignoring track_interface ib0.8047 since own interface
Feb 17 11:18:20 host-11-11-11-41 Keepalived_vrrp[37715]: Assigned address 169.254.195.40 for interface ib0.8047
Feb 17 11:18:20 host-11-11-11-41 Keepalived_vrrp[37715]: Registering gratuitous ARP shared channel
Feb 17 11:18:20 host-11-11-11-41 Keepalived_vrrp[37715]: Registering gratuitous NDISC shared channel
Feb 17 11:18:20 host-11-11-11-41 Keepalived_vrrp[37715]: (VR_217) removing VIPs.
Feb 17 11:18:20 host-11-11-11-41 Keepalived_vrrp[37715]: (VR_217) removing E-VIPs.
Feb 17 11:18:20 host-11-11-11-41 Keepalived_vrrp[37715]: (VR_217) Entering BACKUP STATE (init)
Feb 17 11:18:20 host-11-11-11-41 Keepalived_vrrp[37715]: VRRP sockpool: [ifindex(  8), family(IPv4), proto(112), fd(12,13)]
Feb 17 11:18:27 host-11-11-11-41 Keepalived_vrrp[37715]: (VR_217) Receive advertisement timeout
Feb 17 11:18:27 host-11-11-11-41 Keepalived_vrrp[37715]: (VR_217) Entering MASTER STATE
Feb 17 11:18:27 host-11-11-11-41 Keepalived_vrrp[37715]: (VR_217) setting VIPs.
Feb 17 11:18:27 host-11-11-11-41 Keepalived_vrrp[37715]: (VR_217) setting E-VIPs.
Feb 17 11:18:27 host-11-11-11-41 Keepalived_vrrp[37715]: (VR_217) Sending/queueing gratuitous ARPs on ib0.8047 for 169.254.0.217
Feb 17 11:18:27 host-11-11-11-41 Keepalived_vrrp[37715]: Sending gratuitous ARP on ib0.8047 for 169.254.0.217
Feb 17 11:18:27 host-11-11-11-41 Keepalived_vrrp[37715]: (VR_217) Sending/queueing gratuitous ARPs on ib0.8048 for 10.10.10.1
Feb 17 11:18:27 host-11-11-11-41 Keepalived_vrrp[37715]: Sending gratuitous ARP on ib0.8048 for 10.10.10.1
Feb 17 11:18:27 host-11-11-11-41 Keepalived_vrrp[37715]: (VR_217) Sending/queueing Unsolicited Neighbour Adverts on ib0.8048 for fe80::200:16ff:fe73:fe80
Feb 17 11:18:27 host-11-11-11-41 Keepalived_vrrp[37715]: Sending unsolicited Neighbour Advert on ib0.8048 for fe80::200:16ff:fe73:fe80
Feb 17 11:18:27 host-11-11-11-41 Keepalived_vrrp[37715]: Sending gratuitous ARP on ib0.8047 for 169.254.0.217
Feb 17 11:18:27 host-11-11-11-41 Keepalived_vrrp[37715]: Sending gratuitous ARP on ib0.8048 for 10.10.10.1
Feb 17 11:18:27 host-11-11-11-41 Keepalived_vrrp[37715]: Sending unsolicited Neighbour Advert on ib0.8048 for fe80::200:16ff:fe73:fe80
Feb 17 11:18:27 host-11-11-11-41 Keepalived_vrrp[37715]: Sending gratuitous ARP on ib0.8047 for 169.254.0.217
Feb 17 11:18:27 host-11-11-11-41 Keepalived_vrrp[37715]: Sending gratuitous ARP on ib0.8048 for 10.10.10.1
Feb 17 11:18:27 host-11-11-11-41 Keepalived_vrrp[37715]: Sending unsolicited Neighbour Advert on ib0.8048 for fe80::200:16ff:fe73:fe80
Feb 17 11:18:27 host-11-11-11-41 Keepalived_vrrp[37715]: Sending gratuitous ARP on ib0.8047 for 169.254.0.217
Feb 17 11:18:27 host-11-11-11-41 Keepalived_vrrp[37715]: Sending gratuitous ARP on ib0.8048 for 10.10.10.1
Feb 17 11:18:27 host-11-11-11-41 Keepalived_vrrp[37715]: Sending unsolicited Neighbour Advert on ib0.8048 for fe80::200:16ff:fe73:fe80
Feb 17 11:18:27 host-11-11-11-41 Keepalived_vrrp[37715]: Sending gratuitous ARP on ib0.8047 for 169.254.0.217
Feb 17 11:18:27 host-11-11-11-41 Keepalived_vrrp[37715]: Sending gratuitous ARP on ib0.8048 for 10.10.10.1
Feb 17 11:18:27 host-11-11-11-41 Keepalived_vrrp[37715]: Sending unsolicited Neighbour Advert on ib0.8048 for fe80::200:16ff:fe73:fe80

# tcpdump -en -i ib0.8048
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ib0.8048, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
11:24:42.865833 Out ethertype ARP (0x0806), length 72: Request who-has 10.10.10.1 (00:ff:ff:ff:ff:12:40:1b:80:48:00:00:00:00:00:00:ff:ff:ff:ff) tell 10.10.10.1, length 56
11:24:42.865908 Out ethertype IPv6 (0x86dd), length 104: fe80::200:16ff:fe73:fe80 > ff02::1: ICMP6, neighbor advertisement, tgt is fe80::200:16ff:fe73:fe80, length 48
11:24:42.865940 Out ethertype ARP (0x0806), length 72: Request who-has 10.10.10.1 (00:ff:ff:ff:ff:12:40:1b:80:48:00:00:00:00:00:00:ff:ff:ff:ff) tell 10.10.10.1, length 56
11:24:42.865957 Out ethertype IPv6 (0x86dd), length 104: fe80::200:16ff:fe73:fe80 > ff02::1: ICMP6, neighbor advertisement, tgt is fe80::200:16ff:fe73:fe80, length 48
11:24:42.865985 Out ethertype ARP (0x0806), length 72: Request who-has 10.10.10.1 (00:ff:ff:ff:ff:12:40:1b:80:48:00:00:00:00:00:00:ff:ff:ff:ff) tell 10.10.10.1, length 56
11:24:42.866001 Out ethertype IPv6 (0x86dd), length 104: fe80::200:16ff:fe73:fe80 > ff02::1: ICMP6, neighbor advertisement, tgt is fe80::200:16ff:fe73:fe80, length 48
11:24:42.866030 Out ethertype ARP (0x0806), length 72: Request who-has 10.10.10.1 (00:ff:ff:ff:ff:12:40:1b:80:48:00:00:00:00:00:00:ff:ff:ff:ff) tell 10.10.10.1, length 56
11:24:42.866046 Out ethertype IPv6 (0x86dd), length 104: fe80::200:16ff:fe73:fe80 > ff02::1: ICMP6, neighbor advertisement, tgt is fe80::200:16ff:fe73:fe80, length 48
11:24:42.866073 Out ethertype ARP (0x0806), length 72: Request who-has 10.10.10.1 (00:ff:ff:ff:ff:12:40:1b:80:48:00:00:00:00:00:00:ff:ff:ff:ff) tell 10.10.10.1, length 56
11:24:42.866091 Out ethertype IPv6 (0x86dd), length 104: fe80::200:16ff:fe73:fe80 > ff02::1: ICMP6, neighbor advertisement, tgt is fe80::200:16ff:fe73:fe80, length 48
11:24:42.866164 Out ethertype IPv6 (0x86dd), length 92: fe80::200:16ff:fe73:fe80 > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
11:24:42.868585 Out ethertype IPv6 (0x86dd), length 92: fe80::200:16ff:fe73:fe80 > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
11:24:43.499567 Out ethertype IPv6 (0x86dd), length 92: fe80::200:16ff:fe73:fe80 > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
11:24:43.819576 Out ethertype IPv6 (0x86dd), length 92: fe80::200:16ff:fe73:fe80 > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28

itailev commented 2 years ago

@pqarmitage as requested, I did the macvlan test. As you can see, it's not possible to set up a macvlan over an IB interface. I get an error from the manual ip command, and the keepalived logs indicate the same limitation:

Feb 17 11:34:59 host-11-11-11-41 systemd[1]: Started LVS and VRRP High Availability Monitor.
Feb 17 11:34:59 host-11-11-11-41 Keepalived_vrrp[37770]: Registering Kernel netlink reflector
Feb 17 11:34:59 host-11-11-11-41 Keepalived_vrrp[37770]: Registering Kernel netlink command channel
Feb 17 11:34:59 host-11-11-11-41 Keepalived_vrrp[37770]: Opening file '/etc/keepalived/keepalived.conf'.
Feb 17 11:34:59 host-11-11-11-41 Keepalived_vrrp[37770]: (/etc/keepalived/keepalived.conf: Line 22) Cannot specify scope for IPv6 addresses (fe80::200:16ff:fe73:fe80/64) - ignoring scope
Feb 17 11:34:59 host-11-11-11-41 Keepalived_vrrp[37770]: (VR_217): vmacs are only supported on Ethernet type interfaces
Feb 17 11:34:59 host-11-11-11-41 Keepalived_vrrp[37770]: (VR_217) Ignoring track_interface ib0.8047 since own interface
Feb 17 11:34:59 host-11-11-11-41 Keepalived_vrrp[37770]: Assigned address 169.254.195.40 for interface ib0.8047
Feb 17 11:34:59 host-11-11-11-41 Keepalived_vrrp[37770]: Registering gratuitous ARP shared channel
Feb 17 11:34:59 host-11-11-11-41 Keepalived_vrrp[37770]: Registering gratuitous NDISC shared channel
Feb 17 11:34:59 host-11-11-11-41 Keepalived_vrrp[37770]: (VR_217) removing VIPs.
Feb 17 11:34:59 host-11-11-11-41 Keepalived_vrrp[37770]: (VR_217) removing E-VIPs.

pqarmitage commented 2 years ago

@itailev Would it be possible for you to capture the packets to a file and then post that file? (I would be quite happy if the capture file contained only the IPv6 NA packets.) I would like to look at all the headers to make sure that everything is as expected; unfortunately your tcpdump output, despite using -e, does not show the layer 2 information for the IPv6 packets.
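The frame lengths in the tcpdump output above can at least be cross-checked arithmetically. The capture is reported as link-type LINUX_SLL (Linux cooked v1), which prefixes each packet with a 16-octet pseudo-header instead of the real link-layer header. The struct below is a sketch modelled on that layout; the field names are illustrative and are not copied from any library header.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of the 16-octet Linux cooked v1 pseudo-header (illustrative
 * field names). The address field stores at most 8 octets, so a
 * 20-octet IPoIB address cannot be carried in full. */
struct cooked_v1_header {
    uint16_t packet_type;   /* direction, e.g. outgoing */
    uint16_t arphrd_type;   /* hardware type of the interface */
    uint16_t addr_len;      /* link-layer address length */
    uint8_t  addr[8];       /* link-layer address, at most 8 octets */
    uint16_t protocol;      /* e.g. 0x86dd for IPv6 */
};

_Static_assert(sizeof(struct cooked_v1_header) == 16,
               "cooked v1 pseudo-header must be 16 octets");

#define IPV6_HEADER_LEN 40

/* Frame length as captured: pseudo-header + IPv6 header + payload. */
static size_t cooked_ipv6_frame_len(size_t payload_len)
{
    return sizeof(struct cooked_v1_header) + IPV6_HEADER_LEN + payload_len;
}
```

For the NA lines, 16 + 40 + 48 gives the reported frame length of 104, so the sizes are at least self-consistent; a full packet decode is still needed to confirm the option contents.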

Many thanks for testing the macvlan configuration. It looks as though we already had a check in keepalived for this, although I had completely forgotten about it.

Once I have seen a full packet decode of the NA packets over Infiniband (if that is possible) I will merge the patch, but it will be into the current (v2.2.7+) code. You will either need to apply the v2.1.5 patch yourself, or upgrade to v2.2.7+ (I would recommend the latter since it appears to be extremely stable).

itailev commented 2 years ago

ipv6na.zip

There you go @pqarmitage

itailev commented 2 years ago

Thanks again for your support

pqarmitage commented 2 years ago

Commit b5d8aed resolves this issue. @itailev Many thanks for your help.

itailev commented 1 year ago

@pqarmitage - I see that the keepalived version in the CentOS repo is 2.2.4-1, which does not contain this fix: https://centos.pkgs.org/9-stream/centos-appstream-aarch64/keepalived-2.2.4-1.el9.aarch64.rpm.html

how can we make sure the repo is updated with the 2.2.7 version with the fix?

pqarmitage commented 1 year ago

Unfortunately which version of keepalived the distro maintainers choose to include is beyond the scope of what the keepalived project can control.

My understanding of CentOS Stream is that it includes package updates that are intended to be merged into RHEL in the near future. I think that the only way to get the keepalived version updated in CentOS Stream would be to raise a bug in RHEL bugzilla against keepalived requesting a version upgrade due to the above bug fix.

itailev commented 1 year ago

Thanks @pqarmitage !