itailev closed this issue 2 years ago.
@itailev Many thanks for reporting this. One of the problems with Infiniband is that I have no hardware to test it with, and I am not aware of any virtualised Infiniband drivers that I can test it with. The current support for Infiniband was done with a lot of help from Sadanand Warrier at Intel, but it was only done for IPv4, since that was all he was using.
The last keepalived function in the stack trace above is ndisc_send_unsolicited_na_immediate(). There is a comment in the source code of that function:
/* This needs updating to support IPv6 over Infiniband
* (see vrrp_arp.c) */
so now seems the time to do it. :)
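Part of what IPv6 over Infiniband needs is link-layer addressing. On Ethernet, an NA sent to the all-nodes group ff02::1 goes to 33:33:00:00:00:01, but on an IPoIB interface the destination has to be a 20-octet IPoIB multicast address. The following Python sketch reproduces the mapping used by the Linux kernel's ipv6_ib_mc_map(): the scope nibble and the P_Key are taken from the interface's broadcast address, and the low-order 80 bits come from the IPv6 group. It only illustrates the kernel's rule; it is not keepalived code, and whether keepalived's fix builds its destination address this way is an assumption. The example broadcast address is the one that appears in the gratuitous ARPs in the tcpdump output later in this thread, which looks like the IPoIB broadcast address for P_Key 0x8048.

```python
import ipaddress

IPOIB_ALEN = 20  # IPoIB hardware address: 4 octets flags/QPN + 16-octet GID


def ipv6_ib_mc_map(group: str, broadcast: bytes) -> bytes:
    """Map an IPv6 multicast group to an IPoIB multicast hardware address.

    Mirrors the Linux kernel's ipv6_ib_mc_map(); `broadcast` is the
    interface's 20-octet IPoIB broadcast address.
    """
    addr = ipaddress.IPv6Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv6 multicast address")
    if len(broadcast) != IPOIB_ALEN:
        raise ValueError("an IPoIB broadcast address is 20 octets long")

    scope = broadcast[5] & 0x0F        # multicast scope, from the broadcast GID
    hw = bytearray(IPOIB_ALEN)
    hw[0] = 0x00                       # reserved
    hw[1:4] = b"\xff\xff\xff"          # multicast QPN 0xFFFFFF
    hw[4] = 0xFF                       # GID prefix octet for multicast
    hw[5] = 0x10 | scope               # transient flag + scope
    hw[6:8] = b"\x60\x1b"              # IPv6 signature (IPv4 uses 0x401b)
    hw[8:10] = broadcast[8:10]         # P_Key, copied from the broadcast address
    hw[10:20] = addr.packed[6:16]      # low-order 80 bits of the IPv6 group
    return bytes(hw)


def fmt_hw(hw: bytes) -> str:
    """Format a hardware address as colon-separated hex octets."""
    return ":".join(f"{b:02x}" for b in hw)


if __name__ == "__main__":
    bcast = bytes.fromhex(
        "00:ff:ff:ff:ff:12:40:1b:80:48:00:00:00:00:00:00:ff:ff:ff:ff".replace(":", ""))
    print(fmt_hw(ipv6_ib_mc_map("ff02::1", bcast)))
    # prints 00:ff:ff:ff:ff:12:60:1b:80:48:00:00:00:00:00:00:00:00:00:01
```

Note that the scope comes from the broadcast address rather than from the IPv6 group, which is how the kernel function behaves.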
I will have a look later to see if I can work out what needs doing, and provide a patch in this issue report, which I would be grateful if you could test.
In the meantime, on the CentOS 8 system, could you execute coredumpctl debug (assuming that the coredump is the latest one on your system), enter bt at the gdb> prompt, and post the output here? That generally gives more information than the abrt stacktrace. It should install a number of debuginfo rpm packages, including keepalived-debuginfo; if it doesn't install the keepalived-debuginfo package, could you install it manually and run the coredumpctl debug / bt commands again?
I think we should be able to get this resolved quite quickly, but more details would be helpful.
@itailev What would be really helpful would be a network capture of a valid IPv6 neighbour advertisement over Infiniband (perhaps in a .pcap file). It would make it more likely that what I produce works.
@itailev It would be really helpful if you could check something for me - is it possible to create a macvlan interface (vmac in Keepalived parlance) on an Infiniband interface? If it is possible, does it make sense to do so (so far as I can see from the kernel sources the MAC address of the macvlan interface will be 6 octets)?
The reason for asking this is that keepalived either needs to report a configuration error if a config attempts to configure a macvlan on an Infiniband interface, or it needs to handle it properly (which it doesn't do at the moment).
Unfortunately I don't think I can test this sort of thing because I can't see any way to create a dummy Infiniband interface.
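The configuration check described above might look like this Python sketch, assuming the parent interface's hardware type is read from sysfs: /sys/class/net/<ifname>/type reports the ARPHRD_* value from <linux/if_arp.h> (1 for Ethernet, 32 for Infiniband). The function names are hypothetical and this is not keepalived's implementation; the error text simply mirrors the message keepalived logs later in this thread.

```python
import sys
from pathlib import Path

# Hardware types from <linux/if_arp.h>
ARPHRD_ETHER = 1
ARPHRD_INFINIBAND = 32


def link_hw_type(ifname: str, sysfs: Path = Path("/sys/class/net")) -> int:
    """Return the ARPHRD_* hardware type the kernel reports for an interface."""
    return int((sysfs / ifname / "type").read_text().strip())


def check_vmac_allowed(ifname: str, hw_type: int) -> None:
    """Raise ValueError if a vmac (macvlan) should not be configured on `ifname`.

    Hypothetical check: a macvlan's address is 6 octets, so only
    Ethernet-type parents are accepted (IPoIB addresses are 20 octets).
    """
    if hw_type == ARPHRD_ETHER:
        return
    kind = "Infiniband" if hw_type == ARPHRD_INFINIBAND else f"hardware type {hw_type}"
    raise ValueError(f"{ifname}: vmacs are only supported on Ethernet type "
                     f"interfaces, not {kind}")


if __name__ == "__main__":
    for name in sys.argv[1:]:          # e.g. ib0.8048
        try:
            check_vmac_allowed(name, link_hw_type(name))
            print(f"{name}: vmac allowed")
        except (OSError, ValueError) as err:
            print(err)
```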
@pqarmitage Thanks for picking it up. I had an issue with the setup; I will rebuild it, collect the information you asked for, and check.
I will try to do the macvlan check; in any case, OpenStack does not use macvlan when it sets up the IB interfaces.
@itailev Many thanks for doing the check re macvlan. I understand that you are not using macvlans, but I want to make sure that while fixing some of the Infiniband code in keepalived we also fix any other potential issues that we are aware of that someone else might come across.
When testing whether macvlans can be configured, could you add use_vmac vmac_over_ib to the vrrp_instance block and check the keepalived logs to see if there is any error? I suppose you could also check (while keepalived is running) whether an interface vmac_over_ib has been created. If vmac_over_ib has been created, the output of ip -d link show vmac_over_ib; ip -d link show ib2.801f
would be interesting.
@itailev Attached is a patch, 010-na_over_infiniband.patch.txt, that should stop the segfault you have been experiencing and should also create NA messages correctly over Infiniband. This patch applies to keepalived v2.2.7. If you want to stay on keepalived v2.1.5, use 010-na_over_infiniband.patch.215.txt instead, which incorporates commit 1b3f08a and one or two other trivial changes needed to make it compile.
I really have no idea whether the NA messages over Infiniband are correctly formatted, so it would be extremely helpful if you could check. If the format is wrong, attaching a correctly formatted NA over Infiniband message (e.g. Wireshark output or a .pcap file) would be really helpful.
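As a reference point for checking the format, here is a minimal Python sketch of an unsolicited NA carrying a Target Link-Layer Address option. It assumes the IPoIB option layout of RFC 4391 as the Linux kernel implements it (ndisc_addr_option_pad() adds two zero octets before the 20-octet address for ARPHRD_INFINIBAND, giving a 24-octet option with Length 3). Clearing the Solicited flag, setting the Override flag, and using the target as the source address are assumptions (RFC 4861 section 7.2.6 covers unsolicited NAs), and the 20-octet hardware address in the example is a made-up placeholder. This is not the code in the patch.

```python
import ipaddress
import struct

IPPROTO_ICMPV6 = 58
ND_NEIGHBOR_ADVERT = 136
ND_OPT_TARGET_LL_ADDR = 2

# Flag bits in the 32-bit word after the checksum (RFC 4861 section 4.4)
NA_FLAG_ROUTER = 0x80000000
NA_FLAG_SOLICITED = 0x40000000
NA_FLAG_OVERRIDE = 0x20000000


def ll_addr_option(opt_type: int, hw_addr: bytes, infiniband: bool) -> bytes:
    """Build a Source/Target Link-Layer Address option.

    IPoIB (RFC 4391): 2 zero octets precede the 20-octet address, so the
    option is 24 octets long (Length = 3). Ethernet: 8 octets (Length = 1).
    """
    expected = 20 if infiniband else 6
    if len(hw_addr) != expected:
        raise ValueError(f"expected a {expected}-octet link-layer address")
    body = (b"\x00\x00" if infiniband else b"") + hw_addr
    total = (len(body) + 2 + 7) & ~7           # round up to 8-octet units
    opt = bytes([opt_type, total // 8]) + body
    return opt + bytes(total - len(opt))       # zero-fill any remainder


def icmpv6_checksum(src: bytes, dst: bytes, msg: bytes) -> int:
    """Internet checksum over the IPv6 pseudo-header and the ICMPv6 message."""
    data = src + dst + struct.pack("!I3xB", len(msg), IPPROTO_ICMPV6) + msg
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                          # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_unsolicited_na(target: str, hw_addr: bytes, infiniband: bool,
                         src: str = "", dst: str = "ff02::1",
                         router: bool = False) -> bytes:
    """Return an ICMPv6 unsolicited NA (S=0, O=1) for `target`."""
    target_b = ipaddress.IPv6Address(target).packed
    src_b = ipaddress.IPv6Address(src or target).packed
    dst_b = ipaddress.IPv6Address(dst).packed
    flags = NA_FLAG_OVERRIDE | (NA_FLAG_ROUTER if router else 0)
    opt = ll_addr_option(ND_OPT_TARGET_LL_ADDR, hw_addr, infiniband)
    msg = struct.pack("!BBHI", ND_NEIGHBOR_ADVERT, 0, 0, flags) + target_b + opt
    csum = icmpv6_checksum(src_b, dst_b, msg)
    return msg[:2] + struct.pack("!H", csum) + msg[4:]


if __name__ == "__main__":
    placeholder_hw = bytes(range(20))           # made-up 20-octet IPoIB address
    na = build_unsolicited_na("fe80::200:16ff:fe73:fe80", placeholder_hw, True)
    print(len(na), na.hex())
```

With infiniband=True the message is 48 octets (a 24-octet NA header plus the 24-octet option), which matches the ICMPv6 length of 48 that tcpdump reports for the NAs later in this thread; with a 6-octet Ethernet address it would be 32 octets.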
@pqarmitage Thanks for the quick patch! It seems to work. I'm not sure about the full functionality, but there is no crash and I can see the NA messages:
Feb 17 11:18:20 host-11-11-11-41 Keepalived[37713]: Starting Keepalived v2.1.5 (07/13,2020), git commit v2.1.5+
Feb 17 11:18:20 host-11-11-11-41 Keepalived[37713]: Running on Linux 4.18.0-358.el8.x86_64 #1 SMP Mon Jan 10 13:11:20 UTC 2022 (built for Linux 4.18.0)
Feb 17 11:18:20 host-11-11-11-41 Keepalived[37713]: Command line: '/usr/local/sbin/keepalived' '-D'
Feb 17 11:18:20 host-11-11-11-41 Keepalived[37713]: Opening file '/etc/keepalived/keepalived.conf'.
Feb 17 11:18:20 host-11-11-11-41 Keepalived[37714]: NOTICE: setting config option max_auto_priority should result in better keepalived performance
Feb 17 11:18:20 host-11-11-11-41 Keepalived[37714]: Starting VRRP child process, pid=37715
Feb 17 11:18:20 host-11-11-11-41 systemd[1]: Started LVS and VRRP High Availability Monitor.
Feb 17 11:18:20 host-11-11-11-41 Keepalived_vrrp[37715]: Registering Kernel netlink reflector
Feb 17 11:18:20 host-11-11-11-41 Keepalived_vrrp[37715]: Registering Kernel netlink command channel
Feb 17 11:18:20 host-11-11-11-41 Keepalived_vrrp[37715]: Opening file '/etc/keepalived/keepalived.conf'.
Feb 17 11:18:20 host-11-11-11-41 Keepalived_vrrp[37715]: (/etc/keepalived/keepalived.conf: Line 21) Cannot specify scope for IPv6 addresses (fe80::200:16ff:fe73:fe80/64) - ignoring scope
Feb 17 11:18:20 host-11-11-11-41 Keepalived_vrrp[37715]: (VR_217) Ignoring track_interface ib0.8047 since own interface
Feb 17 11:18:20 host-11-11-11-41 Keepalived_vrrp[37715]: Assigned address 169.254.195.40 for interface ib0.8047
Feb 17 11:18:20 host-11-11-11-41 Keepalived_vrrp[37715]: Registering gratuitous ARP shared channel
Feb 17 11:18:20 host-11-11-11-41 Keepalived_vrrp[37715]: Registering gratuitous NDISC shared channel
Feb 17 11:18:20 host-11-11-11-41 Keepalived_vrrp[37715]: (VR_217) removing VIPs.
Feb 17 11:18:20 host-11-11-11-41 Keepalived_vrrp[37715]: (VR_217) removing E-VIPs.
Feb 17 11:18:20 host-11-11-11-41 Keepalived_vrrp[37715]: (VR_217) Entering BACKUP STATE (init)
Feb 17 11:18:20 host-11-11-11-41 Keepalived_vrrp[37715]: VRRP sockpool: [ifindex( 8), family(IPv4), proto(112), fd(12,13)]
Feb 17 11:18:27 host-11-11-11-41 Keepalived_vrrp[37715]: (VR_217) Receive advertisement timeout
Feb 17 11:18:27 host-11-11-11-41 Keepalived_vrrp[37715]: (VR_217) Entering MASTER STATE
Feb 17 11:18:27 host-11-11-11-41 Keepalived_vrrp[37715]: (VR_217) setting VIPs.
Feb 17 11:18:27 host-11-11-11-41 Keepalived_vrrp[37715]: (VR_217) setting E-VIPs.
Feb 17 11:18:27 host-11-11-11-41 Keepalived_vrrp[37715]: (VR_217) Sending/queueing gratuitous ARPs on ib0.8047 for 169.254.0.217
Feb 17 11:18:27 host-11-11-11-41 Keepalived_vrrp[37715]: Sending gratuitous ARP on ib0.8047 for 169.254.0.217
Feb 17 11:18:27 host-11-11-11-41 Keepalived_vrrp[37715]: (VR_217) Sending/queueing gratuitous ARPs on ib0.8048 for 10.10.10.1
Feb 17 11:18:27 host-11-11-11-41 Keepalived_vrrp[37715]: Sending gratuitous ARP on ib0.8048 for 10.10.10.1
Feb 17 11:18:27 host-11-11-11-41 Keepalived_vrrp[37715]: (VR_217) Sending/queueing Unsolicited Neighbour Adverts on ib0.8048 for fe80::200:16ff:fe73:fe80
Feb 17 11:18:27 host-11-11-11-41 Keepalived_vrrp[37715]: Sending unsolicited Neighbour Advert on ib0.8048 for fe80::200:16ff:fe73:fe80
Feb 17 11:18:27 host-11-11-11-41 Keepalived_vrrp[37715]: Sending gratuitous ARP on ib0.8047 for 169.254.0.217
Feb 17 11:18:27 host-11-11-11-41 Keepalived_vrrp[37715]: Sending gratuitous ARP on ib0.8048 for 10.10.10.1
Feb 17 11:18:27 host-11-11-11-41 Keepalived_vrrp[37715]: Sending unsolicited Neighbour Advert on ib0.8048 for fe80::200:16ff:fe73:fe80
Feb 17 11:18:27 host-11-11-11-41 Keepalived_vrrp[37715]: Sending gratuitous ARP on ib0.8047 for 169.254.0.217
Feb 17 11:18:27 host-11-11-11-41 Keepalived_vrrp[37715]: Sending gratuitous ARP on ib0.8048 for 10.10.10.1
Feb 17 11:18:27 host-11-11-11-41 Keepalived_vrrp[37715]: Sending unsolicited Neighbour Advert on ib0.8048 for fe80::200:16ff:fe73:fe80
Feb 17 11:18:27 host-11-11-11-41 Keepalived_vrrp[37715]: Sending gratuitous ARP on ib0.8047 for 169.254.0.217
Feb 17 11:18:27 host-11-11-11-41 Keepalived_vrrp[37715]: Sending gratuitous ARP on ib0.8048 for 10.10.10.1
Feb 17 11:18:27 host-11-11-11-41 Keepalived_vrrp[37715]: Sending unsolicited Neighbour Advert on ib0.8048 for fe80::200:16ff:fe73:fe80
Feb 17 11:18:27 host-11-11-11-41 Keepalived_vrrp[37715]: Sending gratuitous ARP on ib0.8047 for 169.254.0.217
Feb 17 11:18:27 host-11-11-11-41 Keepalived_vrrp[37715]: Sending gratuitous ARP on ib0.8048 for 10.10.10.1
Feb 17 11:18:27 host-11-11-11-41 Keepalived_vrrp[37715]: Sending unsolicited Neighbour Advert on ib0.8048 for fe80::200:16ff:fe73:fe80
# tcpdump -en -i ib0.8048
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ib0.8048, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
11:24:42.865833 Out ethertype ARP (0x0806), length 72: Request who-has 10.10.10.1 (00:ff:ff:ff:ff:12:40:1b:80:48:00:00:00:00:00:00:ff:ff:ff:ff) tell 10.10.10.1, length 56
11:24:42.865908 Out ethertype IPv6 (0x86dd), length 104: fe80::200:16ff:fe73:fe80 > ff02::1: ICMP6, neighbor advertisement, tgt is fe80::200:16ff:fe73:fe80, length 48
11:24:42.865940 Out ethertype ARP (0x0806), length 72: Request who-has 10.10.10.1 (00:ff:ff:ff:ff:12:40:1b:80:48:00:00:00:00:00:00:ff:ff:ff:ff) tell 10.10.10.1, length 56
11:24:42.865957 Out ethertype IPv6 (0x86dd), length 104: fe80::200:16ff:fe73:fe80 > ff02::1: ICMP6, neighbor advertisement, tgt is fe80::200:16ff:fe73:fe80, length 48
11:24:42.865985 Out ethertype ARP (0x0806), length 72: Request who-has 10.10.10.1 (00:ff:ff:ff:ff:12:40:1b:80:48:00:00:00:00:00:00:ff:ff:ff:ff) tell 10.10.10.1, length 56
11:24:42.866001 Out ethertype IPv6 (0x86dd), length 104: fe80::200:16ff:fe73:fe80 > ff02::1: ICMP6, neighbor advertisement, tgt is fe80::200:16ff:fe73:fe80, length 48
11:24:42.866030 Out ethertype ARP (0x0806), length 72: Request who-has 10.10.10.1 (00:ff:ff:ff:ff:12:40:1b:80:48:00:00:00:00:00:00:ff:ff:ff:ff) tell 10.10.10.1, length 56
11:24:42.866046 Out ethertype IPv6 (0x86dd), length 104: fe80::200:16ff:fe73:fe80 > ff02::1: ICMP6, neighbor advertisement, tgt is fe80::200:16ff:fe73:fe80, length 48
11:24:42.866073 Out ethertype ARP (0x0806), length 72: Request who-has 10.10.10.1 (00:ff:ff:ff:ff:12:40:1b:80:48:00:00:00:00:00:00:ff:ff:ff:ff) tell 10.10.10.1, length 56
11:24:42.866091 Out ethertype IPv6 (0x86dd), length 104: fe80::200:16ff:fe73:fe80 > ff02::1: ICMP6, neighbor advertisement, tgt is fe80::200:16ff:fe73:fe80, length 48
11:24:42.866164 Out ethertype IPv6 (0x86dd), length 92: fe80::200:16ff:fe73:fe80 > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
11:24:42.868585 Out ethertype IPv6 (0x86dd), length 92: fe80::200:16ff:fe73:fe80 > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
11:24:43.499567 Out ethertype IPv6 (0x86dd), length 92: fe80::200:16ff:fe73:fe80 > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
11:24:43.819576 Out ethertype IPv6 (0x86dd), length 92: fe80::200:16ff:fe73:fe80 > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
@pqarmitage As requested, I did the macvlan test. As you can see, it's not possible to set up a macvlan over an IB interface: the manual ip command returns an error, and the keepalived logs show the same limitation:
Feb 17 11:34:59 host-11-11-11-41 systemd[1]: Started LVS and VRRP High Availability Monitor.
Feb 17 11:34:59 host-11-11-11-41 Keepalived_vrrp[37770]: Registering Kernel netlink reflector
Feb 17 11:34:59 host-11-11-11-41 Keepalived_vrrp[37770]: Registering Kernel netlink command channel
Feb 17 11:34:59 host-11-11-11-41 Keepalived_vrrp[37770]: Opening file '/etc/keepalived/keepalived.conf'.
Feb 17 11:34:59 host-11-11-11-41 Keepalived_vrrp[37770]: (/etc/keepalived/keepalived.conf: Line 22) Cannot specify scope for IPv6 addresses (fe80::200:16ff:fe73:fe80/64) - ignoring scope
Feb 17 11:34:59 host-11-11-11-41 Keepalived_vrrp[37770]: (VR_217): vmacs are only supported on Ethernet type interfaces
Feb 17 11:34:59 host-11-11-11-41 Keepalived_vrrp[37770]: (VR_217) Ignoring track_interface ib0.8047 since own interface
Feb 17 11:34:59 host-11-11-11-41 Keepalived_vrrp[37770]: Assigned address 169.254.195.40 for interface ib0.8047
Feb 17 11:34:59 host-11-11-11-41 Keepalived_vrrp[37770]: Registering gratuitous ARP shared channel
Feb 17 11:34:59 host-11-11-11-41 Keepalived_vrrp[37770]: Registering gratuitous NDISC shared channel
Feb 17 11:34:59 host-11-11-11-41 Keepalived_vrrp[37770]: (VR_217) removing VIPs.
Feb 17 11:34:59 host-11-11-11-41 Keepalived_vrrp[37770]: (VR_217) removing E-VIPs.
@itailev Would it be possible for you to capture the packets to a file and then post that file? (I would be quite happy if the capture file contained only the IPv6 NA packets.) I would like to look at all the headers to make sure that everything is as expected; unfortunately your tcpdump output, despite using -e, does not show the layer 2 information for the IPv6 packets.
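Once a capture file is available, the NA headers can be checked mechanically. The Python sketch below assumes a LINUX_SLL (Linux cooked v1) capture like the one above: a 16-octet SLL header, then a 40-octet IPv6 header with no extension headers, so the ICMPv6 message starts at offset 56. That is consistent with tcpdump's figures for the NAs (104-octet frames, ICMPv6 length 48). It then walks the NDISC options and pulls out each source/target link-layer address, skipping the two reserved octets that RFC 4391 places before a 20-octet IPoIB address. The function names are hypothetical, and getting the raw frames out of the .pcap file (for example with a pcap-reading library) is not shown.

```python
import struct

ETH_P_IPV6 = 0x86DD
IPPROTO_ICMPV6 = 58
ND_NEIGHBOR_ADVERT = 136
SLL_HDR_LEN = 16        # Linux cooked capture v1 header
IPV6_HDR_LEN = 40


def icmp6_from_sll_frame(frame: bytes) -> bytes:
    """Extract the ICMPv6 message from an SLL v1 frame carrying IPv6.

    Assumes no IPv6 extension headers between the fixed header and ICMPv6.
    """
    if len(frame) < SLL_HDR_LEN + IPV6_HDR_LEN:
        raise ValueError("frame too short")
    (protocol,) = struct.unpack_from("!H", frame, 14)   # SLL protocol field
    if protocol != ETH_P_IPV6:
        raise ValueError("not an IPv6 frame")
    ip6 = frame[SLL_HDR_LEN:]
    (payload_len,) = struct.unpack_from("!H", ip6, 4)   # IPv6 payload length
    if ip6[6] != IPPROTO_ICMPV6:                         # IPv6 next header
        raise ValueError("next header is not ICMPv6")
    return ip6[IPV6_HDR_LEN:IPV6_HDR_LEN + payload_len]


def na_link_layer_options(icmp6: bytes, infiniband: bool):
    """List (option type, option length in octets, link-layer address) for an NA."""
    if len(icmp6) < 24 or icmp6[0] != ND_NEIGHBOR_ADVERT:
        raise ValueError("not a Neighbour Advertisement")
    pad, addr_len = (2, 20) if infiniband else (0, 6)
    result = []
    offset = 24                                # options follow the fixed NA header
    while offset + 2 <= len(icmp6):
        opt_type, units = icmp6[offset], icmp6[offset + 1]
        if units == 0:
            raise ValueError("zero-length option")   # invalid per RFC 4861
        length = units * 8
        if offset + length > len(icmp6):
            raise ValueError("truncated option")
        if opt_type in (1, 2):                 # source / target link-layer address
            start = offset + 2 + pad
            result.append((opt_type, length, icmp6[start:start + addr_len]))
        offset += length
    return result
```

For a correctly formatted NA over Infiniband, the sketch would be expected to report a single type 2 option of 24 octets whose 20-octet address is the sender's IPoIB hardware address.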
Many thanks for testing the macvlan configuration. It looks as though we already had a check in keepalived for this, although I had completely forgotten about it.
Once I have seen a full packet decode of the NA packets over Infiniband (if that is possible) I will merge the patch, but it will be into the current (v2.2.7+) code. You will either need to apply the v2.1.5 patch yourself, or upgrade to v2.2.7+ (I would recommend the latter since it appears to be extremely stable).
There you go @pqarmitage
Thanks again for your support
Commit b5d8aed resolves this issue. @itailev Many thanks for your help.
@pqarmitage I see that the keepalived version in the CentOS repo is 2.2.4-1, which does not contain this fix: https://centos.pkgs.org/9-stream/centos-appstream-aarch64/keepalived-2.2.4-1.el9.aarch64.rpm.html
How can we make sure the repo is updated to the 2.2.7 version with the fix?
Unfortunately which version of keepalived the distro maintainers choose to include is beyond the scope of what the keepalived project can control.
My understanding of CentOS Stream is that it includes package updates that are intended to be merged into RHEL in the near future. I think that the only way to get the keepalived version updated in CentOS Stream would be to raise a bug in RHEL Bugzilla against keepalived, requesting a version upgrade because of the above bug fix.
Thanks @pqarmitage !
Describe the bug Keepalived crashes on a system with an IPoIB networking stack whenever an IPv6 address entry is used in its config file. The crash happens while keepalived is sending/queueing Unsolicited Neighbour Adverts for the IPv6 address.
When the IPv6 address entry is removed from the config file, keepalived does not crash.
To Reproduce Create the interfaces:
Prepare /etc/keepalived/keepalived.conf file:
Restart the keepalived service.
Expected behavior keepalived should not crash.
Keepalived version
Distro (please complete the following information):
Details of any containerisation or hosted service (e.g. AWS) In OpenStack it is running in a container; however, the issue also happens on a bare-metal server.
Interfaces configuration:
System Log entries Openstack:
CentOS-Stream 8
Did keepalived coredump? Yes. A core_backtrace was collected by abrt-ccpp.service.