Closed: wienczny closed this issue 8 years ago.
I've had a look at this, and can't see any reason why it should be happening.
You say "when running keepalived for some time"; how long is that (in other words, is it one entry added per day, per week, per hour or whatever)? You have 16 identical rules; do you know if they were all added by the same invocation of keepalived, or might some of them be from previous invocations?
How many VRRP instances do you have in your configuration, and how many of them have virtual_rules?
Have there been failovers to a secondary keepalived instance, and is there any correlation between the number of failovers and the number of ip rules? Or have there been any interfaces going down and coming back up? Is there anything in the keepalived logs or system log that might suggest what has happened?
Which version of keepalived are you running, and is it a release version or a git snapshot?
I'm currently using an old git snapshot (aa3d2584f5984b3c01190d47ce91dcd55220a100). The duplicate rules are created whenever I send a SIGHUP to keepalived; how long it has been running does not matter. I had not noticed that correlation before. A duplicate is created for every entry in virtual_rules. My hypothesis is that the rules read from the kernel are not correctly matched against the configured ones, which causes new ones to be created. The configuration currently contains a single VRRP instance and two rules; the error also occurred with a single rule. keepalived currently runs on a single machine, so there should be no failovers. There have been no changes to the interfaces.
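The hypothesis above can be illustrated with a small simulation of how an imperfect match turns every reload into one extra rule. This is a minimal sketch, not keepalived's code. The `Rule` type, the two match functions and the specific cause of the mismatch are assumptions chosen for illustration. In this sketch, the kernel fills in a priority that the configuration left unset, and the fallback value `KERNEL_DEFAULT_PRIORITY` is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical priority assumed to be filled in by the kernel when a rule is
# added without one; the value is an assumption for this sketch only.
KERNEL_DEFAULT_PRIORITY = 32765


@dataclass(frozen=True)
class Rule:
    """Simplified, hypothetical view of an ip rule (not keepalived's own struct)."""
    src: str
    table: int
    priority: Optional[int] = None  # None means "not set in the configuration"


def exact_match(wanted: Rule, existing: Rule) -> bool:
    # Compares every field, so a priority filled in by the kernel never
    # matches a configured rule that left the priority unset.
    return wanted == existing


def config_match(wanted: Rule, existing: Rule) -> bool:
    # Only compares the fields that the configuration actually specified.
    if wanted.priority is not None and wanted.priority != existing.priority:
        return False
    return wanted.src == existing.src and wanted.table == existing.table


def reload(kernel_rules: List[Rule], configured: List[Rule], match) -> None:
    """Add each configured rule unless a rule already in the kernel matches it."""
    for wanted in configured:
        if not any(match(wanted, r) for r in kernel_rules):
            prio = (wanted.priority if wanted.priority is not None
                    else KERNEL_DEFAULT_PRIORITY)
            kernel_rules.append(Rule(wanted.src, wanted.table, prio))


if __name__ == "__main__":
    configured = [Rule("10.0.0.0/24", 100)]
    for match in (exact_match, config_match):
        kernel: List[Rule] = []
        for _ in range(3):  # e.g. startup followed by two SIGHUPs
            reload(kernel, configured, match)
        print(match.__name__, len(kernel))  # exact_match 3, then config_match 1
```

With the exact comparison, each reload adds another copy of the same rule. Comparing only the configured fields keeps a single copy.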
Thanks for the update.
Although I couldn't reproduce the issue using SIGHUP, I could reproduce it by sending SIGKILL to the keepalived_vrrp child process (I was using git snapshot 9896eed).
I have added a patch at https://github.com/pqarmitage/keepalived/tree/issue%23246 which resolves the issue when using SIGKILL. Would you be able to test this to see if it resolves your issue?
If it doesn't resolve the issue for you, would you be able to run keepalived with the -D flag and attach the generated log to this issue report?
I built your branch and relaunched keepalived. I expected the old duplicate rules to be deleted when I relaunched keepalived, but this did not happen. Sending SIGHUP no longer increases the number of rules, though.
The log shows this:
Mar 08 11:30:27 master1[31584]: Registering Kernel netlink reflector
Mar 08 11:30:27 master1[31584]: Registering Kernel netlink command channel
Mar 08 11:30:27 master1[31584]: Registering gratuitous ARP shared channel
Mar 08 11:30:27 master1[31584]: Opening file '/etc/keepalived/keepalived.conf'.
Mar 08 11:30:27 master1[31584]: VRRP_Instance(vips) setting protocol VIPs.
Mar 08 11:30:27 master1[31584]: Netlink: error: No such device, type=(20), seq=1457433028, pid=0
Mar 08 11:30:27 master1[31584]: Netlink: error: No such device, type=(20), seq=1457433029, pid=0
Mar 08 11:30:27 master1[31584]: Netlink: error: No such device, type=(20), seq=1457433030, pid=0
Mar 08 11:30:27 master1[31584]: Netlink: error: No such device, type=(20), seq=1457433031, pid=0
Mar 08 11:30:27 master1[31584]: Netlink: error: No such device, type=(20), seq=1457433032, pid=0
Mar 08 11:30:27 master1[31584]: Netlink: error: No such device, type=(20), seq=1457433033, pid=0
Mar 08 11:30:27 master1[31584]: Netlink: error: No such device, type=(20), seq=1457433034, pid=0
Just for reference, these are the build configuration values:
Keepalived configuration
------------------------
Keepalived version : 1.2.19
Compiler : gcc
Compiler flags : -g -O2 -I/usr/include/libnl3 -I/usr/include/libnl3
Extra Lib : -lssl -lcrypto -lcrypt -lnl-genl-3 -lnl-3 -lnl-route-3 -lnl-3
Use IPVS Framework : Yes
IPVS sync daemon support : Yes
IPVS use libnl : Yes
fwmark socket support : Yes
Use VRRP Framework : Yes
Use VRRP VMAC : Yes
Use VRRP authentication : Yes
SNMP keepalived support : No
SNMP checker support : No
SNMP RFCv2 support : No
SNMP RFCv3 support : No
SHA1 support : No
Use Debug flags : No
libnl version : 3
Use IPv4 devconf : Yes
Use libiptc : No
Use libipset : No
What are libiptc and libipset used for?
When keepalived starts, it deletes just ONE set of old rules and routes, if they exist. Once you are running keepalived with this patch, multiple entries can never build up, since each time keepalived starts it deletes the last set of rules/routes created. So if you stop keepalived, remove any remaining old rules, and then start keepalived, it should work as you want. Alternatively, while keepalived is running, delete all but one of the rules, and it should then work with no new duplicates.
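The arithmetic behind this can be checked with a small model. It is a sketch that assumes each start removes exactly one existing copy of every configured rule and then adds that rule once. The function name and the rule text are hypothetical. The model shows that the count never grows, and also that duplicates which already exist are not cleaned up, which matches what was seen after relaunching.

```python
from collections import Counter
from typing import Iterable


def restart(kernel: Counter, configured: Iterable[str]) -> Counter:
    """Model one keepalived start with the patched behaviour: for each
    configured rule, delete just ONE existing copy (if there is one),
    then add the rule once. Returns the new per-rule copy counts."""
    after = Counter(kernel)
    for rule in configured:
        if after[rule] > 0:
            after[rule] -= 1  # speculative delete of the previous set; no-op if absent
        after[rule] += 1      # add the configured rule
    return after


if __name__ == "__main__":
    rule = "from 10.0.0.1 lookup 100"  # hypothetical rule text
    print(restart(Counter({rule: 16}), [rule])[rule])  # 16: old duplicates remain
    print(restart(Counter(), [rule])[rule])            # 1: clean start stays at one
```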
I'll push this patch upstream now that you have confirmed that it works as intended. Many thanks.
Were you getting the netlink "No such device" error messages before applying this patch, or has this patch introduced them? The reason I ask is that keepalived now has to speculatively attempt to remove the old rules/routes. I've added code to suppress the error messages when they don't exist, but of course I may not have got that quite right.
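When tracking down where these messages come from, the `type=(20)` field in the log can be decoded into an rtnetlink message type name. The helper below is hypothetical and is not part of keepalived. The numeric values are those defined in the Linux `<linux/rtnetlink.h>` header, and the regular expression only assumes the log format quoted in this thread.

```python
import re
from typing import Optional

# rtnetlink message types, as defined in the Linux header <linux/rtnetlink.h>.
RTM_TYPES = {
    16: "RTM_NEWLINK", 17: "RTM_DELLINK", 18: "RTM_GETLINK", 19: "RTM_SETLINK",
    20: "RTM_NEWADDR", 21: "RTM_DELADDR", 22: "RTM_GETADDR",
    24: "RTM_NEWROUTE", 25: "RTM_DELROUTE", 26: "RTM_GETROUTE",
    28: "RTM_NEWNEIGH", 29: "RTM_DELNEIGH", 30: "RTM_GETNEIGH",
    32: "RTM_NEWRULE", 33: "RTM_DELRULE", 34: "RTM_GETRULE",
}

# Matches the keepalived netlink error lines shown in this thread.
ERROR_RE = re.compile(
    r"Netlink: error: (?P<error>.+?), type=\((?P<type>\d+)\), seq=(?P<seq>\d+)"
)


def decode_netlink_error(line: str) -> Optional[dict]:
    """Return the error text, rtnetlink message type name and sequence
    number, or None if the line is not a netlink error message."""
    m = ERROR_RE.search(line)
    if m is None:
        return None
    msg_type = int(m.group("type"))
    return {
        "error": m.group("error"),
        "type": RTM_TYPES.get(msg_type, f"unknown ({msg_type})"),
        "seq": int(m.group("seq")),
    }


if __name__ == "__main__":
    line = ("Mar 08 11:30:27 master1[31584]: Netlink: error: No such device, "
            "type=(20), seq=1457433028, pid=0")
    print(decode_netlink_error(line))  # type 20 decodes to RTM_NEWADDR
```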
You asked about libiptc and libipset. Commit 7ec7c8d added support for accept mode for VRRPv3, or more to the point added support for non-accept mode, which requires blocking incoming packets to the virtual IP addresses. This was implemented by invoking the iptables/ip6tables command for each entry to be added to or deleted from ip(6)tables, and each time ip(6)tables is invoked it has to read all the iptables data from the kernel. If there are a large number of entries in the ip(6)tables configuration, this can cause considerable overhead and delay. libiptc is a library for directly accessing the iptables configuration. With it, multiple changes can be made in one update, which significantly improves performance and avoids the overhead of forking/execing ip(6)tables.
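A rough cost model makes this overhead concrete. It is a simplification under two stated assumptions. First, each `iptables`/`ip6tables` invocation reads the full current table once. Second, a libiptc session reads the table once and applies all changes in a single commit. The model ignores the fork/exec cost entirely, and the function names and figures are hypothetical.

```python
def entries_read_per_command(existing: int, changes: int) -> int:
    """Table entries read from the kernel when each change runs its own
    iptables/ip6tables command. Assumes every invocation reads the whole
    current table and that each change adds one entry to it."""
    return sum(existing + i for i in range(changes))


def entries_read_batched(existing: int, changes: int) -> int:
    """Table entries read when all changes go through one libiptc session.
    Assumes the table is read once and all changes are committed together,
    so `changes` does not affect the read cost in this model."""
    return existing


if __name__ == "__main__":
    # Hypothetical figures: 1000 existing entries, 10 virtual IPs to block.
    print(entries_read_per_command(1000, 10))  # 10045
    print(entries_read_batched(1000, 10))      # 1000
```

In this model, the per-command cost grows roughly with (number of changes × table size), while the batched cost grows only with the table size.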
libipset is the library for directly accessing the ipset configuration. ipset is designed to handle lists of IP addresses, ports, MAC addresses and interfaces in various combinations very efficiently, using hashes or bitmaps. Also, adding entries to and deleting entries from ipsets is much more efficient than adding/deleting entries in ip(6)tables.
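To illustrate why a set helps, here is a simplified model of how many rule evaluations a packet to a non-virtual address costs in each approach. It assumes one iptables rule per virtual IP in the first case. In the second case, it assumes a single rule that matches a `hash:ip` set (for example via `-m set --match-set <name> dst`), and it counts the hash lookup as one step. The function names and addresses are hypothetical.

```python
from typing import List, Set, Tuple


def match_rule_per_address(vips: List[str], dst: str) -> Tuple[bool, int]:
    """One iptables rule per virtual IP: the packet is compared rule by rule.
    Returns (matched, number of rules evaluated)."""
    evaluated = 0
    for vip in vips:
        evaluated += 1
        if vip == dst:
            return True, evaluated
    return False, evaluated


def match_single_set_rule(vip_set: Set[str], dst: str) -> Tuple[bool, int]:
    """One rule referencing a hash-type set: a single hash lookup,
    counted here as one evaluation. Returns (matched, evaluations)."""
    return dst in vip_set, 1


if __name__ == "__main__":
    vips = [f"10.0.0.{i}" for i in range(1, 101)]  # 100 hypothetical VIPs
    print(match_rule_per_address(vips, "192.168.1.1"))      # (False, 100)
    print(match_single_set_rule(set(vips), "192.168.1.1"))  # (False, 1)
```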
So, if the iptables development libraries are installed, keepalived will use the library rather than the ip(6)tables command. If the ipset development libraries are also installed, it will use ipsets rather than adding lists of IP addresses into iptables (I didn't implement using ipsets via the ipset command, but it could be done).
If the development libraries are installed but you don't want to use them, you can pass --disable-libiptc and --disable-libipset to configure.
Just in case it is of use in the future, keepalived -v now shows all the build options used, so it doesn't matter if you miss capturing the output of configure.
Thanks for your explanation. The Netlink errors were already in yesterday's log; your changes did not introduce them.
When running keepalived with virtual_rules for some time, duplicate rules are created:
Matching of existing rules is not (yet) reliable.