Hm, hm, this is pretty bad news... I didn't know about that parameter before. Even if we changed that parameter in Gluon, the issue would still remain for casual Linux clients. I just checked, and Debian Sid seems to have the same defaults, and since these are the defaults of the Linux kernel itself, probably any Linux distro could run into problems, even though laptops etc. have more than enough RAM.
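For reference, the thresholds can be checked like this (the values shown below are what I'd expect as the usual kernel defaults, so treat them as an assumption rather than a measurement):

$ sysctl net.ipv4.neigh.default.gc_thresh1 net.ipv4.neigh.default.gc_thresh2 net.ipv4.neigh.default.gc_thresh3
net.ipv4.neigh.default.gc_thresh1 = 128
net.ipv4.neigh.default.gc_thresh2 = 512
net.ipv4.neigh.default.gc_thresh3 = 1024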
Looking at our graph here in Lübeck, with ~230 nodes we are currently peaking at 540 clients:
http://map.ffhl/nodes/globalGraph.png
So it seems to make sense that Hamburg, with its 400+ nodes, is now hitting this 1024-client limit (gc_thresh3). So it's probably not a malicious DoS attack there in Hamburg.
Even if we increased the default value upstream in the Linux kernel, it might take some time to trickle down to all these distros (the fastest way would be to declare the increase a fix, so that it gets picked up by the stable kernel releases soon - but it might be difficult to convince the Linux netdev folks of that). I hate to say it... but maybe you guys need to add subnets... Damn it, I totally didn't see that limit coming...
You/We should start some discussion somewhere on how to deal with this serious issue.
I think we should first look into the consequences of such an overflow. Is it really a problem? From my point of view, the clients mostly communicate with a restricted set of other hosts, e.g. gateways or other hosts/nodes/clients offering services. Therefore they only need the IP-to-MAC mappings of these hosts, or am I wrong here? An overflowing neighbour table would only cause additional ARP request traffic if these specific hosts were dropped from the table, and that shouldn't happen (ideally) as long as the client communicates with them frequently.
I had a quick look at proxying ARP requests, as that came to mind. But as far as I can tell, an ARP proxy only answers with its own MAC, not with the MACs in its neighbour table. So putting an ARP proxy on the Freifunk routers is not a solution to reduce the traffic from additional ARP requests caused by neighbour table overflows on the clients.
But, as I said above, let's first see whether an overflow on the clients really is a problem, or whether it is only relevant if a client also communicates with a lot of other hosts (more than the threshold of the neighbour table, probably an infrequent case).
Okay, checking again, I don't quite see yet why this overflow exists in your network: since ARP replies are supposed to be sent via unicast, a host should only end up having entries in its ARP/neighbour table for hosts it is actually communicating with. For instance, for a host at Freifunk Lübeck (krtek.ffhl) I'm getting this:
$ ip -4 neigh show dev freifunk | wc -l
8
What's the output of that on the affected node when the "ipv4: Neighbour table overflow." occurs? Could you provide a tcpdump of ARP packets at that time?
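Something along these lines should do for capturing (interface name and file name are just placeholders, adjust to the client bridge on your node):

$ tcpdump -n -i br-client -w arp.pcap arp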
I second @T-X. In my world view, this shouldn't happen except on a server contacted by all clients or something like that. However, I seem to remember that this table contains many more entries than ip neigh reveals, so that might not be an appropriate indicator.
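If that's the case, counting with all entry states included might give a better picture; something like this (I'm not sure which states the garbage collector actually counts, so take it as a guess):

$ ip -4 neigh show nud all | wc -l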
This is quite related to my idea for a solution: the node shouldn't need neighbour entries for foreign subnets. batman can choose a gateway based on link metrics, and the gateways can distribute addresses from different subnets. Can't the nodes prevent the ARP requests for non-chosen subnets from reaching the clients? (ebtables magic)
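Roughly what I have in mind, as a sketch only (the subnet is made up, and whether FORWARD on the client bridge is the right place would need checking):

# drop ARP requests asking for addresses outside the locally chosen subnet
ebtables -A FORWARD -p ARP --arp-opcode Request --arp-ip-dst ! 10.130.0.0/16 -j DROP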
I suspect there might be an issue specific to the network in Hamburg causing this... maybe a broken ARP proxy or something? Dumping and analyzing a few minutes of ARP traffic would give us something to work with; so far we don't know at all what is causing the "problem".
Here are two dumps.
http://www.ohrensessel.net/overflow.pcap was captured yesterday at around 16:50, when overflow messages appeared in the log.
http://www.ohrensessel.net/nooverflow.pcap was captured right now with no sign of overflow messages.
Please note that these come from different times of the day and therefore might have a different "traffic" volume.
I only had a quick look, but could not find anything particularly noticeable. Maybe I can have a deeper look later today.
These were captured by "tcpdump -n -i br-client arp"
Two things I noticed in this dump:
@ohrensessel: Could you have a look whether an additional ebtables rule like this silences the overflow warning on your gluon node?
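For illustration only, a rule in that spirit might drop ARP replies addressed to the broadcast MAC (this is just a guess at the shape, not necessarily the exact rule, and the chain would need checking; the alternative of rewriting the target MAC comes up below):

ebtables -A FORWARD -p ARP --arp-opcode Reply -d Broadcast -j DROP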
I think the gratuitous ARP replies are sent after connecting to a network, and Android devices that are moving might reconnect quite a lot... We should investigate whether these devices also respond with normal unicast ARP replies to ARP requests.
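One way to check would be to print the link-level headers of ARP replies and watch whether the destination MAC is ff:ff:ff:ff:ff:ff or the requester's unicast MAC (opcode 2 = reply; interface name as used in the captures above):

$ tcpdump -n -e -i br-client 'arp[6:2] = 2'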
Ok, those are some interesting findings. I did not have time yet to look into the traces, but will probably do it this evening.
One additional piece of information: during times when overflow messages occur,
ip -4 neigh | wc -l
yields only 20 to 40 entries. (This is a quite busy node, often with 10 or slightly more users plus 2 other nodes reachable via wifi.) However, I would have expected many more entries to be necessary to trigger the overflow warning.
Are we maybe seeing something similar to http://wiki.wireshark.org/Gratuitous_ARP here? The link describes a valid use for broadcast ARP requests and replies, but in general an ARP reply sent as a follow-up to a request should be unicast.
I think your solution (substituting the target MAC) is a little bit too "hackish". This would require some kind of transparent "ARP proxy". Maybe some stateful filtering of ARP replies would be better here, so that only devices which sent out the ARP request get the reply in the end? If a node has not seen the outgoing ARP request, it would drop the associated reply, even when it is sent to the broadcast address.
Regarding your latest comment, which you submitted right before I submitted mine: from the traces it is visible that the gratuitous ARP reply from the Android devices is a follow-up to an ARP request received shortly before. That doesn't look like they are sending gratuitous replies because of moving from node to node or something like that.
From what I've read so far, these are not gratuitous ARP replies, since the ARP sender IP differs from the ARP target IP. Therefore I think they are simply malformed "standard" ARP replies.
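That can also be checked directly in the capture: a gratuitous ARP has sender IP equal to target IP, so a filter like this (offsets assume IPv4-over-Ethernet ARP, untested) should only show the truly gratuitous replies:

$ tcpdump -n -e -r overflow.pcap 'arp[6:2] = 2 and arp[14:4] = arp[24:4]'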
@T_X, can you elaborate a bit more on what these ARP replies look like?
I am getting this error message on my wired (VPN-mesh) node only. No such messages on an air-mesh node.
And I am not experiencing these log messages on a raspberry pi which is attached to my wired node as far as I can tell. I will continue observing nevertheless.
I am seeing the same messages pre-gluon.
It's been quiet here for a few months, yet lots of nodes running Gluon seem to be pretty stable (except for ath9k woes). Is this still an issue? Does it cause problems?
The messages are still appearing on the nodes. I don't know if these messages imply any instability or not.
They are probably related to a kernel bug where the thresholds are in fact only half of the configured value, see http://patchwork.ozlabs.org/patch/170385/. With BB this problem should be fixed, as a newer kernel is used.
Some of the nodes seeing the overflow messages reboot very frequently, but this is rather related to OOMs, which is a totally different bug...
It seems as if the messages are gone with v2014.3-23-gddd7c16, which is the first BB experimental here in Hamburg. Previously these messages would appear after less uptime than the node has now (~14 hours so far). I will keep an eye on it to finally confirm that the messages do not appear anymore.
With ~70h uptime, still no appearance of neighbour overflow messages. I would say that this problem is gone with BB, at least for the network size we have now.
Closing this as it is fixed with BB.
Gluon node (ffhh) has a lot of "ipv4: Neighbour table overflow." messages in the log.
This possibly can be fixed by the following sysctl settings:

net.ipv4.neigh.default.gc_thresh1
net.ipv4.neigh.default.gc_thresh2
net.ipv4.neigh.default.gc_thresh3

and for IPv6 (no messages regarding that so far, but it might be good to change them nevertheless):

net.ipv6.neigh.default.gc_thresh1
net.ipv6.neigh.default.gc_thresh2
net.ipv6.neigh.default.gc_thresh3
I do not know anything about the memory implications of these changes, but there might be some.
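Raising them would look something like this (the values are arbitrary examples, not tested recommendations; the ipv6 counterparts would be set the same way):

sysctl -w net.ipv4.neigh.default.gc_thresh1=512
sysctl -w net.ipv4.neigh.default.gc_thresh2=2048
sysctl -w net.ipv4.neigh.default.gc_thresh3=4096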