Closed pmelange closed 2 weeks ago
"via 0.0.0.0" looks strange...
Is there a node that pretends to be 0.0.0.0?
Henning
On Mon, Nov 26, 2018 at 4:29 PM pmelange notifications@github.com wrote:
This is a copy of issue freifunk-berlin/firmware#628 https://github.com/freifunk-berlin/firmware/issues/628
On 22.11.2018, all of the routers on the entire Berlin Backbone started printing the following error messages, repeating every second.
Thu Nov 22 13:53:08 2018 daemon.info olsrd[7015]: Received netlink error code Invalid argument (-22)
Thu Nov 22 13:53:08 2018 daemon.err olsrd[7015]: . error: del route to171.159.48.121/254.0.0.0 via 0.0.0.0 dev void onlink (Resource temporarily unavailable 11)
Thu Nov 22 13:53:08 2018 daemon.err olsrd[7015]: Delete route171.159.48.121/7 via 0.0.0.0: Resource temporarily unavailable
The only known ways to stop the error messages were to restart the OLSR4 service or to reboot the router.
This also affected every router attached to the BBB-VPN.
Every version of the OLSR daemon was hit by this problem, from 0.6.x to the latest 0.9.6.2.
The cause of this message is unknown.
Yes, very strange. No, I don't think that any node on the mesh network would announce 0.0.0.0. Also the 171.159.48.121 address (with a huge netmask) is strange. All the nodes on our mesh network are in the 10.0.0.0/8 address space. And no nodes should state that they have a /8 netmask either.
Take a look at the email thread on the freifunk-berlin mailing list. https://lists.berlin.freifunk.net/pipermail/berlin/2018-November/038406.html
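To make the oddity concrete, here is a small check using Python's standard `ipaddress` module. It only uses values that appear in the log lines and in the comment above (the logged address and netmask, and the 10.0.0.0/8 mesh range); it is an illustration, not something olsrd itself does.

```python
import ipaddress

# Values taken from the log lines in this issue.
route = ipaddress.ip_network("171.159.48.121/254.0.0.0", strict=False)
mesh = ipaddress.ip_network("10.0.0.0/8")  # address space used by the mesh

print(route)                  # 170.0.0.0/7
print(route.prefixlen)        # 7, i.e. netmask 254.0.0.0 is a /7
print(route.subnet_of(mesh))  # False, the route is outside 10.0.0.0/8
```

So the netmask 254.0.0.0 in the first error line and the /7 in the second describe the same route, and that route is nowhere near the mesh address space.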
It sounds like one of the nodes introduced wrong/bad data into your network... olsrd does not make any consistency checks.
I suppose it could be user error on some node. But the result is the same. Unfortunately I didn't check the routing tables to see if there is also a "via 0.0.0.0" entry.
If it is not possible for OLSR to delete a route, shouldn't OLSR handle it differently than repeatedly retrying to delete it?
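One way to picture the "handle it differently" idea is a bounded retry that gives up immediately on errors that cannot succeed on a later attempt. The sketch below is purely hypothetical: `delete_with_limit`, `MAX_ATTEMPTS` and the stand-in `always_invalid` are made-up names for illustration, and none of this reflects how olsrd is actually implemented.

```python
import errno

MAX_ATTEMPTS = 5  # hypothetical limit, not an olsrd setting


def delete_with_limit(delete_route, route, max_attempts=MAX_ATTEMPTS):
    """Call delete_route(route) until it succeeds or the limit is reached.

    delete_route is a caller-supplied function that returns 0 on success
    or a negative errno value on failure.
    Returns a tuple (success, attempts).
    """
    for attempt in range(1, max_attempts + 1):
        rc = delete_route(route)
        if rc == 0:
            return True, attempt
        if rc == -errno.EINVAL:
            # An invalid argument will not become valid on retry: give up now.
            return False, attempt
    return False, max_attempts


def always_invalid(route):
    # Stand-in for a kernel call that rejects the route, like the -22 above.
    return -errno.EINVAL


ok, attempts = delete_with_limit(always_invalid, "171.159.48.121/7")
print(ok, attempts)  # False 1
```

With a limit like this, a route the kernel refuses to delete would be logged a handful of times and then dropped, instead of producing an error every second.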
Olsrd could not have set the route in the first place, because doing so would have thrown an error.
Maybe it's just a case of memory corruption, e.g. caused by a plugin. We had something similar with a special version of the mdns plugin years ago.
I don't know.
I guess it was some kind of mass hysteria memory corruption. I don't know how to repeat it.
Well, if it happens again (it might take a few years) shall I open another ticket?
Maybe you could ask if someone has installed a new/experimental plugin for their Olsrd... sometimes it is enough that ONE node in the mesh has installed something new to kill the whole mesh.
I have a small "consistency check" plugin for olsrd2, which could be expanded to filter for "bad addresses/prefixes".
If it happens again and you still remember this thread, please reopen this one.
Please also note that olsrd(1) is without a maintainer...
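As a rough idea of what filtering for "bad addresses/prefixes" could mean, here is a minimal sketch. It is not the olsrd2 plugin mentioned above and does not use any olsrd API; `ALLOWED`, `MIN_PREFIXLEN` and `is_plausible` are hypothetical names, and the rules are assumptions based only on the statements in this thread (mesh addresses are in 10.0.0.0/8, and no node should announce a /8).

```python
import ipaddress

# Assumptions for illustration: the mesh uses 10.0.0.0/8 (as stated in this
# thread) and no node should announce anything as large as a /8.
ALLOWED = ipaddress.ip_network("10.0.0.0/8")
MIN_PREFIXLEN = 9  # hypothetical threshold


def is_plausible(prefix):
    """Return True if an announced prefix lies inside ALLOWED and is not too large."""
    try:
        net = ipaddress.ip_network(prefix, strict=False)
    except ValueError:
        return False  # not a parseable address/prefix at all
    if net.version != ALLOWED.version:
        return False  # subnet_of() cannot compare IPv4 with IPv6
    return net.prefixlen >= MIN_PREFIXLEN and net.subnet_of(ALLOWED)


announced = ["10.36.1.0/24", "171.159.48.121/7", "10.0.0.0/8", "not-an-address"]
accepted = [p for p in announced if is_plausible(p)]
print(accepted)  # ['10.36.1.0/24']
```

A real deployment would likely need exceptions to such rules (for example for announcements made by gateways), so the thresholds here are only placeholders.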
After one week, there was no answer on the freifunk-berlin mailing list. Closing.
This happened again today with a /4 HNA (the errors above were for a /7). When that HNA was withdrawn or expired, its removal from the kernel routing table started looping with the errors mentioned above.
@PolynomialDivision could you reopen?
Hi @pktpls / @pmelange, is there anything we can do here (as the case has been open for quite some time)?
I haven't seen this happen again and I don't know how to reproduce it.
Thank you. I'll close it for the moment; we can reopen it once someone catches the error again.
Hope this is fine for you as well.