Closed SvenRoederer closed 7 years ago
Could you bisect?
I will also run a larger test, logging the route count to RRD on all nodes
what i can also see with 0.9.6:
root@b:~ wget -O /dev/null http://127.0.0.1:2006
Downloading 'http://127.0.0.1:2006'
Connecting to 127.0.0.1:2006
(null) 0 - stalled -
Connection reset prematurely
(the same for /all or /lin) - IMHO this was already fixed? the correct filename is configured:
root@EG-bastianHQ:~ :) cat /var/etc/olsrd.conf
DebugLevel 0
AllowNoInt yes
ClearScreen no
IpVersion 4
FIBMetric "flat"
Willingness 7
TcRedundancy 2
LinkQualityFishEye 1
LinkQualityAlgorithm "etx_ffeth"
MprCoverage 7
MainIp 10.63.222.1
Hna4
{
10.63.222.0 255.255.255.192
100.65.220.0 255.255.255.0
}
LoadPlugin "olsrd_txtinfo.so.1.1"
{
PlParam "accept" "0.0.0.0"
PlParam "port" "2006"
}
LoadPlugin "olsrd_nameservice.so.0.4"
{
PlParam "name" "EG-bastianHQ"
PlParam "name-change-script" "/etc/udhcpc.user"
}
Interface "eth0.2"
{
Ip4Broadcast 255.255.255.255
Mode "ether"
HelloInterval 3.0
HelloValidityTime 125.0
TcInterval 2.0
TcValidityTime 500.0
MidInterval 25.0
MidValidityTime 500.0
HnaInterval 10.0
HnaValidityTime 125.0
}
Interface "eth0.1"
{
Ip4Broadcast 255.255.255.255
Mode "ether"
HelloInterval 3.0
HelloValidityTime 125.0
TcInterval 2.0
TcValidityTime 500.0
MidInterval 25.0
MidValidityTime 500.0
HnaInterval 10.0
HnaValidityTime 125.0
}
Interface "wlan0"
{
Ip4Broadcast 255.255.255.255
Mode "mesh"
HelloInterval 3.0
HelloValidityTime 125.0
TcInterval 2.0
TcValidityTime 500.0
MidInterval 25.0
MidValidityTime 500.0
HnaInterval 10.0
HnaValidityTime 125.0
}
Interface "wlan1"
{
Ip4Broadcast 255.255.255.255
Mode "mesh"
HelloInterval 3.0
HelloValidityTime 125.0
TcInterval 2.0
TcValidityTime 500.0
MidInterval 25.0
MidValidityTime 500.0
HnaInterval 10.0
HnaValidityTime 125.0
}
I'm running the same version and I see:
# wget -O /dev/null http://127.0.0.1:2006
converted 'http://127.0.0.1:2006' (ANSI_X3.4-1968) -> 'http://127.0.0.1:2006' (UTF-8)
--2017-02-07 11:29:32-- http://127.0.0.1:2006/
Connecting to 127.0.0.1:2006... connected.
HTTP request sent, awaiting response... 200 No headers, assuming HTTP/0.9
Length: unspecified
Saving to: '/dev/null'
/dev/null [ <=> ] 22.33K --.-KB/s in 0s
2017-02-07 11:29:32 (126 MB/s) - '/dev/null' saved [22868]
does that node have neighbours?
@fhuberts yes, a lot of neighbours - wired and wireless - I understand that it only works with netcat and not wget - I'll change my code for this - so: everything is fine - sorry for the noise
it works alright with wget, at least it should
if it doesn't work with wget can you send me the packet grab of the request? That would mean the request parsing doesn't work properly, especially if netcat works just fine (both should work).
What doesn't work is a (manual) telnet connection
it looks like this (captured on the laptop, querying a router with GNU wget - this works - but on the router itself the wget doesn't seem to know HTTP 0.9)
bastian@X301-II ~ $ sudo tcpdump -nXi wlan0 host 10.63.222.33
[sudo] password for bastian:
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on wlan0, link-type EN10MB (Ethernet), capture size 65535 bytes
12:45:07.139704 IP 100.66.3.186.34156 > 10.63.222.33.2006: Flags [S], seq 3109579389, win 29200, options [mss 1460,sackOK,TS val 57238513 ecr 0,nop,wscale 7], length 0
0x0000: 4500 003c 098c 4000 4006 e0d3 6442 03ba E..<..@.@...dB..
0x0010: 0a3f de21 856c 07d6 b958 6a7d 0000 0000 .?.!.l...Xj}....
0x0020: a002 7210 6d20 0000 0204 05b4 0402 080a ..r.m...........
0x0030: 0369 63f1 0000 0000 0103 0307 .ic.........
12:45:07.143447 IP 10.63.222.33.2006 > 100.66.3.186.34156: Flags [S.], seq 590398867, ack 3109579390, win 28960, options [mss 1460,sackOK,TS val 339576 ecr 57238513,nop,wscale 4], length 0
0x0000: 4500 003c 0000 4000 3f06 eb5f 0a3f de21 E..<..@.?.._.?.!
0x0010: 6442 03ba 07d6 856c 2330 c593 b958 6a7e dB.....l#0...Xj~
0x0020: a012 7120 56c1 0000 0204 05b4 0402 080a ..q.V...........
0x0030: 0005 2e78 0369 63f1 0103 0304 ...x.ic.....
12:45:07.143498 IP 100.66.3.186.34156 > 10.63.222.33.2006: Flags [.], ack 1, win 229, options [nop,nop,TS val 57238514 ecr 339576], length 0
0x0000: 4500 0034 098d 4000 4006 e0da 6442 03ba E..4..@.@...dB..
0x0010: 0a3f de21 856c 07d6 b958 6a7e 2330 c594 .?.!.l...Xj~#0..
0x0020: 8010 00e5 f5c4 0000 0101 080a 0369 63f2 .............ic.
0x0030: 0005 2e78 ...x
12:45:07.143926 IP 100.66.3.186.34156 > 10.63.222.33.2006: Flags [P.], seq 1:119, ack 1, win 229, options [nop,nop,TS val 57238514 ecr 339576], length 118
0x0000: 4500 00aa 098e 4000 4006 e063 6442 03ba E.....@.@..cdB..
0x0010: 0a3f de21 856c 07d6 b958 6a7e 2330 c594 .?.!.l...Xj~#0..
0x0020: 8018 00e5 7a4a 0000 0101 080a 0369 63f2 ....zJ.......ic.
0x0030: 0005 2e78 4745 5420 2f6c 696e 2048 5454 ...xGET./lin.HTT
0x0040: 502f 312e 310d 0a55 7365 722d 4167 656e P/1.1..User-Agen
0x0050: 743a 2057 6765 742f 312e 3135 2028 6c69 t:.Wget/1.15.(li
0x0060: 6e75 782d 676e 7529 0d0a 4163 6365 7074 nux-gnu)..Accept
0x0070: 3a20 2a2f 2a0d 0a48 6f73 743a 2031 302e :.*/*..Host:.10.
0x0080: 3633 2e32 3232 2e33 333a 3230 3036 0d0a 63.222.33:2006..
0x0090: 436f 6e6e 6563 7469 6f6e 3a20 4b65 6570 Connection:.Keep
0x00a0: 2d41 6c69 7665 0d0a 0d0a -Alive....
12:45:07.149222 IP 10.63.222.33.2006 > 100.66.3.186.34156: Flags [.], ack 119, win 1810, options [nop,nop,TS val 339576 ecr 57238514], length 0
0x0000: 4500 0034 ebec 4000 3f06 ff7a 0a3f de21 E..4..@.?..z.?.!
0x0010: 6442 03ba 07d6 856c 2330 c594 b958 6af4 dB.....l#0...Xj.
0x0020: 8010 0712 ef21 0000 0101 080a 0005 2e78 .....!.........x
0x0030: 0369 63f2 .ic.
12:45:07.149250 IP 10.63.222.33.2006 > 100.66.3.186.34156: Flags [P.], seq 1:547, ack 119, win 1810, options [nop,nop,TS val 339576 ecr 57238514], length 546
0x0000: 4500 0256 ebed 4000 3f06 fd57 0a3f de21 E..V..@.?..W.?.!
0x0010: 6442 03ba 07d6 856c 2330 c594 b958 6af4 dB.....l#0...Xj.
0x0020: 8018 0712 0cf5 0000 0101 080a 0005 2e78 ...............x
0x0030: 0369 63f2 5461 626c 653a 204c 696e 6b73 .ic.Table:.Links
0x0040: 0a4c 6f63 616c 2049 5009 5265 6d6f 7465 .Local.IP.Remote
0x0050: 2049 5009 4879 7374 2e09 4c51 094e 4c51 .IP.Hyst..LQ.NLQ
...
i will also capture a failed variant (give me some time)
interesting: i get/see the data, but the openwrt-wget ("uclient-fetch") aborts. the error is not on the olsr-side 8-) imho:
root@EG-superbuffi76:~ :) tcpdump -nXi eth1 host 10.63.222.33 and port 2006
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 262144 bytes
12:50:20.752654 IP 10.63.6.125.35638 > 10.63.222.33.2006: Flags [S], seq 596364484, win 29200, options [mss 1460,sackOK,TS val 37581585 ecr 0,nop,wscale 4], length 0
0x0000: 4500 003c bfd5 4000 4006 81ca 0a3f 067d E..<..@.@....?.}
0x0010: 0a3f de21 8b36 07d6 238b ccc4 0000 0000 .?.!.6..#.......
0x0020: a002 7210 e42b 0000 0204 05b4 0402 080a ..r..+..........
0x0030: 023d 7311 0000 0000 0103 0304 .=s.........
12:50:20.753548 IP 10.63.222.33.2006 > 10.63.6.125.35638: Flags [S.], seq 4075327443, ack 596364485, win 28960, options [mss 1460,sackOK,TS val 371012 ecr 37581585,nop,wscale 4], length 0
0x0000: 4500 003c 0000 4000 4006 41a0 0a3f de21 E..<..@.@.A..?.!
0x0010: 0a3f 067d 07d6 8b36 f2e8 8fd3 238b ccc5 .?.}...6....#...
0x0020: a012 7120 b904 0000 0204 05b4 0402 080a ..q.............
0x0030: 0005 a944 023d 7311 0103 0304 ...D.=s.....
12:50:20.753785 IP 10.63.6.125.35638 > 10.63.222.33.2006: Flags [.], ack 1, win 1825, options [nop,nop,TS val 37581585 ecr 371012], length 0
0x0000: 4500 0034 bfd6 4000 4006 81d1 0a3f 067d E..4..@.@....?.}
0x0010: 0a3f de21 8b36 07d6 238b ccc5 f2e8 8fd4 .?.!.6..#.......
0x0020: 8010 0721 51cd 0000 0101 080a 023d 7311 ...!Q........=s.
0x0030: 0005 a944 ...D
12:50:20.759783 IP 10.63.6.125.35638 > 10.63.222.33.2006: Flags [P.], seq 1:45, ack 1, win 1825, options [nop,nop,TS val 37581585 ecr 371012], length 44
0x0000: 4500 0060 bfd7 4000 4006 81a4 0a3f 067d E..`..@.@....?.}
0x0010: 0a3f de21 8b36 07d6 238b ccc5 f2e8 8fd4 .?.!.6..#.......
0x0020: 8018 0721 5087 0000 0101 080a 023d 7311 ...!P........=s.
0x0030: 0005 a944 4745 5420 2f6c 696e 2048 5454 ...DGET./lin.HTT
0x0040: 502f 312e 310d 0a48 6f73 743a 2031 302e P/1.1..Host:.10.
0x0050: 3633 2e32 3232 2e33 333a 3230 3036 0d0a 63.222.33:2006..
12:50:20.760516 IP 10.63.222.33.2006 > 10.63.6.125.35638: Flags [.], ack 45, win 1810, options [nop,nop,TS val 371013 ecr 37581585], length 0
0x0000: 4500 0034 2690 4000 4006 1b18 0a3f de21 E..4&.@.@....?.!
0x0010: 0a3f 067d 07d6 8b36 f2e8 8fd4 238b ccf1 .?.}...6....#...
0x0020: 8010 0712 51af 0000 0101 080a 0005 a945 ....Q..........E
0x0030: 023d 7311 .=s.
12:50:20.762514 IP 10.63.6.125.35638 > 10.63.222.33.2006: Flags [P.], seq 45:72, ack 1, win 1825, options [nop,nop,TS val 37581586 ecr 371013], length 27
0x0000: 4500 004f bfd8 4000 4006 81b4 0a3f 067d E..O..@.@....?.}
0x0010: 0a3f de21 8b36 07d6 238b ccf1 f2e8 8fd4 .?.!.6..#.......
0x0020: 8018 0721 511c 0000 0101 080a 023d 7312 ...!Q........=s.
0x0030: 0005 a945 5573 6572 2d41 6765 6e74 3a20 ...EUser-Agent:.
0x0040: 7563 6c69 656e 742d 6665 7463 680d 0a uclient-fetch..
12:50:20.762780 IP 10.63.222.33.2006 > 10.63.6.125.35638: Flags [.], ack 72, win 1810, options [nop,nop,TS val 371013 ecr 37581586], length 0
0x0000: 4500 0034 2691 4000 4006 1b17 0a3f de21 E..4&.@.@....?.!
0x0010: 0a3f 067d 07d6 8b36 f2e8 8fd4 238b cd0c .?.}...6....#...
0x0020: 8010 0712 5193 0000 0101 080a 0005 a945 ....Q..........E
0x0030: 023d 7312 .=s.
12:50:20.762991 IP 10.63.6.125.35638 > 10.63.222.33.2006: Flags [P.], seq 72:74, ack 1, win 1825, options [nop,nop,TS val 37581586 ecr 371013], length 2
0x0000: 4500 0036 bfd9 4000 4006 81cc 0a3f 067d E..6..@.@....?.}
0x0010: 0a3f de21 8b36 07d6 238b cd0c f2e8 8fd4 .?.!.6..#.......
0x0020: 8018 0721 4470 0000 0101 080a 023d 7312 ...!Dp.......=s.
0x0030: 0005 a945 0d0a ...E..
12:50:20.763162 IP 10.63.222.33.2006 > 10.63.6.125.35638: Flags [.], ack 74, win 1810, options [nop,nop,TS val 371013 ecr 37581586], length 0
0x0000: 4500 0034 2692 4000 4006 1b16 0a3f de21 E..4&.@.@....?.!
0x0010: 0a3f 067d 07d6 8b36 f2e8 8fd4 238b cd0e .?.}...6....#...
0x0020: 8010 0712 5191 0000 0101 080a 0005 a945 ....Q..........E
0x0030: 023d 7312 .=s.
12:50:20.798902 IP 10.63.222.33.2006 > 10.63.6.125.35638: Flags [P.], seq 1:547, ack 74, win 1810, options [nop,nop,TS val 371017 ecr 37581586], length 546
0x0000: 4500 0256 2693 4000 4006 18f3 0a3f de21 E..V&.@.@....?.!
0x0010: 0a3f 067d 07d6 8b36 f2e8 8fd4 238b cd0e .?.}...6....#...
0x0020: 8018 0712 5d59 0000 0101 080a 0005 a949 ....]Y.........I
0x0030: 023d 7312 5461 626c 653a 204c 696e 6b73 .=s.Table:.Links
0x0040: 0a4c 6f63 616c 2049 5009 5265 6d6f 7465 .Local.IP.Remote
0x0050: 2049 5009 4879 7374 2e09 4c51 094e 4c51 .IP.Hyst..LQ.NLQ
...
here is the wget command:
root@EG-superbuffi76:~ :) wget -O - http://10.63.222.33:2006/lin
Downloading 'http://10.63.222.33:2006/lin'
Connecting to 10.63.222.33:2006
(null) 0 - stalled -
Connection reset prematurely
olsrd only supports http 1.0 and 1.1 requests
can you show me the dump of uclient-fetch?
i just wondered, because GNU wget emits the warning:
bastian@X301-II ~ $ LC_ALL=C wget -O /dev/null http://10.63.222.33:2006/lin
--2017-02-07 12:57:39-- http://10.63.222.33:2006/lin
Connecting to 10.63.222.33:2006... connected.
HTTP request sent, awaiting response... 200 No headers, assuming HTTP/0.9
the uclient-fetch dump is 2 comments above under "give me some time".
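A quick way to tell the two kinds of reply apart, as a minimal sketch: a reply with HTTP headers starts with a status line beginning with `HTTP/`, while a bare reply starts directly with the table text (`Table: Links` in the dumps above). `has_http_headers` is a made-up helper name, not part of olsrd or of any client.

```shell
#!/bin/sh
# has_http_headers (hypothetical helper): read a saved reply on stdin and
# succeed only if its first line looks like an HTTP status line.
has_http_headers() {
    # Also accept a last line without a trailing newline; fail on empty input.
    IFS= read -r first_line || [ -n "$first_line" ] || return 1
    case "$first_line" in
        HTTP/*) return 0 ;;   # status line present, reply carries headers
        *)      return 1 ;;   # bare reply, e.g. "Table: Links"
    esac
}
```

With a reply saved to a file (the file name is only an example), `has_http_headers < reply.txt && echo "with headers"` would show which case applies. A client that aborts on the bare form would then be expected to need the form with headers.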
try wget ..../http/lin
wow, this works....so...uh! this change I was not aware of...
root@EG-superbuffi76:~ :) uclient-fetch -qO - http://10.63.222.33:2006/http/lin
Table: Links
Local IP Remote IP Hyst. LQ NLQ Cost
10.63.222.33 10.63.160.161 0.000 1.000 1.000 0.100
10.63.222.33 10.63.6.125 0.000 1.000 1.000 0.100
10.63.222.33 10.63.42.125 0.000 1.000 1.000 0.100
10.63.222.3 10.63.80.195 0.000 1.000 1.000 1.000
10.63.222.3 10.63.197.131 0.000 0.862 1.000 1.158
10.63.222.1 10.63.2.1 0.000 0.972 0.117 8.739
10.63.222.3 10.63.233.129 0.000 0.976 1.000 1.023
10.63.222.3 10.63.6.67 0.000 1.000 1.000 1.000
10.63.222.3 10.63.156.131 0.000 1.000 1.000 1.000
10.63.222.3 10.63.135.193 0.000 0.972 0.000 INFINITE
0.9.6 -------------------------------------------------------------------
* The versions of the following plugins have changed:
- jsoninfo : 0.0 --> 1.1
- nameservice : 0.3 --> 0.4
- netjson : 1.0 --> 1.1
- pud : 2.0.0 --> 3.0.0 (including its extra libraries)
- txtinfo : 0.1 --> 1.1
* All info plugins (jsoninfo, netjson and txtinfo) now support a number of
request prefixes:
- /http : forces output WITH http headers, temporarily overriding the
configured "httpheaders" value.
- /plain: forces output WITHOUT http headers, temporarily overriding the
configured "httpheaders" value.
These prefixes have to be at the start of the request string, can occur
only there, and can occur only once.
@bittorf I've pushed a commit to automatically detect whether http headers are needed, please try it
info: automatically detect whether the reply should have HTTP headers
This is the case when a HTTP request is done.
The request can still override whether or not HTTP headers are sent
by employing the 'http' and 'plain' request prefixes.
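For clients that want to be explicit, the prefix can simply be put in front of the request. A minimal sketch, assuming the request names used in this thread (`/lin`, `/all`); `build_info_path` and its `auto` mode are made-up names for illustration only.

```shell
#!/bin/sh
# build_info_path MODE TABLE (hypothetical helper)
#   MODE  http  -> force a reply WITH http headers    (/http prefix)
#         plain -> force a reply WITHOUT http headers (/plain prefix)
#         auto  -> no prefix, leave the decision to the plugin
#   TABLE request name without leading slash, e.g. lin or all
build_info_path() {
    mode=$1
    table=$2
    case "$mode" in
        http|plain) printf '/%s/%s\n' "$mode" "$table" ;;
        auto)       printf '/%s\n' "$table" ;;
        *)          echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}
```

Used with the command shown earlier in this thread it would look like `uclient-fetch -qO - "http://10.63.222.33:2006$(build_info_path http lin)"`. The prefix goes first and only once, as the changelog entry above requires.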
@SvenRoederer Could you please try again with the most recent commit on the release (or master) branch?
Was quite busy these days ...
I just built an updated firmware (https://buildbot.berlin.freifunk.net/buildbot/unstable/ar71xx-generic/98/VERSION.txt) with OLSRd v0.9.6.1 (release-tag; via https://github.com/SvenRoederer/openwrt-routing/commit/dde4487dac14b902af4deaf7a8d96006a69bb520) and can still see the problem.
this script monitors the routes:
#!/bin/sh
echo -e "all routes\tolsr-table"
while true; do
echo -e "$(ip route sh table all |wc -l | tr -d '\n')\t\t$(ip route show table olsr |wc -l)"
sleep 10
done
outputs:
root@SAm0815-test-glar150:~# /root/bin/check_olsr-routes.sh
all routes olsr-table
1168 1094
77 3
1168 1094
1168 1094
1167 1093
77 3
77 3
1009 3
750 3
77 3
1167 1093
1173 1099
77 3
1173 1099
77 3
1166 1092
77 3
77 3
77 3
1167 1093
1167 1093
1167 1093
1167 1093
1164 1090
1164 1090
1166 778
77 3
77 3
77 3
1175 1101
77 3
77 3
1169 1095
1169 1095
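To put a number on the flapping, the output above can be piped through a small counter. This is only a sketch: `count_flaps` is a made-up name, and the threshold of 100 routes is an assumption chosen to separate the "3" readings from the "~1090" readings in this log.

```shell
#!/bin/sh
# count_flaps [MIN] (hypothetical helper)
# Reads the monitor output on stdin (two columns: all routes, olsr table)
# and prints how often the olsr-table count dropped from >= MIN to < MIN.
# MIN defaults to 100, an assumed threshold for this particular network.
count_flaps() {
    awk -v min="${1:-100}" '
        $2 ~ /^[0-9]+$/ {                 # skip the header line
            low = ($2 + 0 < min + 0)
            if (low && !prev_low) flaps++ # count only the transition into "low"
            prev_low = low
        }
        END { print flaps + 0 }
    '
}
```

For example `/root/bin/check_olsr-routes.sh | tee routes.log` during a test run and `count_flaps < routes.log` afterwards. Consecutive low readings count as one collapse, so the result is the number of times the table emptied, not the number of samples taken while it was empty.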
my olsrd.config is:
root@SAm0815-test-glar150:~# cat /var/etc/olsrd.conf
DebugLevel 0
AllowNoInt yes
IpVersion 4
FIBMetric "flat"
TcRedundancy 2
NatThreshold 0.75
LinkQualityAlgorithm "etx_ff"
SmartGateway yes
SmartGatewayThreshold 50
Pollrate 0.025
RtTable 111
RtTableDefault 112
RtTableTunnel 113
RtTableTunnelPriority 100000
RtTableDefaultOlsrPriority 20000
SmartGatewaySpeed 1000 3000
SmartGatewayUplink "both"
Hna4
{
10.230.197.208 255.255.255.240
}
LoadPlugin "olsrd_arprefresh.so.0.1"
{
}
LoadPlugin "olsrd_watchdog.so.0.1"
{
PlParam "file" "/var/run/olsrd.watchdog"
PlParam "interval" "30"
}
LoadPlugin "olsrd_dyn_gw.so.0.5"
{
PlParam "Ping" "85.214.20.141"
PlParam "Ping" "213.73.91.35"
PlParam "Ping" "194.150.168.168"
PlParam "PingCmd" "ping -c 1 -q -I ffvpn %s"
PlParam "PingInterval" "30"
}
InterfaceDefaults
{
MidValidityTime 500.0
TcInterval 2.0
HnaValidityTime 125.0
HelloValidityTime 125.0
TcValidityTime 500.0
Ip4Broadcast 255.255.255.255
MidInterval 25.0
HelloInterval 3.0
HnaInterval 10.0
}
Interface "wlan0-adhoc-2"
{
}
Any additional info I can provide?
yes, please try bisecting it. go back to a release/commit that works ok for you and do a git bisect
i cannot see these problems in my testnet with 70 nodes and mixed OLSR-versions. @SvenRoederer is it maybe because you have a restarting daemon (false positives in your watchdog?)
we don't see it either in our test network, seems very stable
@bittorf no external watchdog is running (PID still the same); removing all plugins doesn't change anything
@pmelange also reported no problems on his installation. (https://github.com/freifunk-berlin/firmware/issues/418#issuecomment-277550028). I feel it might be related to the BBB-VPN and the lowered link quality
@fhuberts I started the bisect today and directly hit the issue I had seen in 0.9.5 already (https://github.com/freifunk-berlin/firmware/issues/424) So it crashes even before any routes get installed.
yes, so skip that commit
this one seems to be a completely different problem than the "unstable routes" here. The 0.9.6-code is not crashing with the ASSERT. I'd like to make sure that we don't mix up 2 different problems
that issue was fixed
ah no it wasn't. this is the first time I hear of this issue, why wasn't it reported?
2e7f2942bd47fc7f0b4ca0bf4581c3dee3c1f85a probably did fix it
regarding the assert-thing: I had seen it on 0.9.5, but as it was not seen in 0.9.6, I assumed it had been fixed intentionally. The move to 0.9.6 was done because of the "filechange-interval", and I forgot about 0.9.5 ... I bisected to a commit where 2e7f294 was still included and the ASSERT still failed
this assert thing can be tracked in https://github.com/freifunk-berlin/firmware/issues/424
bisecting results in: (which seems really unrelated)
8cef7bf8a03420eebab5b23db4ec4d2a203aeec3 is the first bad commit
commit 8cef7bf8a03420eebab5b23db4ec4d2a203aeec3
Author: Ferry Huberts <ferry.huberts@pelagic.nl>
Date: Mon Nov 9 15:49:11 2015 +0100
lock_file: add olsr_remove_lock_file function
And use it in the error paths during creation
Signed-off-by: Ferry Huberts <ferry.huberts@pelagic.nl>
:040000 040000 501ff49e204e000119558163d19072709ea5c706 848ebe27591e72ec022efa1d5d3f8dbe09e3c8ce M src
bisect was running from v0.9.0.3 to 2e568eb7264dd9df3fcf68db83 (next commit would introduce the ASSERT-issue again)
git bisect start
# good: [c6fbdafd11ef1d31cbbeab138317c3fdd6673d1a] Release v0.9.0.3
git bisect good c6fbdafd11ef1d31cbbeab138317c3fdd6673d1a
# bad: [2e568eb7264dd9df3fcf68db835b066adee6546f] main: minor update
git bisect bad 2e568eb7264dd9df3fcf68db835b066adee6546f
# good: [e21085a327cbb682ff91f8236800a79d9e9eb301] mdns: update a comment about exit
git bisect good e21085a327cbb682ff91f8236800a79d9e9eb301
# bad: [736d46ec3109f97864c2d35ca35438e3bbcae9ff] main: move loading the config into the loadConfig function
git bisect bad 736d46ec3109f97864c2d35ca35438e3bbcae9ff
# good: [83d40b74acf3fa46e1780cd569f0fc3c412f8d45] quagga: clean up olsr_exit messages
git bisect good 83d40b74acf3fa46e1780cd569f0fc3c412f8d45
# good: [79ce902e56b2ceaf4ba6749187b1cec78fc94bc3] main: always store argv
git bisect good 79ce902e56b2ceaf4ba6749187b1cec78fc94bc3
# good: [5d6a4ce069945b670067ac8a875a16793c2f43fd] lock_file: move olsrd_get_default_lockfile into its own file
git bisect good 5d6a4ce069945b670067ac8a875a16793c2f43fd
# bad: [8cef7bf8a03420eebab5b23db4ec4d2a203aeec3] lock_file: add olsr_remove_lock_file function
git bisect bad 8cef7bf8a03420eebab5b23db4ec4d2a203aeec3
# good: [6fa811140b4dae2af6c8d04e7666dd5a8f714f35] main: move olsr_create_lock_file into its own file
git bisect good 6fa811140b4dae2af6c8d04e7666dd5a8f714f35
# first bad commit: [8cef7bf8a03420eebab5b23db4ec4d2a203aeec3] lock_file: add olsr_remove_lock_file function
That commit can not possibly result in unstable routing. Please run the bisect properly between 0.9.0.3 and 0.9.6.1. If you run into the assert that is blocking you then just cherry-pick 97d4916, that should fix that problem for you.
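One way to carry the assert fix through every bisect step without polluting the bisect itself, as a rough sketch: apply 97d4916 to the working tree only, build and test, then throw the change away before recording the verdict. `prepare_step` and `finish_step` are made-up helper names; the git subcommands they wrap are standard.

```shell
#!/bin/sh
# Hypothetical helpers for a manual bisect in which each step needs the assert fix.
ASSERT_FIX=97d4916

# Apply the fix to the working tree only (no commit), so bisect keeps
# pointing at the commit it selected.
prepare_step() {
    git cherry-pick --no-commit "$ASSERT_FIX"
}

# Drop the temporary fix again, then record the verdict for the tested commit.
finish_step() {
    case "$1" in
        good|bad|skip) ;;
        *) echo "usage: finish_step good|bad|skip" >&2; return 2 ;;
    esac
    git reset --hard && git bisect "$1"
}
```

A session would then be `git bisect start`, mark the bad end (v0.9.6.1, assuming the release tag has that name) and the good end (c6fbdafd11ef1d31cbbeab138317c3fdd6673d1a, the v0.9.0.3 release commit from the log above), and repeat `prepare_step`, build, flash, observe, `finish_step good` or `finish_step bad`. Where the fix does not apply cleanly, `finish_step skip` cleans up and moves on.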
yeah, I was also wondering about 8cef7bf8a03420eebab5b23db4ec4d2a203aeec3 "is the first bad commit". Then did the following to double-check:
As these commits seem unrelated, I did the other way around:
That is very confusing. Just tell me which tree you bisected and what the results are. Now I have to search between trees and I bet I'm not doing it right because it's too confusing.
at the end it happens on master between 13aa7f3 and 5af5485
just check: https://github.com/SvenRoederer/olsrd/commits/find_route-problem_simplified (last 3 commits)
That doesn't make sense. I don't see how adding some static data - that is totally unused in the routing related code - could have that effect. Are you sure that your bisect is correct?
I agree with you, I was also wondering about that a lot.
That this commit was supposed to cause this was the reason for trying different ways to isolate it. But every time I came to the result that after this commit the routes come and go...
Btw. I used the Makefile from the openwrt-routing feed and adjusted the PKG_SOURCE_VERSION to my needs. Also, disabling the addons completely by commenting out SUBDIRS did not change anything.
Maybe you would like to look at it yourself in a setup like mine, by connecting to the BBB-VPN (http://bbb-vpn.berlin.freifunk.net/cgi-bin-index.html)
Using OLSR 0.9.6-git_1b51a49-hash_60d038da5dd8ad0e53f8f55729562986 I see no problems with the routing tables as Sven has described them. My Freifunk test-node is connected directly to the berlin network and not over the BBB-VPN. As far as I can tell, that is the only difference.
Some time in the next couple days, I will update the snapshot and reflash the router to see if everything is still working fine. Until then...
Ok thanks for the report! This gives me a bit more information...
@SvenRoederer So the only difference is the BBB-VPN. That node seems to be running olsrd as well, and I bet it's not updated to 0.9.6.1 yet... I'm also betting that when you upgrade that node to 0.9.6.1 that your unstable routing problem is gone.
Just some more information:
The first hop for my test node is over ad-hoc wifi. The first hop node is running 0.9.0.3-git_1b6dc2e-hash_217925b912d7d2155bea6239a46ae95c.
@fhuberts I don't know what version is running on the bbb-vpn server, but I'm meshing fine with an older version.
0.9.0.3 has the 'fragmented hellos' problem.
how many neighbors are there
Here are the neighbors. Also note, on the test node only olsr4 is running
Test node: OLSR 0.9.6-git_1b51a49-hash_60d038da5dd8ad0e53f8f55729562986
root@perry-test:~# neigh.sh
Local Remote vTime LQ NLQ Cost Host
10.31.23.145 10.230.226.194 141908 1.000000 1.000000 1.000000 mid7.scherer8.olsr
10.31.23.144 10.230.226.193 137889 0.983000 0.886000 1.145508 scherer8.olsr
nc: can't connect to remote host: Connection refused
Failed to parse message data
First hop: OLSR 0.9.0.3-git_1b6dc2e-hash_217925b912d7d2155bea6239a46ae95c
root@scherer8:~# neigh.sh
Local Remote vTime LQ NLQ Cost
10.230.226.202 10.31.6.53 140951 1.000000 1.000000 1024
10.230.226.211 10.230.226.212 141396 1.000000 1.000000 1024
10.230.226.211 10.230.226.213 141477 1.000000 1.000000 1024
10.230.226.203 10.31.31.77 138762 0.191000 1.000000 5328
10.230.226.193 10.31.23.144 136613 0.831000 0.991000 1241
10.230.226.194 10.31.23.145 141304 1.000000 1.000000 1024
Local Remote vTime LQ NLQ Cost
2001:bf7:750:2e0b::1 2001:bf7:750:2e1b::1 135906 1.000000 1.000000 102
2001:bf7:750:2e02::1 2001:bf7:836:a::1 139080 1.000000 1.000000 1024
2001:bf7:750:2e0b::1 2001:bf7:750:2e2b::1 141641 1.000000 1.000000 102
2001:bf7:750:2e03::1 2001:bf7:800:103::1 137759 1.000000 1.000000 1024
Second Hop 1: OLSR 0.9.0.3-git_1b6dc2e-hash_7aaa60310210a745b5b00863c99fae6b
root@scherer8-abb:~# neigh.sh
Local Remote vTime LQ NLQ Cost
10.230.226.212 10.230.226.211 142362 1.000000 1.000000 1024
10.230.226.212 10.230.226.213 136814 1.000000 1.000000 1024
Local Remote vTime LQ NLQ Cost
2001:bf7:750:2e1b::1 2001:bf7:750:2e0b::1 137255 1.000000 1.000000 102
2001:bf7:750:2e1b::1 2001:bf7:750:2e2b::1 138847 1.000000 1.000000 102
Second Hop 2: OLSR 0.9.0.3-git_1b6dc2e-hash_217925b912d7d2155bea6239a46ae95c
root@basta:~# neigh.sh
Local Remote vTime LQ NLQ Cost
10.230.226.213 10.230.226.211 136894 1.000000 1.000000 1024
10.230.226.213 10.230.226.212 141903 1.000000 1.000000 1024
Local Remote vTime LQ NLQ Cost
2001:bf7:750:2e2b::1 2001:bf7:750:2e1b::1 139174 1.000000 1.000000 102
2001:bf7:750:2e2b::1 2001:bf7:750:2e0b::1 136695 1.000000 1.000000 102
Second Hop 3: OLSR 0.6.7.1-git_cebcd32-hash_e30a1ec38cc6d414bb747f8018021d59
root@tub-core:~# neigh.sh
Local Remote vTime LQ NLQ Cost
10.31.31.1 10.31.28.161 138053 1.000000 1.000000 1024
10.31.31.77 10.230.226.203 137186 0.979000 0.195000 5326
10.31.31.77 10.31.48.6 138409 1.000000 1.000000 1024
10.230.145.173 10.230.18.181 137343 1.000000 1.000000 1024
10.230.145.173 10.31.13.49 143084 1.000000 0.948000 1079
10.31.31.75 10.230.44.100 136743 1.000000 1.000000 1024
10.230.145.173 10.230.242.133 139771 1.000000 0.972000 1052
10.230.145.173 10.31.26.61 140113 1.000000 0.956000 1070
10.31.31.73 10.31.3.1 142879 1.000000 1.000000 1024
10.31.31.81 10.31.1.40 131301 0.435000 0.757000 3108
Local Remote vTime LQ NLQ Cost
2001:bf7:800:103::1 2001:bf7:750:2a80::1 137414 1.000000 1.000000 1024
2001:bf7:800:103::1 2001:bf7:750:2e03::1 136139 1.000000 1.000000 1024
2001:bf7:800:102::1 2001:bf7:800:2::1 142713 1.000000 1.000000 1024
Second Hop 4: OLSR 0.6.7.1-git_cebcd32-hash_2e2a1899170f7d0456572b7c247d1f07
root@segen-core:~# neigh.sh
Local Remote vTime LQ NLQ Cost
10.31.6.33 10.31.12.171 136828 0.897000 1.000000 1140
10.31.6.1 10.31.6.85 140879 1.000000 1.000000 1024
10.31.6.1 10.31.6.65 137912 1.000000 1.000000 1024
10.31.6.37 10.31.2.45 138326 1.000000 1.000000 1024
10.31.6.45 10.31.27.41 136798 1.000000 0.897000 1140
10.31.6.53 10.230.226.202 142380 1.000000 1.000000 1024
10.31.6.1 10.31.6.69 139094 1.000000 1.000000 1024
10.31.6.1 10.31.6.73 138395 1.000000 1.000000 1024
10.31.6.33 10.31.55.45 134650 1.000000 1.000000 1024
10.31.6.33 10.31.33.17 142352 1.000000 0.909000 1125
10.31.6.1 10.31.6.93 137276 1.000000 1.000000 1024
10.31.6.1 10.31.6.89 139385 1.000000 1.000000 1024
10.31.6.37 10.31.4.73 139458 1.000000 1.000000 1024
10.31.6.1 10.31.6.81 140031 1.000000 1.000000 1024
10.31.6.33 10.36.40.73 138082 0.940000 1.000000 1088
10.31.6.41 10.31.11.93 136991 1.000000 1.000000 1024
10.31.6.49 10.230.23.143 38714 1.000000 1.000000 1024
10.31.6.33 10.31.13.13 139812 0.956000 0.772000 1385
10.31.6.1 10.31.6.77 134949 1.000000 1.000000 1024
Local Remote vTime LQ NLQ Cost
2001:bf7:836::1 2001:bf7:836:71::1 137788 1.000000 1.000000 1024
2001:bf7:836::1 2001:bf7:836:51::1 139124 1.000000 1.000000 1024
2001:bf7:836::1 2001:bf7:836:10::1 138285 1.000000 1.000000 1024
2001:bf7:836::1 2001:bf7:836:31::1 138409 1.000000 1.000000 1024
2001:bf7:836::1 2001:bf7:836:20::1 138126 1.000000 1.000000 1024
2001:bf7:836:7::1 fd9c:1d37:4f28:8::1 138129 1.000000 1.000000 1024
2001:bf7:836:7::1 2001:bf7:830:9::1 138435 1.000000 1.000000 1024
2001:bf7:836:5::1 2001:bf7:760:805::1 136476 1.000000 1.000000 1024
2001:bf7:836:5::1 fd8b:6aff:97af:2::1 136701 0.940000 1.000000 1088
2001:bf7:836::1 2001:bf7:836:80::1 138255 1.000000 1.000000 1024
2001:bf7:836:a::1 2001:bf7:750:2e02::1 140386 1.000000 1.000000 1024
2001:bf7:836::1 2001:bf7:836:61::1 140734 1.000000 1.000000 1024
2001:bf7:836:6::1 2001:bf7:831:3::1 137070 1.000000 1.000000 1024
2001:bf7:836:9::1 2001:bf7:750:1205::1 37835 1.000000 1.000000 1024
2001:bf7:836:5::1 2001:bf7:760:8411::1 138209 0.979000 0.772000 1351
2001:bf7:836::1 2001:bf7:836:41::1 137077 1.000000 1.000000 1024
ok, not enough nodes to cause fragmentation.
@SvenRoederer How's this for you?
Ok, not enough nodes to suffer from the fragmented hellos problem.
The BBB-VPN node seems to run something like olsr 0.6.x ( @booo gave me this info after a short look) but @sven-ola might know best
@fhuberts is this "fragmented hellos" problem exclusively for 0.9.0.3 or any version up to 0.9.0.3?
Around d8ffc6f I was running this code on some Freifunk nodes and found that the routes were not stable. Some routes dropped out of the routing table and came back every few seconds.
A test on my local node, looping
ip route sh table olsr|wc -l
gave a changing number of routes, even though the network was stable (no changes in the local mesh). I used the exact same config as for olsrd 0.9.0.3, where all was running normally before and after checking with 0.9.6.