freifunk-berlin / firmware

DEPRECATED: Build system for Berlin firmware. Please use the pinned falter-repos instead
https://berlin.freifunk.net
GNU General Public License v3.0

ubnt EdgeRouterX switch dies or sthg (affects ramips-mt7621) #494

Open bobster-galore opened 6 years ago

bobster-galore commented 6 years ago

ubnt erx and erx-sfp routers have been seen in the wild with the switch suddenly dying, which shows up as lost connections and / or lost interfaces. We could investigate this to find out what is causing it and try to help solve the problem, since a significant number of these routers will soon be online (ff-Meko-project). Some work is already going on in LEDE, which we could support. Links, in ascending date order:

http://lists.infradead.org/pipermail/lede-dev/2017-July/008268.html | mt7621 wdt reset- console not accepting commands
http://lists.infradead.org/pipermail/lede-dev/2017-August/008594.html | Transmit timeouts with mtk_eth_soc and MT7621
http://lists.infradead.org/pipermail/lede-dev/2017-August/008738.html | ramips: Improve stability of the mt7621 switch
https://patchwork.ozlabs.org/patch/808121/ | ramips: Improve stability of the mt7621 switch

Can somebody shed light on this?

SvenRoederer commented 6 years ago

There is also a discussion suggesting that the previously listed ideas might not lead to a solution: http://lists.infradead.org/pipermail/lede-dev/2017-November/009799.html

bobster-galore commented 6 years ago

Has anyone checked whether the original firmware behaves differently? Maybe it's a hardware issue?

SvenRoederer commented 6 years ago

Another one on the OpenWrt mailing list: http://lists.infradead.org/pipermail/lede-dev/2018-April/011939.html

SvenRoederer commented 6 years ago

some recent OpenWrt-commits:

In case of error, the function devm_ioremap_resource() returns ERR_PTR() and never returns NULL. The NULL test in the return value check should be replaced with IS_ERR().

Fixes: f079b6406348 ("staging: mt7621-eth: add gigabit switch driver (GSW)")
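To illustrate why the NULL test described in that commit message cannot catch the failure, here is a minimal Python model of how ERR_PTR() encodes an errno into a pointer value and how IS_ERR() detects it. This is a userspace sketch, not kernel code: the 32-bit pointer width is an assumption (chosen to match the MIPS32 mt7621), and the helper names `err_ptr` and `is_err` are made up for illustration.

```python
# Userspace model of the kernel's ERR_PTR()/IS_ERR() encoding.
# Assumption: 32-bit pointers, as on the MIPS32-based mt7621.
MAX_ERRNO = 4095        # same limit the kernel uses in include/linux/err.h
PTR_MASK = 0xFFFFFFFF   # assumed pointer width: 32 bits
ENOMEM = 12             # errno value used as the example failure


def err_ptr(errno_value: int) -> int:
    """Encode a positive errno as a pointer value, like ERR_PTR(-errno)."""
    return (-errno_value) & PTR_MASK


def is_err(ptr: int) -> bool:
    """Mirror IS_ERR(): true only for the top MAX_ERRNO pointer values."""
    return ptr >= ((-MAX_ERRNO) & PTR_MASK)


failed = err_ptr(ENOMEM)
print(hex(failed))      # the encoded error pointer
print(failed == 0)      # what a NULL test would see
print(is_err(failed))   # what IS_ERR() would see
```

The encoded value is non-zero, so a NULL comparison treats the failed mapping as valid, while the range check used by IS_ERR() flags it.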

booo commented 6 years ago

Is there a build that incorporates the patches?

booo commented 6 years ago

I installed OpenWrt SNAPSHOT, r7050-9c409cb on an erx-sfp that we had to restart a few times in the past. The snapshot should include the fix.

So far I see strange load patterns (constant load of 1):

http://monitor.berlin.freifunk.net/detail.php?p=load&t=load&h=flughafen-core&s=86400

And we had one exception in the kernel code so far:

[ 2776.744924] ------------[ cut here ]------------
[ 2776.754179] WARNING: CPU: 3 PID: 0 at ./include/net/dst.h:256 0x8e8cd4d8
[ 2776.767546] Modules linked in: pppoe ppp_async pppox ppp_generic nf_conntrack_ipv6 iptable_nat ipt_REJECT ipt_MASQUERADE xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_FLOWOFFLOAD xt_CT slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack iptable_mangle iptable_filter ip_tables crc_ccitt i2c_gpio i2c_algo_pca i2c_algo_bit gpio_pca953x i2c_dev ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables x_tables leds_gpio gpio_button_hotplug
[ 2776.891351] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.14.43 #0
[ 2776.903325] Stack : 00000000 00000000 00000000 00000000 805f7ad2 00000034 00000000 00000000
[ 2776.919962]         8fc44e74 80590947 8051cbfc 00000003 00000000 00000001 8fc15c38 532616ca
[ 2776.936602]         00000000 00000000 805f0000 00003a98 00000000 000000ca 00000007 00000000
[ 2776.953247]         00000000 80590000 000d99d7 00000000 00000000 00000000 805b0000 8e8cd4d8
[ 2776.969893]         00000009 00000100 00000001 00000003 00000003 80291630 0000000c 805f000c
[ 2776.986531]         ...
[ 2776.991396] Call Trace:
[ 2776.996283] [<80010498>] show_stack+0x58/0x100
[ 2777.005146] [<8045f4ac>] dump_stack+0x9c/0xe0
[ 2777.013820] [<8002e208>] __warn+0xe0/0x114
[ 2777.021969] [<8002e2cc>] warn_slowpath_null+0x1c/0x30
[ 2777.032036] [<8e8cd4d8>] 0x8e8cd4d8
[ 2777.039100] ---[ end trace 833f5b5e0b6c2d47 ]---
[ 2778.073010] dst_release: dst:8e03fa80 refcnt:-1

I will report back if we have another crash with the new code.

booo commented 6 years ago

Still up and running but we get even more interesting output:

[48546.622689] ------------[ cut here ]------------
[48546.631924] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:320 dev_watchdog+0x1ac/0x324
[48546.648426] NETDEV WATCHDOG: eth0 (mtk_soc_eth): transmit queue 0 timed out
[48546.662325] Modules linked in: pppoe ppp_async pppox ppp_generic nf_conntrack_ipv6 iptable_nat ipt_REJECT ipt_MASQUERADE xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_FLOWOFFLOAD xt_CT slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack iptable_mangle iptable_filter ip_tables crc_ccitt i2c_gpio i2c_algo_pca i2c_algo_bit gpio_pca953x i2c_dev ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables x_tables leds_gpio gpio_button_hotplug
[48546.786047] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W       4.14.43 #0
[48546.800418] Stack : 00000000 00000000 00000000 00000000 805f7ad2 00000042 00000000 00000000
[48546.817062]         80590db4 80590947 8051cbfc 00000000 00000000 00000001 8fc09d68 532616ca
[48546.833693]         00000000 00000000 805f0000 00004240 00000000 000000dd 00000007 00000000
[48546.850321]         00000000 80590000 000bfe7f 00000000 00000000 00000000 805b0000 8036da68
[48546.866948]         00000009 00000140 00000000 8ff9df40 00000001 80291630 00000000 805f0000
[48546.883577]         ...
[48546.888432] Call Trace:
[48546.893316] [<80010498>] show_stack+0x58/0x100
[48546.902170] [<8045f4ac>] dump_stack+0x9c/0xe0
[48546.910830] [<8002e208>] __warn+0xe0/0x114
[48546.918969] [<8002e26c>] warn_slowpath_fmt+0x30/0x3c
[48546.928854] [<8036da68>] dev_watchdog+0x1ac/0x324
[48546.938219] [<800861a4>] call_timer_fn.isra.3+0x24/0x84
[48546.948603] [<800863bc>] run_timer_softirq+0x1b8/0x244
[48546.958844] [<8047c750>] __do_softirq+0x128/0x2ec
[48546.968202] [<80032910>] irq_exit+0x98/0xcc
[48546.976533] [<8024a6cc>] plat_irq_dispatch+0xfc/0x138
[48546.986582] [<8000b5a8>] except_vec_vi_end+0xb8/0xc4
[48546.996450] [<8000cf70>] r4k_wait_irqoff+0x1c/0x24
[48547.005993] [<8006645c>] do_idle+0xe4/0x168
[48547.014312] [<800666d8>] cpu_startup_entry+0x24/0x2c
[48547.024191] [<805b9bf8>] start_kernel+0x48c/0x4ac
[48547.033742] ---[ end trace 833f5b5e0b6c2d48 ]---
[48547.042978] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[48547.055320] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
[48547.067337] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0f130000, max=0, ctx=2839, dtx=2839, fdx=2838, next=2839
[48547.089031] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0eae0000, max=0, calc=3617, drx=3728
[48547.111380] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x6060000c, 0x10c = 0x80818
[48547.131715] mtk_soc_eth 1e100000.ethernet: PPE started
[49547.049404] ------------[ cut here ]------------
[49547.058664] WARNING: CPU: 1 PID: 0 at ./include/net/dst.h:256 0x8e8cd4d8
[49547.072041] Modules linked in: pppoe ppp_async pppox ppp_generic nf_conntrack_ipv6 iptable_nat ipt_REJECT ipt_MASQUERADE xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_FLOWOFFLOAD xt_CT slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack iptable_mangle iptable_filter ip_tables crc_ccitt i2c_gpio i2c_algo_pca i2c_algo_bit gpio_pca953x i2c_dev ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables x_tables leds_gpio gpio_button_hotplug
[49547.195767] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W       4.14.43 #0
[49547.210145] Stack : 00000000 00000000 00000000 00000000 805f7ad2 00000042 00000000 00000000
[49547.226787]         8fc441f4 80590947 8051cbfc 00000001 00000000 00000001 8fc0dc38 532616ca
[49547.243440]         00000000 00000000 805f0000 00004ee0 00000000 000000fe 00000007 00000000
[49547.260084]         00000000 80590000 0002fcb7 00000000 00000000 00000000 805b0000 8e8cd4d8
[49547.276727]         00000009 00000100 00000001 00000003 00000001 80291630 00000004 805f0004
[49547.293367]         ...
[49547.298232] Call Trace:
[49547.303121] [<80010498>] show_stack+0x58/0x100
[49547.312001] [<8045f4ac>] dump_stack+0x9c/0xe0
[49547.320680] [<8002e208>] __warn+0xe0/0x114
[49547.328838] [<8002e2cc>] warn_slowpath_null+0x1c/0x30
[49547.338917] [<8e8cd4d8>] 0x8e8cd4d8
[49547.345938] ---[ end trace 833f5b5e0b6c2d49 ]---
[49547.355253] dst_release: dst:8edd9780 refcnt:-1
[50292.637021] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[50292.649359] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
[50292.661372] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0eae0000, max=0, ctx=896, dtx=896, fdx=895, next=896
[50292.682366] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0dc60000, max=0, calc=856, drx=862
[50292.703700] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x5860000c, 0x10c = 0x80818
[50292.724203] mtk_soc_eth 1e100000.ethernet: PPE started
[51252.638068] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[51252.650410] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
[51252.662423] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0ec80000, max=0, ctx=3580, dtx=3580, fdx=3579, next=3580
[51252.684138] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0e3b0000, max=0, calc=585, drx=601
[51252.705470] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x5a60000c, 0x10c = 0x80818
[51252.725940] mtk_soc_eth 1e100000.ethernet: PPE started
[51997.599643] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[51997.612013] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
[51997.624016] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0dcd0000, max=0, ctx=3060, dtx=3060, fdx=3059, next=3060
[51997.645715] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0de30000, max=0, calc=3653, drx=3664
[51997.667336] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x6060000c, 0x10c = 0x80818
[51997.687505] mtk_soc_eth 1e100000.ethernet: PPE started
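
To see how often the transmit timeouts recur, the timestamps can be pulled out of a log like the one above. The following is a rough sketch, assuming the log has been saved as plain text in the dmesg format shown here; the function names `timeout_times` and `gaps` are hypothetical, and the sample lines are copied from the log above.

```python
import re
from typing import List

# Matches lines such as:
# [50292.637021] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
TIMEOUT_RE = re.compile(
    r"^\[\s*(\d+\.\d+)\]\s+mtk_soc_eth \S+ (\S+): transmit timed out"
)


def timeout_times(log_text: str) -> List[float]:
    """Return the kernel timestamps (seconds since boot) of each TX timeout."""
    times = []
    for line in log_text.splitlines():
        match = TIMEOUT_RE.match(line.strip())
        if match:
            times.append(float(match.group(1)))
    return times


def gaps(times: List[float]) -> List[float]:
    """Return the time in seconds between consecutive timeouts."""
    return [round(later - earlier, 3) for earlier, later in zip(times, times[1:])]


sample = """\
[48547.042978] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[48547.055320] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
[50292.637021] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[51252.638068] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[51997.599643] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
"""

times = timeout_times(sample)
print(times)
print(gaps(times))
```

For the four timeouts in this log the gaps are between roughly 12 and 30 minutes, which does not by itself say whether load is the trigger.
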
booo commented 6 years ago

The problem persists even with the newer OpenWrt version mentioned above.

bobster-galore commented 6 years ago

What a pity! Is it only visible under load? What would help? There is an idle erx in Spandau that we could use for testing.

SvenRoederer commented 5 years ago

just found this in the OpenWrt-devel list: http://lists.infradead.org/pipermail/openwrt-devel/2018-October/014272.html

Perhaps someone can test it?

SvenRoederer commented 5 years ago

This might also fix this problem: https://git.openwrt.org/?p=openwrt/openwrt.git;a=commit;h=fe7d965ea95e78905328fe5425c8e90e3bf11e58

pmelange commented 5 years ago

Ever since I upgraded the firmware on the erx-sfp (coloniaallee) from some 1.0.0-alpha version to 1.0.2 the router has been online. Not once has the problem described above happened again.

SvenRoederer commented 5 years ago

As @booo mentioned, he was able to see this several times with a more recent kernel than the one Hedy-1.0.2 is using. So I'm quite sure this bug is still waiting to get triggered...

SvenRoederer commented 4 years ago

There is a patch being discussed: http://lists.infradead.org/pipermail/openwrt-devel/2019-March/016146.html, with a reply from October 2019: http://lists.infradead.org/pipermail/openwrt-devel/2019-October/019627.html

SvenRoederer commented 4 years ago

There is a nice report that identifies "ethernet pause frames" as the cause of the problem: http://lists.infradead.org/pipermail/openwrt-devel/2020-February/021742.html

SvenRoederer commented 4 years ago

https://github.com/openwrt/openwrt/commit/c8f8e59816eca49d776562d2d302bf990a87faf0 sounds like a fix for this issue. Can anyone test it?

pmelange commented 4 years ago

According to https://forum.openwrt.org/t/mtk-soc-eth-watchdog-timeout-after-r11573/50000/59 it didn't make a difference.

But I am currently building with this patch. I don't have high hopes though.


After 7757 seconds of uptime:

[ 7757.227823] ------------[ cut here ]------------
[ 7757.232533] WARNING: CPU: 1 PID: 0 at include/net/dst.h:256 0x8f14455c [nf_conntrack_rtcache@8f144000+0xaa0]
[ 7757.242372] Modules linked in: batman_adv nf_conntrack_ipv6 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_FLOWOFFLOAD xt_CT nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack libcrc32c iptable_mangle iptable_filter ip_tables compat gpio_beeper input_core nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 ipip tunnel4 ip_tunnel leds_gpio xhci_plat_hcd xhci_pci xhci_mtk xhci_hcd gpio_button_hotplug usbcore nls_base usb_common crc32c_generic
[ 7757.309857] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.14.171 #0
[ 7757.315947] Stack : 00000000 00000000 00000000 00000003 00000000 00000000 00000000 00000000
[ 7757.324313]         00000000 00000000 00000000 00000000 00000000 00000001 8fc0bb20 ac07f5b2
[ 7757.332676]         8fc0bbb8 00000000 00000000 00000000 00000038 804997f8 00000008 00000000
[ 7757.341044]         00000000 00000000 0004ba61 ffffffff 00000000 8fc0bb00 00000000 8f14455c
[ 7757.349429]         8f1447fc 00000100 00000001 00000003 00000000 802c096c 00000004 80690004
[ 7757.357803]         ...
[ 7757.360277] Call Trace:
[ 7757.360357] [<804997f8>] 0x804997f8
[ 7757.366327] [<8f14455c>] 0x8f14455c [nf_conntrack_rtcache@8f144000+0xaa0]
[ 7757.373136] [<802c096c>] 0x802c096c
[ 7757.376672] [<8000bf28>] 0x8000bf28
[ 7757.380167] [<8000bf30>] 0x8000bf30
[ 7757.383669] [<80560000>] 0x80560000
[ 7757.387174] [<80482754>] 0x80482754
[ 7757.390687] [<800773c4>] 0x800773c4
[ 7757.394189] [<8002ed30>] 0x8002ed30
[ 7757.397704] [<8f14455c>] 0x8f14455c [nf_conntrack_rtcache@8f144000+0xaa0]
[ 7757.404546] [<8002e9d4>] 0x8002e9d4
[ 7757.408094] [<8f14455c>] 0x8f14455c [nf_conntrack_rtcache@8f144000+0xaa0]
[ 7757.414931] [<803ad4e4>] 0x803ad4e4
[ 7757.418461] [<803b7310>] 0x803b7310
[ 7757.421994] [<803b6f10>] 0x803b6f10
[ 7757.425517] [<803b6020>] 0x803b6020
[ 7757.429033] [<80461630>] 0x80461630
[ 7757.432576] [<803b5670>] 0x803b5670
[ 7757.436098] [<8036f60c>] 0x8036f60c
[ 7757.439620] [<80371d7c>] 0x80371d7c
[ 7757.443147] [<803ba4cc>] 0x803ba4cc
[ 7757.446668] [<8046b2a8>] 0x8046b2a8
[ 7757.450198] [<8046b5f0>] 0x8046b5f0
[ 7757.453724] [<8046b318>] 0x8046b318
[ 7757.457224] [<8036f308>] 0x8036f308
[ 7757.460751] [<8036f91c>] 0x8036f91c
[ 7757.464293] [<8037220c>] 0x8037220c
[ 7757.467801] [<8007d0c4>] 0x8007d0c4
[ 7757.471339] [<8049f950>] 0x8049f950
[ 7757.474835] [<800336c8>] 0x800336c8
[ 7757.478330] [<80275f24>] 0x80275f24
[ 7757.481868] [<80007388>] 0x80007388
[ 7757.485371] 
[ 7757.487000] ---[ end trace 8822d76274df4638 ]---
[ 7757.492211] dst_release: dst:8e0d3700 refcnt:-1
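
The trace above has no symbol names, but the bracketed annotation still allows locating the warning inside the module. The sketch below assumes that `[name@base+0xsize]` gives the module's load address and size (that reading is an assumption, as is the helper name `module_offset`); it subtracts the base from the faulting address to get an offset that could then be looked up in a disassembly of the module.

```python
import re

# Matches e.g.: 0x8f14455c [nf_conntrack_rtcache@8f144000+0xaa0]
ANNOTATION_RE = re.compile(
    r"0x([0-9a-f]+) \[(\w+)@([0-9a-f]+)\+0x([0-9a-f]+)\]"
)


def module_offset(text):
    """Return (module name, offset into module, offset-within-size flag)."""
    match = ANNOTATION_RE.search(text)
    if match is None:
        return None
    address = int(match.group(1), 16)
    name = match.group(2)
    base = int(match.group(3), 16)   # assumed: module load address
    size = int(match.group(4), 16)   # assumed: module size in bytes
    offset = address - base
    return name, offset, 0 <= offset < size


line = (
    "[ 7757.232533] WARNING: CPU: 1 PID: 0 at include/net/dst.h:256 "
    "0x8f14455c [nf_conntrack_rtcache@8f144000+0xaa0]"
)
name, offset, inside = module_offset(line)
print(name, hex(offset), inside)
```

For this log the arithmetic is 0x8f14455c - 0x8f144000 = 0x55c, which lies below 0xaa0, so the address is consistent with code inside nf_conntrack_rtcache.
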
pmelange commented 4 years ago

The system is still running, but at 32978 seconds I got another kernel error:

[32978.876756] ------------[ cut here ]------------
[32978.881419] WARNING: CPU: 2 PID: 0 at include/net/dst.h:256 0x8f14455c [nf_conntrack_rtcache@8f144000+0xaa0]
[32978.891227] Modules linked in: batman_adv nf_conntrack_ipv6 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_FLOWOFFLOAD xt_CT nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack libcrc32c iptable_mangle iptable_filter ip_tables compat gpio_beeper input_core nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 ipip tunnel4 ip_tunnel leds_gpio xhci_plat_hcd xhci_pci xhci_mtk xhci_hcd gpio_button_hotplug usbcore nls_base usb_common crc32c_generic
[32978.958675] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G        W       4.14.171 #0
[32978.965973] Stack : 00000000 00000000 00000000 00000003 00000000 00000000 00000000 00000000
[32978.974335]         00000000 00000000 00000000 00000000 00000000 00000001 8fc0db20 ac07f5b2
[32978.982671]         8fc0dbb8 00000000 00000000 00000000 00000038 804997f8 00000008 00000000
[32978.991010]         00000000 00000000 000ea0d3 ffffffff 00000000 8fc0db00 00000000 8f14455c
[32978.999345]         8f1447fc 00000100 00000001 00000003 00000002 802c096c 00000008 80690008
[32979.007680]         ...
[32979.010117] Call Trace:
[32979.010170] [<804997f8>] 0x804997f8
[32979.016071] [<8f14455c>] 0x8f14455c [nf_conntrack_rtcache@8f144000+0xaa0]
[32979.022828] [<802c096c>] 0x802c096c
[32979.026315] [<8000bf28>] 0x8000bf28
[32979.029787] [<8000bf30>] 0x8000bf30
[32979.033255] [<80560000>] 0x80560000
[32979.036726] [<80482754>] 0x80482754
[32979.040198] [<800773c4>] 0x800773c4
[32979.043667] [<8002ed30>] 0x8002ed30
[32979.047149] [<8f14455c>] 0x8f14455c [nf_conntrack_rtcache@8f144000+0xaa0]
[32979.053907] [<8002e9d4>] 0x8002e9d4
[32979.057391] [<8f14455c>] 0x8f14455c [nf_conntrack_rtcache@8f144000+0xaa0]
[32979.064154] [<803ad4e4>] 0x803ad4e4
[32979.067638] [<803b7310>] 0x803b7310
[32979.071114] [<803b6f10>] 0x803b6f10
[32979.074589] [<803b6020>] 0x803b6020
[32979.078060] [<80461630>] 0x80461630
[32979.081535] [<803b5670>] 0x803b5670
[32979.085009] [<8036f60c>] 0x8036f60c
[32979.088487] [<803b9128>] 0x803b9128
[32979.091970] [<803bb14c>] 0x803bb14c
[32979.095444] [<80371d7c>] 0x80371d7c
[32979.098916] [<803ba4cc>] 0x803ba4cc
[32979.102393] [<8046b2a8>] 0x8046b2a8
[32979.105865] [<803b6f10>] 0x803b6f10
[32979.109346] [<8046b5f0>] 0x8046b5f0
[32979.112820] [<8046b318>] 0x8046b318
[32979.116292] [<8036f308>] 0x8036f308
[32979.119772] [<8036f91c>] 0x8036f91c
[32979.123244] [<8037220c>] 0x8037220c
[32979.126718] [<8007d0c4>] 0x8007d0c4
[32979.130201] [<8049f950>] 0x8049f950
[32979.133671] [<800336c8>] 0x800336c8
[32979.137143] [<80275f24>] 0x80275f24
[32979.140616] [<80007388>] 0x80007388
[32979.144084] 
[32979.145677] ---[ end trace 8822d76274df4639 ]---
[32979.150503] dst_release: dst:8d4b4b00 refcnt:-1

SvenRoederer commented 4 years ago

I just noticed that there are 2 sources of the kernel error:

So are these two separate issues, or really the same issue causing different errors?
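
One way to approach that question is to count how often each WARNING location appears in a saved log. This is only a sketch: the function name `warning_sources` is hypothetical, the sample lines are copied from logs in this thread, and a leading `./` is dropped so that both spellings of the same path are counted together.

```python
import re
from collections import Counter

# Captures the "file:line" part of lines such as:
# WARNING: CPU: 3 PID: 0 at ./include/net/dst.h:256 0x8e8cd4d8
WARNING_RE = re.compile(r"WARNING: CPU: \d+ PID: \d+ at (?:\./)?(\S+:\d+)")


def warning_sources(log_text: str) -> Counter:
    """Count kernel WARNING entries per source location."""
    return Counter(match.group(1) for match in WARNING_RE.finditer(log_text))


sample = """\
[ 2776.754179] WARNING: CPU: 3 PID: 0 at ./include/net/dst.h:256 0x8e8cd4d8
[48546.631924] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:320 dev_watchdog+0x1ac/0x324
[ 7757.232533] WARNING: CPU: 1 PID: 0 at include/net/dst.h:256 0x8f14455c [nf_conntrack_rtcache@8f144000+0xaa0]
"""

for location, count in warning_sources(sample).most_common():
    print(location, count)
```

Comparing the counts and timestamps per location across routers would show whether the two warnings tend to appear together or independently, although it cannot prove a common cause.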

pmelange commented 4 years ago

I don't want to see any kernel dumps of any kind :)

I'm leaving the router online until it crashes. Then I'll go back to the good old trusty WDR4900 with gonzo-rc2. I just hope I'm around when the router crashes and that the roughly 70 people who use freifunk around here won't be cut off from their youtube/facebook/ebay for too long.


Here is a kernel log from another rb350gr3. It has a mix of include/net/dst.h:256, net/sched/sch_generic.c:320 and mtk_soc_eth 1e100000.ethernet eth0: transmit timed out. This router suffers from the "dies or sthg" issue.

[20157.340982] ------------[ cut here ]------------
[20157.345672] WARNING: CPU: 2 PID: 0 at ./include/net/dst.h:256 0x8e8a14c8 [nf_conntrack_rtcache@8e8a1000+0xaa0]
[20157.355717] Modules linked in: batman_adv nf_conntrack_ipv6 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_tcpmss xt_statistic xt_state xt_recent xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_HL xt_FLOWOFFLOAD xt_DSCP xt_CT xt_CLASSIFY nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache libcrc32c iptable_raw iptable_mangle iptable_filter ipt_ECN ip_tables compat act_connmark nf_conntrack act_skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_tbf sch_htb sch_hfsc sch_ingress gpio_beeper input_core
[20157.426928]  ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables x_tables ifb ipip tunnel4 ip_tunnel veth leds_gpio xhci_mtk xhci_plat_hcd xhci_pci xhci_hcd gpio_button_hotplug usbcore nls_base usb_common crc32c_generic
[20157.450316] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.14.98 #0
[20157.456345] Stack : 00000000 00000000 00000000 00000003 00000000 00000000 00000000 00000000
[20157.464745]         00000000 00000000 00000000 00000000 00000000 00000001 8fc11c30 ac07f57b
[20157.473113]         8fc11cc8 00000000 00000000 00003f00 00000038 8044e3d8 00000008 00000000
[20157.481469]         00000000 804e0000 0006df0c 00000000 8fc11c10 00000000 80500000 8e8a14c8
[20157.489816]         00000009 00000100 00000001 00000003 00000003 8027faf4 00000008 80540008
[20157.498169]         ...
[20157.500622] Call Trace:
[20157.500689] [<8044e3d8>] 0x8044e3d8
[20157.506615] [<8e8a14c8>] 0x8e8a14c8 [nf_conntrack_rtcache@8e8a1000+0xaa0]
[20157.513379] [<8027faf4>] 0x8027faf4
[20157.516880] [<80010050>] 0x80010050
[20157.520359] [<80010058>] 0x80010058
[20157.523835] [<8043762c>] 0x8043762c
[20157.527322] [<80071254>] 0x80071254
[20157.530819] [<8002ee48>] 0x8002ee48
[20157.534319] [<8e8a14c8>] 0x8e8a14c8 [nf_conntrack_rtcache@8e8a1000+0xaa0]
[20157.541111] [<8002ef0c>] 0x8002ef0c
[20157.544604] [<8e8a14c8>] 0x8e8a14c8 [nf_conntrack_rtcache@8e8a1000+0xaa0]
[20157.551445] [<8ef84300>] 0x8ef84300 [ip_tables@8ef84000+0x2830]
[20157.557395] [<80368794>] 0x80368794
[20157.560906] [<8037226c>] 0x8037226c
[20157.564384] [<80368794>] 0x80368794
[20157.567883] [<80371eb0>] 0x80371eb0
[20157.571376] [<80370be8>] 0x80370be8
[20157.574858] [<8041443c>] 0x8041443c
[20157.578386] [<803703dc>] 0x803703dc
[20157.581881] [<80328024>] 0x80328024
[20157.585360] [<80015550>] 0x80015550
[20157.588865] [<8032aae4>] 0x8032aae4
[20157.592354] [<8032e314>] 0x8032e314
[20157.595824] [<80076bd0>] 0x80076bd0
[20157.599310] [<80454810>] 0x80454810
[20157.602792] [<800335ac>] 0x800335ac
[20157.606271] [<80235c68>] 0x80235c68
[20157.609790] [<8000b4c8>] 0x8000b4c8
[20157.613281] 
[20157.614909] ---[ end trace abc5a3d60b545c8d ]---
[20157.619801] dst_release: dst:8ec8d500 refcnt:-1
[52394.114464] ------------[ cut here ]------------
[52394.119113] WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:320 0x80354ec8
[52394.126182] NETDEV WATCHDOG: eth0 (mtk_soc_eth): transmit queue 0 timed out
[52394.133132] Modules linked in: batman_adv nf_conntrack_ipv6 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_tcpmss xt_statistic xt_state xt_recent xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_HL xt_FLOWOFFLOAD xt_DSCP xt_CT xt_CLASSIFY nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache libcrc32c iptable_raw iptable_mangle iptable_filter ipt_ECN ip_tables compat act_connmark nf_conntrack act_skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_tbf sch_htb sch_hfsc sch_ingress gpio_beeper input_core
[52394.204218]  ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables x_tables ifb ipip tunnel4 ip_tunnel veth leds_gpio xhci_mtk xhci_plat_hcd xhci_pci xhci_hcd gpio_button_hotplug usbcore nls_base usb_common crc32c_generic
[52394.227599] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G        W       4.14.98 #0
[52394.234801] Stack : 00000000 00000000 00000000 8fe73440 00000000 00000000 00000000 00000000
[52394.243156]         00000000 00000000 00000000 00000000 00000000 00000001 8fc15d60 ac07f57b
[52394.251500]         8fc15df8 00000000 00000000 00004c10 00000038 8044e3d8 00000008 00000000
[52394.259836]         00000000 804e0000 0003790f 00000000 8fc15d40 00000000 80500000 80354ec8
[52394.268173]         00000009 00000140 00000003 8fe73440 00000001 8027faf4 0000000c 8054000c
[52394.276506]         ...
[52394.278942] Call Trace:
[52394.279010] [<8044e3d8>] 0x8044e3d8
[52394.284916] [<80354ec8>] 0x80354ec8
[52394.288393] [<8027faf4>] 0x8027faf4
[52394.291867] [<80010050>] 0x80010050
[52394.295340] [<80010058>] 0x80010058
[52394.298813] [<8043762c>] 0x8043762c
[52394.302292] [<80070304>] 0x80070304
[52394.305790] [<8002ee48>] 0x8002ee48
[52394.309264] [<80354ec8>] 0x80354ec8
[52394.312746] [<8002eeac>] 0x8002eeac
[52394.316237] [<80354ec8>] 0x80354ec8
[52394.319710] [<80097870>] 0x80097870
[52394.323195] [<80354d1c>] 0x80354d1c
[52394.326670] [<80087074>] 0x80087074
[52394.330149] [<80087288>] 0x80087288
[52394.333628] [<80077850>] 0x80077850
[52394.337121] [<80454810>] 0x80454810
[52394.340593] [<800335ac>] 0x800335ac
[52394.344062] [<80235c68>] 0x80235c68
[52394.347543] [<8000b4c8>] 0x8000b4c8
[52394.351016] 
[52394.352590] ---[ end trace abc5a3d60b545c8e ]---
[52394.357270] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[52394.363451] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
[52394.369553] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0eed0000, max=0, ctx=3398, dtx=3398, fdx=3397, next=3398
[52394.380516] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0e070000, max=0, calc=1412, drx=1413
[52394.394715] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x5a60000c, 0x10c = 0x80818
[52394.410152] mtk_soc_eth 1e100000.ethernet: PPE started
[78432.202961] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[78432.209142] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
[78432.215184] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0f240000, max=0, ctx=3977, dtx=3977, fdx=3976, next=3977
[78432.226039] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0cfa0000, max=0, calc=2375, drx=2376
[78432.240695] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x6060000c, 0x10c = 0x80818
[78432.255306] mtk_soc_eth 1e100000.ethernet: PPE started
[94287.287041] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[94287.293232] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
[94287.299288] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0cc30000, max=0, ctx=2725, dtx=2725, fdx=2724, next=2725
[94287.310261] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0ce00000, max=0, calc=2090, drx=2091
[94287.324265] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x6060000c, 0x10c = 0x80818
[94287.339236] mtk_soc_eth 1e100000.ethernet: PPE started
[130272.363155] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[130272.369432] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
[130272.375549] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0ea60000, max=0, ctx=948, dtx=948, fdx=947, next=948
[130272.386211] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0ed50000, max=0, calc=1107, drx=1108
[130272.400214] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x5e60000c, 0x10c = 0x80818
[130272.414209] mtk_soc_eth 1e100000.ethernet: PPE started
[174322.500623] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[174322.506900] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
[174322.513042] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0f2e0000, max=0, ctx=797, dtx=797, fdx=796, next=797
[174322.523726] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0ce50000, max=0, calc=2701, drx=2702
[174322.537464] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x6060000c, 0x10c = 0x80818
[174322.551527] mtk_soc_eth 1e100000.ethernet: PPE started
[241107.744361] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[241107.750655] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
[241107.756804] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0d960000, max=0, ctx=334, dtx=334, fdx=333, next=334
[241107.767423] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0cc20000, max=0, calc=1867, drx=1868
[241107.787661] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x6060000c, 0x10c = 0x80818
[241107.802788] mtk_soc_eth 1e100000.ethernet: PPE started
[382128.124724] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[382128.130997] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
[382128.137129] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0cc80000, max=0, ctx=3821, dtx=3821, fdx=3820, next=3821
[382128.148126] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0c160000, max=0, calc=576, drx=577
[382128.161844] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x6060000c, 0x10c = 0x80818
[382128.177572] mtk_soc_eth 1e100000.ethernet: PPE started
[408753.174559] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[408753.180827] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
[408753.186982] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0e230000, max=0, ctx=1009, dtx=1009, fdx=1008, next=1009
[408753.198542] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0cdc0000, max=0, calc=4039, drx=4040
[408753.212265] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x5e60000c, 0x10c = 0x80818
[408753.226380] mtk_soc_eth 1e100000.ethernet: PPE started
[436068.295884] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[436068.302158] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
[436068.308296] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0c110000, max=0, ctx=2654, dtx=2654, fdx=2653, next=2654
[436068.319370] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0c250000, max=0, calc=3828, drx=3830
[436068.333100] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x6060000c, 0x10c = 0x80818
[436068.347219] mtk_soc_eth 1e100000.ethernet: PPE started
[453368.346248] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[453368.352523] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
[453368.358630] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0eed0000, max=0, ctx=3044, dtx=3044, fdx=3043, next=3044
[453368.369556] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0c1d0000, max=0, calc=3659, drx=3660
[453368.383524] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x5b60000c, 0x10c = 0x80818
[453368.398532] mtk_soc_eth 1e100000.ethernet: PPE started
[460258.369148] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[460258.375432] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
[460258.381545] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0ea20000, max=0, ctx=1432, dtx=1432, fdx=1431, next=1432
[460258.392620] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0c210000, max=0, calc=3940, drx=3941
[460258.406803] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x6060000c, 0x10c = 0x80818
[460258.421713] mtk_soc_eth 1e100000.ethernet: PPE started
[510304.273624] ------------[ cut here ]------------
[510304.278398] WARNING: CPU: 3 PID: 0 at ./include/net/dst.h:256 0x8e8a14c8 [nf_conntrack_rtcache@8e8a1000+0xaa0]
[510304.288468] Modules linked in: batman_adv nf_conntrack_ipv6 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_tcpmss xt_statistic xt_state xt_recent xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_HL xt_FLOWOFFLOAD xt_DSCP xt_CT xt_CLASSIFY nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache libcrc32c iptable_raw iptable_mangle iptable_filter ipt_ECN ip_tables compat act_connmark nf_conntrack act_skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_tbf sch_htb sch_hfsc sch_ingress gpio_beeper input_core
[510304.359545]  ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables x_tables ifb ipip tunnel4 ip_tunnel veth leds_gpio xhci_mtk xhci_plat_hcd xhci_pci xhci_hcd gpio_button_hotplug usbcore nls_base usb_common crc32c_generic
[510304.382947] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G        W       4.14.98 #0
[510304.390235] Stack : 00000000 00000000 00000000 00000003 00000000 00000000 00000000 00000000
[510304.398673]         00000000 00000000 00000000 00000000 00000000 00000001 8fc15c30 ac07f57b
[510304.407104]         8fc15cc8 00000000 00000000 00007b40 00000038 8044e3d8 00000008 00000000
[510304.415530]         00000000 804e0000 0005d7e3 20202020 8fc15c10 00000000 80500000 8e8a14c8
[510304.423960]         00000009 00000100 00000001 00000003 00000002 8027faf4 0000000c 8054000c
[510304.432400]         ...
[510304.434938] Call Trace:
[510304.435007] [<8044e3d8>] 0x8044e3d8
[510304.441105] [<8e8a14c8>] 0x8e8a14c8 [nf_conntrack_rtcache@8e8a1000+0xaa0]
[510304.447953] [<8027faf4>] 0x8027faf4
[510304.451525] [<80010050>] 0x80010050
[510304.455080] [<80010058>] 0x80010058
[510304.458660] [<8043762c>] 0x8043762c
[510304.462233] [<80071254>] 0x80071254
[510304.465817] [<8002ee48>] 0x8002ee48
[510304.469416] [<8e8a14c8>] 0x8e8a14c8 [nf_conntrack_rtcache@8e8a1000+0xaa0]
[510304.476320] [<8002ef0c>] 0x8002ef0c
[510304.479892] [<8e8a14c8>] 0x8e8a14c8 [nf_conntrack_rtcache@8e8a1000+0xaa0]
[510304.486758] [<80368794>] 0x80368794
[510304.490361] [<8037226c>] 0x8037226c
[510304.493953] [<80371eb0>] 0x80371eb0
[510304.497562] [<80370be8>] 0x80370be8
[510304.501142] [<8041443c>] 0x8041443c
[510304.504715] [<803703dc>] 0x803703dc
[510304.508297] [<80328024>] 0x80328024
[510304.511861] [<80015550>] 0x80015550
[510304.515439] [<8032aae4>] 0x8032aae4
[510304.519020] [<8032e314>] 0x8032e314
[510304.522584] [<80076bd0>] 0x80076bd0
[510304.526184] [<80454810>] 0x80454810
[510304.529746] [<800335ac>] 0x800335ac
[510304.533300] [<80235c68>] 0x80235c68
[510304.536868] [<8000b4c8>] 0x8000b4c8
[510304.540427] 
[510304.542062] ---[ end trace abc5a3d60b545c8f ]---
[510304.596544] dst_release: dst:8f7b7b80 refcnt:-1
[552153.577508] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[552153.583779] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
[552153.589920] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0ed50000, max=0, ctx=2821, dtx=2821, fdx=2820, next=2821
[552153.600961] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0c200000, max=0, calc=904, drx=905
[552153.614498] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x5c60000c, 0x10c = 0x80818
[552153.628660] mtk_soc_eth 1e100000.ethernet: PPE started
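
To see how often the transmit timeouts recur, the uptime stamps in a log like the one above can be compared. The following is only a minimal sketch, assuming the log is available as plain text lines in the format shown here; the names `timeout_events`, `gaps_in_hours` and `SAMPLE_LOG` are made up for illustration and are not part of any firmware tooling.

```python
import re

# Matches kernel log lines such as:
# [453368.346248] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
TIMEOUT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+mtk_soc_eth\s+\S+\s+(?P<iface>\S+): transmit timed out"
)


def timeout_events(lines):
    """Return (uptime_seconds, interface) for every transmit-timeout line."""
    events = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            events.append((float(match.group("ts")), match.group("iface")))
    return events


def gaps_in_hours(events):
    """Hours between consecutive timeouts, in log order."""
    times = [ts for ts, _ in events]
    return [(later - earlier) / 3600.0 for earlier, later in zip(times, times[1:])]


# Three timeout lines copied from the log above, plus one unrelated line.
SAMPLE_LOG = [
    "[453368.346248] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out",
    "[453368.398532] mtk_soc_eth 1e100000.ethernet: PPE started",
    "[460258.369148] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out",
    "[552153.577508] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out",
]

if __name__ == "__main__":
    for gap in gaps_in_hours(timeout_events(SAMPLE_LOG)):
        print(f"{gap:.1f} h between timeouts")
```

For the three timeouts in this log the gaps come out at roughly 1.9 and 25.5 hours, so the timeouts do not seem to follow a fixed interval.
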
pmelange commented 4 years ago

And here, an ERX-SFP (coloniaallee)

[54115.113725] ------------[ cut here ]------------
[54115.122941] WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:306 0x802a0ba0()
[54115.137357] NETDEV WATCHDOG: eth0 (mtk_soc_eth): transmit queue 0 timed out
[54115.151252] Modules linked in: ifb iptable_nat nf_nat_ipv4 nf_conntrack_ipv6 nf_conntrack_ipv4 ipt_REJECT ipt_MASQUERADE xt_time xt_tcpudp xt_tcpmss xt_statistic xt_state xt_recent xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_HL xt_DSCP xt_CLASSIFY nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_nat nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache iptable_mangle iptable_filter ipt_ECN ip_tables act_connmark nf_conntrack act_skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_tbf sch_htb sch_hfsc sch_ingress i2c_dev batman_adv libcrc32c cfg80211 compat ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables x_tables l2tp_ip6 l2tp_ip l2tp_eth l2tp_netlink l2tp_core udp_tunnel ip6_udp_tunnel ipip tunnel4 ip_tunnel leds_gpio gpio_button_hotplug crc32c_generic [last unloaded: ifb]
[54115.326904] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.4.167 #0
[54115.338847] Stack : 00000000 00000000 80436882 00000034 00000000 00000000 00000000 00000000
[54115.338847]    00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[54115.338847]    00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[54115.338847]    00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[54115.338847]    00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[54115.338847]    ...
[54115.409584] Call Trace:[<8001653c>] 0x8001653c
[54115.418461] [<8001653c>] 0x8001653c
[54115.425391] [<801a72cc>] 0x801a72cc
[54115.432331] [<8002bb90>] 0x8002bb90
[54115.439263] [<802a0ba0>] 0x802a0ba0
[54115.446192] [<8002bbec>] 0x8002bbec
[54115.453133] [<802a0ba0>] 0x802a0ba0
[54115.460060] [<802a0948>] 0x802a0948
[54115.466989] [<80070ca0>] 0x80070ca0
[54115.473918] [<8025cd7c>] 0x8025cd7c
[54115.480843] [<8006df94>] 0x8006df94
[54115.487769] [<80070ef0>] 0x80070ef0
[54115.494707] [<8002e6b4>] 0x8002e6b4
[54115.501651] [<8002e994>] 0x8002e994
[54115.508589] [<801cd270>] 0x801cd270
[54115.515531] [<80005988>] 0x80005988
[54115.522460] 
[54115.525577] ---[ end trace 0ef5542dd3a7a2f3 ]---
[54115.534808] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[54115.547186] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
[54115.559230] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0f31e000, max=512, ctx=440, dtx=440, fdx=439, next=440
[54115.580593] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0f33c000, max=512, calc=13, drx=14
pmelange commented 4 years ago

The test router hit another kernel warning and rebooted itself. Uptime: 114195 seconds (just under 32 hours).

Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.012727] ------------[ cut here ]------------
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.017472] WARNING: CPU: 1 PID: 0 at include/net/dst.h:256 0x8f14455c [nf_conntrack_rtcache@8f144000+0xaa0]
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.027391] Modules linked in: batman_adv nf_conntrack_ipv6 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_FLOWOFFLOAD xt_CT nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack libcrc32c iptable_mangle iptable_filter ip_tables compat gpio_beeper input_core nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 ipip tunnel4 ip_tunnel leds_gpio xhci_plat_hcd xhci_pci xhci_mtk xhci_hcd gpio_button_hotplug usbcore nls_base usb_common crc32c_generic
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.095014] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W       4.14.171 #0
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.102392] Stack : 00000000 00000000 00000000 00000003 00000000 00000000 00000000 00000000
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.110822]         00000000 00000000 00000000 00000000 00000000 00000001 8fc0bb20 ac07f5b2
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.119254]         8fc0bbb8 00000000 00000000 00000000 00000038 804997f8 00000008 00000000
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.127701]         00000000 00000000 00017326 20202020 00000000 8fc0bb00 00000000 8f14455c
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.136142]         8f1447fc 00000100 00000001 00000003 00000003 802c096c 00000004 80690004
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.144592]         ...
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.147124] Call Trace:
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.147177] [<804997f8>] 0x804997f8
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.153276] [<8f14455c>] 0x8f14455c [nf_conntrack_rtcache@8f144000+0xaa0]
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.160136] [<802c096c>] 0x802c096c
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.163716] [<8000bf28>] 0x8000bf28
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.167275] [<8000bf30>] 0x8000bf30
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.170835] [<80560000>] 0x80560000
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.174402] [<80482754>] 0x80482754
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.177976] [<800773c4>] 0x800773c4
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.181545] [<8002ed30>] 0x8002ed30
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.185121] [<8f14455c>] 0x8f14455c [nf_conntrack_rtcache@8f144000+0xaa0]
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.191999] [<8002e9d4>] 0x8002e9d4
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.195566] [<8f14455c>] 0x8f14455c [nf_conntrack_rtcache@8f144000+0xaa0]
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.202429] [<803ad4e4>] 0x803ad4e4
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.206005] [<803b7310>] 0x803b7310
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.209577] [<803b6f10>] 0x803b6f10
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.213161] [<803b6020>] 0x803b6020
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.216732] [<80461630>] 0x80461630
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.220303] [<803b5670>] 0x803b5670
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.223892] [<8036f60c>] 0x8036f60c
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.227468] [<80371d7c>] 0x80371d7c
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.231025] [<803ba4cc>] 0x803ba4cc
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.234602] [<8046b2a8>] 0x8046b2a8
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.238177] [<8046b5f0>] 0x8046b5f0
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.241741] [<8046b318>] 0x8046b318
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.245306] [<8036f308>] 0x8036f308
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.248894] [<8036f91c>] 0x8036f91c
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.252468] [<8037220c>] 0x8037220c
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.256036] [<8007d0c4>] 0x8007d0c4
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.259618] [<8049f950>] 0x8049f950
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.263187] [<800336c8>] 0x800336c8
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.266751] [<80275f24>] 0x80275f24
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.270323] [<80007388>] 0x80007388
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.273885]
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.275612] ---[ end trace 8822d76274df463a ]---
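
The warnings in this thread can be compared more easily when the source location, the module and the uptime are pulled out of the `WARNING:` line. This is a rough sketch under the assumption that the line looks like the ones pasted here (with or without a syslog prefix); `parse_warning` and `SAMPLE_LINE` are hypothetical names used only for this example.

```python
import re

# Matches the header line of a kernel warning, e.g.
# [114195.017472] WARNING: CPU: 1 PID: 0 at include/net/dst.h:256 0x8f14455c [nf_conntrack_rtcache@8f144000+0xaa0]
# The trailing "[module@...]" part is optional (built-in code has none).
WARN_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+WARNING: CPU: (?P<cpu>\d+) PID: \d+ "
    r"at (?P<file>\S+):(?P<line>\d+)"
    r"(?:\s+\S+\s+\[(?P<module>\w+)@)?"
)


def parse_warning(log_line):
    """Return a dict describing the warning, or None if the line is not one."""
    match = WARN_RE.search(log_line)
    if match is None:
        return None
    uptime = float(match.group("ts"))
    return {
        "uptime_hours": uptime / 3600.0,
        "cpu": int(match.group("cpu")),
        "file": match.group("file"),
        "line": int(match.group("line")),
        "module": match.group("module"),  # None when no module is named
    }


# Warning header copied from the log above, syslog prefix included.
SAMPLE_LINE = (
    "Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.017472] WARNING: CPU: 1 "
    "PID: 0 at include/net/dst.h:256 0x8f14455c [nf_conntrack_rtcache@8f144000+0xaa0]"
)

if __name__ == "__main__":
    info = parse_warning(SAMPLE_LINE)
    print(f"{info['file']}:{info['line']} in {info['module']} "
          f"after {info['uptime_hours']:.1f} h uptime")
```

Run on the header above, this reports `include/net/dst.h:256` in `nf_conntrack_rtcache` after about 31.7 hours of uptime, which matches the "just under 32 hours" noted for the test router.
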
SvenRoederer commented 4 years ago

another rb750gr3.

So also the MikroTik RB750Gr3 devices are affected? Even though we don't see a complete freeze of the network here.

pmelange commented 4 years ago

also the MikroTik RB750Gr3 devices are affected?

All mt7621 devices are affected.

pmelange commented 4 years ago

Even though we don't see a complete freeze of the network here.

Verklarung-core has almost no traffic. That's probably why it's not causing problems.

Perleberger36 and coloniaallee have a lot of traffic, and if you look at the uptime, every reboot follows a kernel crash: either the router reboots itself or the watchdog reboots it.

The test was done at scherer8, which also has a lot of traffic. After about 32 hours it rebooted itself.

SvenRoederer commented 4 years ago

In https://github.com/freifunk-berlin/firmware/commit/179c1409c2cfd27e8924fc63a8b6d60e328282c4 there is a reference to OpenWrt commit 498f1f4f5d, whose description says it might fix the cause of the problem.

pmelange commented 4 years ago

A new patch made it into the OpenWrt master branch: https://github.com/openwrt/openwrt/pull/2942#issuecomment-639095990

SvenRoederer commented 4 years ago

As usual, these OpenWrt commits were added to the "daily/upstream-master" branch automatically, in 1d4f5c91da0228f61e0716257b7e36436c314f09.

So some tests need to be carried out.

hmh commented 1 year ago

This seems to have been fixed since OpenWrt 21.02? Should it be closed as fixed?

Akira25 commented 1 year ago

This seems to have been fixed since OpenWrt 21.02? Should it be closed as fixed?

We could do that, if you want. Anyway, this project has not been maintained for some years. We use the falter firmware now: https://github.com/freifunk-berlin/falter-packages