aparcar / openwrt

Staging tree of Paul Spooren

FS#1362 - kernel hangs unexpectedly on mt7621 #1168

Open aparcar opened 6 years ago

aparcar commented 6 years ago

vzhestkov:

Usually the system just hangs with no messages in the log. I enabled log writing to an external USB flash drive, but the log simply stops at the moment the issue occurs. I have no idea how to reproduce it. It can happen 3-5 times a day, but sometimes the system works for a week with no issues.

I've tried a lot of snapshot builds since 20180111, but the issue still exists.

The current version installed: Snapshot 20180212
Hardware: Xiaomi Router 3G ramips-mt7621

opkg list:
base-files - 184-r6069-a464fba
busybox - 1.27.2-3
dnsmasq - 2.78-10
dropbear - 2017.75-5
firewall - 2017-11-07-c4309372-2
fstools - 2018-02-11-3d239815-1
fwtool - 1
hostapd-common - 2017-08-24-c2d4f2eb-6
ip6tables - 1.6.1-2
iptables - 1.6.1-2
iw - 4.9-1
jshn - 2018-02-08-bb0c830b-1
jsonfilter - 2016-07-02-dea067ad-1
kernel - 4.9.77-1-2b01b81f4010dbad63c4aa03d212c19b
kmod-cfg80211 - 4.9.77+2017-11-01-3
kmod-gpio-button-hotplug - 4.9.77-2
kmod-ip6tables - 4.9.77-1
kmod-ipt-conntrack - 4.9.77-1
kmod-ipt-core - 4.9.77-1
kmod-ipt-nat - 4.9.77-1
kmod-leds-gpio - 4.9.77-1
kmod-lib-crc-ccitt - 4.9.77-1
kmod-mac80211 - 4.9.77+2017-11-01-3
kmod-mt76-core - 4.9.77+2018-02-09-246d548b-1
kmod-mt7603 - 4.9.77+2018-02-09-246d548b-1
kmod-mt76x2 - 4.9.77+2018-02-09-246d548b-1
kmod-nf-conntrack - 4.9.77-1
kmod-nf-conntrack6 - 4.9.77-1
kmod-nf-ipt - 4.9.77-1
kmod-nf-ipt6 - 4.9.77-1
kmod-nf-nat - 4.9.77-1
kmod-nf-reject - 4.9.77-1
kmod-nf-reject6 - 4.9.77-1
kmod-nls-base - 4.9.77-1
kmod-ppp - 4.9.77-1
kmod-pppoe - 4.9.77-1
kmod-pppox - 4.9.77-1
kmod-slhc - 4.9.77-1
kmod-tun - 4.9.77-1
kmod-usb-core - 4.9.77-1
kmod-usb-ledtrig-usbport - 4.9.77-1
kmod-usb3 - 4.9.77-1
lede-keyring - 2017-01-20-a50b7529-1
libblobmsg-json - 2018-02-08-bb0c830b-1
libc - 1.1.18-1
libgcc - 5.5.0-1
libip4tc - 1.6.1-2
libip6tc - 1.6.1-2
libiwinfo - 2018-01-16-5a5e21b1-1
libiwinfo-lua - 2018-01-16-5a5e21b1-1
libjson-c - 0.12.1-1
libjson-script - 2018-02-08-bb0c830b-1
liblua - 5.1.5-1
liblzo - 2.10-1
libmbedtls - 2.6.0-1
libnl-tiny - 0.1-5
libopenssl - 1.0.2n-1
libpthread - 1.1.18-1
libubox - 2018-02-08-bb0c830b-1
libubus - 2018-01-16-5bae22eb-1
libubus-lua - 2018-01-16-5bae22eb-1
libuci - 2018-01-01-5beb95da-1
libuci-lua - 2018-01-01-5beb95da-1
libuclient - 2017-11-02-4b87d831-1
libustream-mbedtls - 2016-07-02-ec80adaa-2
libxtables - 1.6.1-2
logd - 2017-11-13-e7a63fba-1
lua - 5.1.5-1
luci - git-18.039.58622-76f9f5e-1
luci-app-firewall - git-18.039.58622-76f9f5e-1
luci-app-openvpn - git-18.039.58622-76f9f5e-1
luci-base - git-18.039.58622-76f9f5e-1
luci-lib-ip - git-18.039.58622-76f9f5e-1
luci-lib-jsonc - git-18.039.58622-76f9f5e-1
luci-lib-nixio - git-18.039.58622-76f9f5e-1
luci-mod-admin-full - git-18.039.58622-76f9f5e-1
luci-proto-ipv6 - git-18.039.58622-76f9f5e-1
luci-proto-ppp - git-18.039.58622-76f9f5e-1
luci-ssl - git-18.039.58622-76f9f5e-1
luci-theme-bootstrap - git-18.039.58622-76f9f5e-1
luci-theme-material - git-18.039.58622-76f9f5e-1
mtd - 21
netifd - 2018-02-05-1be329c6-3
odhcp6c - 2017-09-05-1f93bd4c-8
odhcpd-ipv6only - 1.3-1
openssl-util - 1.0.2n-1
openvpn-easy-rsa - 3.0.1-1
openvpn-openssl - 2.4.4-2
opkg - 2017-12-07-3b417b9f-2
ppp - 2.4.7-12
ppp-mod-pppoe - 2.4.7-12
procd - 2018-01-23-653629f1-2
px5g-mbedtls - 4
rpcd - 2017-12-07-cfe1e75c-1
rpcd-mod-rrdns - 20170710
swconfig - 11
ubi-utils - 1.5.2-1
uboot-envtools - 2015.10-1
ubox - 2017-11-13-e7a63fba-1
ubus - 2018-01-16-5bae22eb-1
ubusd - 2018-01-16-5bae22eb-1
uci - 2018-01-01-5beb95da-1
uclient-fetch - 2017-11-02-4b87d831-1
uhttpd - 2017-11-04-a235636a-1
uhttpd-mod-ubus - 2017-11-04-a235636a-1
usign - 2015-07-04-ef641914-1
wireless-regdb - 2017-10-20-4343d359
wpad-mini - 2017-08-24-c2d4f2eb-6
zlib - 1.2.11-2

The only strange thing I found is a number of dropped packets on the Ethernet interface connected to the local network.
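For anyone who wants to track those drop counters over time, here is a minimal sketch that extracts the per-interface receive and transmit drop counts from the text of `/proc/net/dev`. The function name `parse_net_dev` is made up for illustration, and a stock OpenWrt image does not ship Python, so this assumes the file contents are copied to another machine (or Python is installed on the router).

```python
from pathlib import Path


def parse_net_dev(text):
    """Return {interface: {"rx_drop": int, "tx_drop": int}} from /proc/net/dev text."""
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, counters = line.split(":", 1)
        fields = counters.split()
        if len(fields) < 16:
            continue
        # Receive columns: bytes packets errs drop ...; transmit columns start at index 8.
        stats[name.strip()] = {"rx_drop": int(fields[3]), "tx_drop": int(fields[11])}
    return stats


if __name__ == "__main__":
    path = Path("/proc/net/dev")
    if path.exists():
        for iface, counters in sorted(parse_net_dev(path.read_text()).items()):
            print(iface, counters)
```

Sampling this periodically and comparing the counters shortly before a hang might show whether the drops correlate with the problem, though that is only a guess.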

I think it's somehow related to local network traffic. I put another router in this network, and once both routers hung at the same time. The second router was a Netgear wnr1000v2 based on ar71xx.

I've also tried replacing the Xiaomi router with a different unit of the same model. Nothing changed.

Twice I found the following kernel warning in the log:

Wed Feb 14 15:36:51 2018 kern.warn kernel: [ 416.948299] ------------[ cut here ]------------
Wed Feb 14 15:36:51 2018 kern.warn kernel: [ 416.953073] WARNING: CPU: 0 PID: 0 at backports-2017-11-01/net/mac80211/rx.c:4316 ieee80211_rx_napi+0x1a4/0x964 [mac80211]
Wed Feb 14 15:36:51 2018 kern.warn kernel: [ 416.964092] Rate marked as a VHT rate but data is invalid: MCS: 126, NSS: 0
Wed Feb 14 15:36:51 2018 kern.warn kernel: [ 416.971084] Modules linked in: pppoe ppp_async pppox ppp_generic nf_conntrack_ipv6 mt76x2e mt7603e mt76 mac80211 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_CT slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack iptable_mangle iptable_filter ip_tables crc_ccitt compat ledtrig_usbport ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables x_tables tun leds_gpio xhci_mtk xhci_plat_hcd xhci_pci xhci_hcd gpio_button_hotplug usbcore nls_base usb_common
Wed Feb 14 15:36:51 2018 kern.warn kernel: [ 417.036194] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.77 #0
Wed Feb 14 15:36:51 2018 kern.warn kernel: [ 417.042086] Stack : 00000000 00000000 80557b4a 00000033 8040fc04 00000000 00000000 80550000
Wed Feb 14 15:36:51 2018 kern.warn kernel: [ 417.050434] 804f62bc 804f5ea7 8048cd04 00000000 00000000 80553824 00000004 0000099e
Wed Feb 14 15:36:51 2018 kern.warn kernel: [ 417.058779] 00000000 8006ab98 00000001 80550000 804fc004 804fc008 80491910 8fc0dc7c
Wed Feb 14 15:36:51 2018 kern.warn kernel: [ 417.067122] 00000003 800a7a50 00000004 0000099e 00000000 00000000 00000002 00c0dc7c
Wed Feb 14 15:36:51 2018 kern.warn kernel: [ 417.075465] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Wed Feb 14 15:36:51 2018 kern.warn kernel: [ 417.083808] ...
Wed Feb 14 15:36:51 2018 kern.warn kernel: [ 417.086244] Call Trace:
Wed Feb 14 15:36:51 2018 kern.warn kernel: [ 417.088706] [<8000f714>] show_stack+0x54/0x88
Wed Feb 14 15:36:51 2018 kern.warn kernel: [ 417.093048] [<801e5d9c>] dump_stack+0x8c/0xd0
Wed Feb 14 15:36:51 2018 kern.warn kernel: [ 417.097384] [<8002adc4>] __warn+0xe4/0x118
Wed Feb 14 15:36:51 2018 kern.warn kernel: [ 417.101462] [<8002ae28>] warn_slowpath_fmt+0x30/0x3c
Wed Feb 14 15:36:51 2018 kern.warn kernel: [ 417.106535] [<8ec25c7c>] ieee80211_rx_napi+0x1a4/0x964 [mac80211]
Wed Feb 14 15:36:51 2018 kern.warn kernel: [ 417.112640] [<8f2d9b78>] mt76_rx_complete+0x18c/0x278 [mt76]
Wed Feb 14 15:36:51 2018 kern.warn kernel: [ 417.118278] [<8f2d9e24>] mt76_rx_poll_complete+0x1c0/0x260 [mt76]
Wed Feb 14 15:36:51 2018 kern.warn kernel: [ 417.124346] [<8f2d8e64>] mt76_dma_attach+0xb94/0xcc0 [mt76]
Wed Feb 14 15:36:51 2018 kern.warn kernel: [ 417.130010] ---[ end trace 2c3430e855d3296c ]---
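
As a reading aid for the warning text: it reports a received frame flagged as VHT with MCS 126 and NSS 0, while VHT (802.11ac) only defines MCS indices 0 to 9 and 1 to 8 spatial streams, so both reported values are out of range. The sketch below pulls the two numbers out of such a log line and applies that range check. It is an illustration only, not the mac80211 code, and the names `check_warning` and `vht_rate_is_valid` are hypothetical.

```python
import re

# Matches the "MCS: <n>, NSS: <n>" tail of the warning message.
WARNING_RE = re.compile(r"MCS: (\d+), NSS: (\d+)")


def vht_rate_is_valid(mcs, nss):
    """VHT (802.11ac) defines MCS 0-9 and 1-8 spatial streams."""
    return 0 <= mcs <= 9 and 1 <= nss <= 8


def check_warning(line):
    """Extract MCS/NSS from the warning text; return (mcs, nss, valid) or None."""
    match = WARNING_RE.search(line)
    if match is None:
        return None
    mcs, nss = int(match.group(1)), int(match.group(2))
    return mcs, nss, vht_rate_is_valid(mcs, nss)


line = "Rate marked as a VHT rate but data is invalid: MCS: 126, NSS: 0"
print(check_warning(line))  # prints (126, 0, False)
```

The call trace shows the frame arriving through the mt76 receive path, so the bogus rate information presumably came from that driver. Whether this warning has anything to do with the hangs is not established by the log.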

Is there any way to find out what is happening to the kernel? The sysctl properties are set to reboot 3 seconds after a panic, but it does not reboot; it just stays in this state.
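
The reboot-on-panic setting mentioned above corresponds to the `kernel.panic` sysctl, which only takes effect when the kernel actually panics; a hard lockup that never reaches the panic path would not trigger it. The following rough sketch reads `kernel.panic` and `kernel.panic_on_oops` from `/proc/sys/kernel` and states what they imply. The helper names are invented for this example, and it assumes Python is available on the device or that the two values are copied elsewhere.

```python
from pathlib import Path


def read_panic_settings(proc_sys="/proc/sys/kernel"):
    """Read kernel.panic and kernel.panic_on_oops; None means unreadable."""
    settings = {}
    for name in ("panic", "panic_on_oops"):
        try:
            settings[name] = int((Path(proc_sys) / name).read_text().strip())
        except (OSError, ValueError):
            settings[name] = None
    return settings


def describe(settings):
    """Summarise in words what the two values imply for automatic reboots."""
    notes = []
    timeout = settings.get("panic")
    if timeout is None:
        notes.append("kernel.panic could not be read")
    elif timeout == 0:
        notes.append("kernel.panic=0: the kernel does not reboot after a panic")
    elif timeout > 0:
        notes.append(f"kernel.panic={timeout}: reboot {timeout} s after a panic")
    else:
        notes.append(f"kernel.panic={timeout}: reboot immediately after a panic")
    if settings.get("panic_on_oops") == 0:
        notes.append("kernel.panic_on_oops=0: an oops alone does not cause a panic")
    return notes


if __name__ == "__main__":
    for note in describe(read_panic_settings()):
        print(note)
```

If both values look right and the router still stays hung, the failure is probably not going through the panic path at all, which would also be consistent with the log simply stopping.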

Thanks in advance for any helpful tips.

aparcar commented 6 years ago

easyteacher:

Can Mi R3g perform a soft reboot normally if there is no kernel panic? After a kernel panic, is there any message in /sys/kernel/debug/crashlog ? I guess if you disable WiFi on R3g it will stay online longer.

aparcar commented 6 years ago

vzhestkov:

Mi R3g can perform a soft reboot with no issues. There is no crashlog in /sys/kernel/debug. One more strange thing about the hang: the LED changes color from blue to orange and back, and sometimes blinks. It looks like it is rebooting, but the only way to recover is to unplug the power cord and plug it back in.

And sorry, it's almost impossible to turn WiFi off on it, as that is the router's main function now. There are just 3-4 devices on Ethernet; most of the users are on Wi-Fi.

aparcar commented 6 years ago

diizzyy:

Still an issue on master/trunk?

aparcar commented 6 years ago

easyteacher:

I guess it should be fixed. The problem was likely related to the old mt76 driver.

aparcar commented 6 years ago

vzhestkov:

The router is running build r6807-58f7b5b and has 8 days of uptime, but the issue was never constant: sometimes there were no hangs for at least 2 weeks, and then 3-5 hangs in one day. That's why I can't really say whether the issue is gone. When was the new driver version introduced?