Open pparent76 opened 6 years ago
The call stack seems irrelevant. Note that OpenWrt's network scripts (including /sbin/wifi) won't be able to operate these drivers. You can use "ifconfig up/down" directly or install the luci-app-mtk plugin.
Thanks a lot for your quick answer. The problem is that restarting the whole network is needed to reload the network configuration file (I am not doing it for the wifi interface).
Also, this bug seems to appear in various circumstances. For example, when I start coova-chilli (which creates a tunnel interface over br-lan, whether ra0 is in it or not), I get the same crash.
So this is a real problem for usability of this driver.
Thanks a lot for your efforts!
root@$ insmod /root/mt7628-for-mt7628-linux-4.4.108.ko
[ 30.354174] mt7628: module license 'unspecified' taints kernel.
[ 30.365929] Disabling lock debugging due to kernel taint
[ 30.434505]
[ 30.434505]
[ 30.434505] === pAd = c068d000, size = 1292832 ===
[ 30.434505]
[ 30.452968] <-- RTMPAllocTxRxRingMemory, Status=0, ErrorValue=0x
[ 30.466123] <-- RTMPAllocAdapterBlock, Status=0
[ 30.475130] RtmpChipOpsHook(492): Not support for HIF_MT yet!
[ 30.486515] mt7628_init()-->
[ 30.492229] mt7628_init(FW(8a00), HW(8a01), CHIPID(7628))
[ 30.502927] e2.bin mt7628_init(1135)::(2), pChipCap->fw_len(63888)
[ 30.515170] mt_bcn_buf_init(218): Not support for HIF_MT yet!
[ 30.526560] <--mt7628_init()
root@$ chilli -c /etc/chilli.conf
root@$ [ 40.814719] Unhandled kernel unaligned access[#1]:
[ 40.824227] CPU: 0 PID: 1224 Comm: chilli Tainted: P 4.4.108 #0
[ 40.838525] task: 83332840 ti: 833de000 task.ti: 833de000
[ 40.849203] $ 0 : 00000000 7f8194c8 00000001 0000000e
[ 40.859559] $ 4 : fffffdc9 80360000 00000010 000004c8
[ 40.869909] $ 8 : 00000000 00000000 00000000 00000000
[ 40.880256] $12 : 00000000 00000000 00000000 00000000
[ 40.890603] $16 : fffffdc9 fffffdc9 000004c8 803a8848
[ 40.900952] $20 : 00000010 80390000 00000000 00000000
[ 40.911300] $24 : 00000000 00000000
[ 40.921648] $28 : 833de000 833dfbf0 00000000 801fb37c
[ 40.931999] Hi : 00000000
[ 40.937687] Lo : 00000fea
[ 40.943414] epc : 801fb3a0 netdev_master_upper_dev_get+0x38/0x70
[ 40.955648] ra : 801fb37c netdev_master_upper_dev_get+0x14/0x70
[ 40.967873] Status: 1100a403 KERNEL EXL IE
[ 40.976157] Cause : 00800010 (ExcCode 04)
[ 40.984082] BadVA : fffffe2d
[ 40.989773] PrId : 00019655 (MIPS 24KEc)
[ 40.997697] Modules linked in: mt7628(P) pppoe ppp_async iptable_nat pppox ppp_generic nf_nat_ipv4 nf_conntrack_ipv6 nf_conntrack_ipv4 ipt_REJECT ipt_MASQUERADE xt_time xt_tcpudp xt_tcpmss xt_statistic xt_state xt_recent xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_HL xt_DSCP xt_CLASSIFY slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_nat nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache iptable_mangle iptable_filter ipt_ECN ip_tables crc_ccitt sch_fq sch_teql em_nbyte sch_dsmark sch_pie sch_gred em_cmp cls_basic act_ipt sch_prio sch_codel em_text em_meta sch_sfq act_police sch_red act_connmark nf_conntrack act_skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_tbf sch_htb sch_hfsc sch_ingress ledtrig_usbport ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables x_tables ifb tun leds_gpio ohci_platform ohci_hcd ehci_platform ehci_hcd gpio_button_hotplug usbcore nls_base usb_common
[ 41.191942] Process chilli (pid: 1224, threadinfo=833de000, task=83332840, tls=779ebd48)
[ 41.207953] Stack : 00000000 00000000 00000000 83821c00 83821c00 8020fe70 801efee0 000020c0
83561780 00000002 00000010 00000002 80390000 803a94d0 00000000 00010000
00000064 000005dc 00000000 00000000 00000100 00000002 00000008 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
8033a198 80340000 83821c00 00000002 020152c0 83802780 801efee0 000020c0
...
[ 41.278330] Call Trace:
[ 41.283173] [<801fb3a0>] netdev_master_upper_dev_get+0x38/0x70
[ 41.294729] [<8020fe70>] rtnl_fill_ifinfo+0x58/0x9b0
[ 41.304553] [<802108f0>] rtnl_dump_ifinfo+0x128/0x1d8
[ 41.314557] [<80229138>] netlink_dump+0x110/0x2bc
[ 41.323865] [<802294c4>] netlink_recvmsg+0x1e0/0x36c
[ 41.333701] [<801e8338>] SyS_recvfrom+0xb0/0x124
[ 41.342848] [<8000686c>] syscall_common+0x30/0x54
[ 41.352149]
[ 41.355083]
Code: 248438c0 0c053952 00000000 <8e020064> 26100064 10500006 00000000 9043fff8 10600003
[ 41.375009] ---[ end trace 6dee52527c4b4bc1 ]---
[ 41.386834] Fatal exception: panic in 5 seconds
[ 46.400703] Kernel panic - not syncing: Fatal exception
[ 46.412850] Rebooting in 3 seconds..
you can try "mtkwifi reset" or "mtkwifi reload".
Why would I do that? To reload mtkwifi config?
That's not what I'm looking for. I'm looking for reloading /etc/config/network file, but mostly to be able to start and restart coova-chilli. If I can't do that without a crash then the driver is useless for me.
@pparent76 Try building the image without the mt76 wireless drivers, or disable them on startup.
@alangregory That's already what I did: build without mt76.
Are you able to restart network without crash on your side, with mt7628 driver on?
Thanks for your message.
Does the crash happen if mt7628.ko is not present? (or with all wifi interface down?)
The driver does not follow mac80211 architecture, you can remove these components, including rt2x00, mt76, mac80211, cfg80211, wpad-mini, compat-wireless... You'd better remove /etc/config/wireless too, if it's present.
Does the crash happen if mt7628.ko is not present?
No, except if it was loaded and then unloaded. If it was never loaded, the crash never happens.
The driver does not follow mac80211 architecture, you can remove these components, including rt2x00, mt76, mac80211, cfg80211, wpad-mini, compat-wireless... You'd better remove /etc/config/wireless too, if it's present.
Ok thanks I keep you updated.
I've added to my conf: -wpad-mini -kmod-mt76 -kmod-mac80211 -kmod-rt2x00-lib -kmod-rt2x00-mmio -kmod-rt2x00-pci -kmod-rt2x00-usb -hostapd -kmod-cfg80211
exact same crash when starting chilli or restart network.
Important update: the crash happens even if I only do "ifconfig br-lan down", even when ra0 is down and not in br-lan. Same thing with br-lan2, and probably with any bridge interface.
root@$ insmod /lib/modules/4.4.108/mt7628-for-mt7628-linux-4.4.108.ko
[ 120.200477] mt7628: module license 'unspecified' taints kernel.
[ 120.212236] Disabling lock debugging due to kernel taint
[ 120.291461]
[ 120.291461]
[ 120.291461] === pAd = c13f7000, size = 1292832 ===
[ 120.291461]
[ 120.309926] <-- RTMPAllocTxRxRingMemory, Status=0, ErrorValue=0x
[ 120.323089] <-- RTMPAllocAdapterBlock, Status=0
[ 120.332109] RtmpChipOpsHook(492): Not support for HIF_MT yet!
[ 120.343496] mt7628_init()-->
[ 120.349195] mt7628_init(FW(8a00), HW(8a01), CHIPID(7628))
[ 120.359892] e2.bin mt7628_init(1135)::(2), pChipCap->fw_len(63888)
[ 120.372137] mt_bcn_buf_init(218): Not support for HIF_MT yet!
[ 120.383524] <--mt7628_init()
root@$ brctl show
bridge name     bridge id               STP enabled     interfaces
br-lan          7fff.3e11f5324972       no              eth0.1
br-lan2         7fff.3e11f5324972       no              eth0.6
root@$ ifconfig br-lan down
[ 135.218893] br-lan: port 1(eth0.1) entered disabled state
[ 135.232076] Unhandled kernel unaligned access[#1]:
[ 135.241583] CPU: 0 PID: 674 Comm: odhcpd Tainted: P 4.4.108 #0
[ 135.255711] task: 838ec000 ti: 831d0000 task.ti: 831d0000
[ 135.266390] $ 0 : 00000000 005aa048 fffffdc9 000000a0
[ 135.276748] $ 4 : 8028ed00 00000000 00000000 00000000
[ 135.287096] $ 8 : 00000080 80b8c0a8 00000000 00000e8e
[ 135.297444] $12 : 7fad4dc0 ffffff80 00000000 7737a2c0
[ 135.307792] $16 : 8391b540 831845ec 00000000 00000001
[ 135.318141] $20 : 00000000 821b6d8c 00000000 00000028
[ 135.328489] $24 : 00000000 772df000
[ 135.338837] $28 : 831d0000 831d1bb8 00000014 8028ed0c
[ 135.349189] Hi : 00000000
[ 135.354879] Lo : 00000e8e
[ 135.360581] epc : 8028e9b0 inet6_dump_addr+0x10c/0x4f0
[ 135.371093] ra : 8028ed0c inet6_dump_addr+0x468/0x4f0
[ 135.381598] Status: 1100a403 KERNEL EXL IE
[ 135.389883] Cause : 00800010 (ExcCode 04)
[ 135.397809] BadVA : ffffffa9
[ 135.403501] PrId : 00019655 (MIPS 24KEc)
[ 135.411423] Modules linked in: mt7628(P) xt_coova pppoe ppp_async iptable_nat pppox ppp_generic nf_nat_ipv4 nf_conntrack_ipv6 nf_conntrack_ipv4 ipt_REJECT ipt_MASQUERADE xt_time xt_tcpudp xt_tcpmss xt_statistic xt_state xt_recent xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_HL xt_DSCP xt_CLASSIFY slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_nat nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache iptable_mangle iptable_filter ipt_ECN ip_tables crc_ccitt sch_fq sch_teql em_nbyte sch_dsmark sch_pie sch_gred em_cmp cls_basic act_ipt sch_prio sch_codel em_text em_meta sch_sfq act_police sch_red act_connmark nf_conntrack act_skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_tbf sch_htb sch_hfsc sch_ingress ledtrig_usbport ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables x_tables ifb tun leds_gpio ohci_platform ohci_hcd ehci_platform ehci_hcd gpio_button_hotplug usbcore nls_base usb_common [last unloaded: xt_coova]
[ 135.611687] Process odhcpd (pid: 674, threadinfo=831d0000, task=838ec000, tls=7737bd48)
[ 135.627525] Stack : 00000000 8391b540 020012c0 801efdf4 00000014 00000002 fffffdc9 00000000
00000000 000000a0 8035b4bc 821b6e04 83802780 00000000 803a8848 00000001
8028ed00 00000001 83184400 8391b540 00004000 831845ec 00004000 821e7300
00000008 77374d8c 00000000 80229138 00000000 00000001 00000000 000003ec
00000000 000003ec 83184400 831d1c98 83184464 8391bcc0 831845ec 80229868
...
[ 135.697905] Call Trace:
[ 135.702740] [<8028e9b0>] inet6_dump_addr+0x10c/0x4f0
[ 135.712575] [<80229138>] netlink_dump+0x110/0x2bc
[ 135.721883] [<80229868>] __netlink_dump_start+0x100/0x1b0
[ 135.732576] [<80213794>] rtnetlink_rcv_msg+0x164/0x200
[ 135.742747] [<8022b860>] netlink_rcv_skb+0x7c/0xf8
[ 135.752228] [<802112d0>] rtnetlink_rcv+0x24/0x34
[ 135.761366] [<8022b0c4>] netlink_unicast+0x158/0x240
[ 135.771191] [<8022b62c>] netlink_sendmsg+0x388/0x3cc
[ 135.781024] [<801e62dc>] sock_sendmsg+0x18/0x30
[ 135.789993] [<801e76f4>] ___sys_sendmsg+0x18c/0x224
[ 135.799651] [<801e8624>] __sys_sendmsg+0x48/0x7c
[ 135.808797] [<8000686c>] syscall_common+0x30/0x54
[ 135.818099]
[ 135.821033]
Code: 0002b00b 0000b021 8fa20018 <8c4201e0> 104000d9 00009021 8f830010 24630200 af830010
[ 135.840789] ---[ end trace edbfa4873ca49edc ]---
[ 135.856950] Fatal exception: panic in 5 seconds
Try disabling ipv6 on build to test.
@alangregory How do I do that?
CONFIG_IPV6=n
Exact same crash with ipv6 disabled...
Interestingly, while IPv6 was still enabled, the command "sysctl -w net.ipv6.conf.all.disable_ipv6=1" also provoked the crash.
Your test results confuse me, because I don't see any clue showing that mt7628 was involved.
The call stack says the netlink API crashed with an unaligned access when someone tried to send a unicast message to some kernel module. The message had not reached any module yet; it failed in the kernel path.
What I can do is go through the code to check whether mt7628 registers any netlink hook. I don't really think it does...
Ok thanks a lot for your effort!
The only evidence that mt7628 is involved is that without it I've never seen this crash happen.
My guess is that the module somehow corrupts &dev, &net or some other variable and inserts a wrong pointer, and then when this pointer is accessed in inet6_dump_addr or netdev_master_upper_dev_get the kernel crashes with an unaligned access.
(But without the code I cannot do or say more).
Thanks again.
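As a rough check of this guess against the two dumps above (not a diagnosis): in both oopses the instruction marked with <> in the "Code:" line decodes as a MIPS lw, and its base register holds the same odd value 0xfffffdc9. Adding the instruction's offset to that value gives exactly the reported BadVA each time. The sketch below redoes the arithmetic in Python. The helper names decode_itype and fault_address are made up for illustration, and the register values are copied by hand from the logs.

```python
LW = 0x23  # MIPS32 opcode for "lw rt, imm(rs)"


def decode_itype(word):
    """Split a 32-bit MIPS I-type instruction into (opcode, rs, rt, signed imm)."""
    opcode = (word >> 26) & 0x3F
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    imm = word & 0xFFFF
    if imm & 0x8000:          # sign-extend the 16-bit immediate
        imm -= 0x10000
    return opcode, rs, rt, imm


def fault_address(word, regs):
    """Return the address an lw would load from, given a {reg number: value} map."""
    opcode, rs, _rt, imm = decode_itype(word)
    if opcode != LW:
        raise ValueError("not an lw instruction")
    return (regs[rs] + imm) & 0xFFFFFFFF


# Values copied from the two oops dumps in this thread (assumed transcribed correctly).
crashes = [
    # first dump: Code <8e020064>, $16 = fffffdc9, BadVA = fffffe2d
    ("netdev_master_upper_dev_get", 0x8E020064, {16: 0xFFFFFDC9}, 0xFFFFFE2D),
    # second dump: Code <8c4201e0>, $2 = fffffdc9, BadVA = ffffffa9
    ("inet6_dump_addr", 0x8C4201E0, {2: 0xFFFFFDC9}, 0xFFFFFFA9),
]

for name, insn, regs, badva in crashes:
    _, rs, rt, imm = decode_itype(insn)
    addr = fault_address(insn, regs)
    print(f"{name}: lw ${rt}, {imm}(${rs}) -> {addr:#010x}, "
          f"matches BadVA: {addr == badva}, word-aligned: {addr % 4 == 0}")
```

Neither address is a multiple of 4, which fits "Unhandled kernel unaligned access", and both crashes read through the same bogus value even though they happen in different functions. That does not show who wrote the value, only that the same bad pointer is being followed in both code paths.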
Can anyone reproduce this bug? Or can anyone restart network without crash?
What is the model of the device?
ZBT 3326, but I guess it should do pretty much the same with any mt7628nn device (because this does not depend on the board).
I've added the following in target/linux/ramips/image/mt7628.mk:
define Device/mt7628
DTS := WT3326
IMAGE_SIZE := $(ralink_default_fw_size_8M)
DEVICE_TITLE := ZBT wt3326
DEVICE_PACKAGES := kmod-usb2 kmod-usb-ohci kmod-usb-ledtrig-usbport
endef
TARGET_DEVICES += wt3326
And here is my WT3326.dts: WT3326.dts.txt
This repo worked for me! It contains the sources of the driver and patch applied:
@pparent76 The driver from that repo uses an earlier version of MT7628. The patchwork is similar, except that its FlashRead/Write is implemented using the mtd API, which is not allowed according to MTK's proprietary license.
I just had a few days off. I'm going to work on this.
Well, quite frankly, I don't see why we should make such a big deal of respecting MTK's proprietary license when Mediatek did not make such a big deal of respecting our license (GPL) [1].
Of course we cannot act as if this driver were free software. But it is disrespectful of Mediatek not to allow its direct and indirect customers to actually use the hardware they paid for. So I don't mind using a repo that lets me get around this for my personal use; I would not publish any such code myself or encourage anyone else to do so.
On top of that, I'm not so sure that what you are doing is any more legal, unless you have an agreement with Mediatek: the license says you don't have the right to study or modify the driver, which you obviously do.
Thanks a lot for your time anyways.
[1] See one example (there are others): https://www.xda-developers.com/have-you-paid-your-linux-kernel-source-license-fee/
I uploaded the firmware I built and its configuration here: http://119.23.148.191:8888/tmp/lede-ramips-mt7628-wrtnode2p-squashfs-sysupgrade.config http://119.23.148.191:8888/tmp/lede-ramips-mt7628-wrtnode2p-squashfs-sysupgrade.bin
Both "ifconfig br-lan down/up" and "/etc/init.d/network restart" work properly. You can give it a try if you have a compatible device.
I have never heard of the "ZBT wt3326" before. You can mail me its dts, config and Makefile, and I can build one myself.
Sorry for the late reply, I was away for few days.
Your image seems to work on my hardware, when I restart network: no crash.
The problem is that, to make any meaningful test, I need an image builder with the following packages accessible:
bind-dig kmod-sched kmod-mt7628 grep dmesg kmod-br-netfilter libopenssl lighttpd-mod-authn_file coova-chilli kmod-ipt-coova librt qos-scripts wget ca-certificates miniupnpc openvpn-openssl openvpn-easy-rsa sudo lighttpd lighttpd-mod-cgi lighttpd-mod-auth luci tcpdump
And also I need to cross compile coova-chilli with specific options, so having the SDK would be good.
Did you make a correction to the .ko file, or is it only that you sent me the whole image instead of just the .ko file that made the difference? Is it useful for me to try recompiling from the LEDE sources as I did before?
What I sent you was the whole image with the same driver from this repo. No changes have been made to the driver yet. Maybe you can strip out some suspicious components to test whether the issue goes away.
LEDE17.01 imagebuilder for mt7628 is available in:
https://github.com/gl-inet/imagebuilder-lede-ramips
kmod-mt7628 for proprietary driver and kmod-mt76 for open driver.
Ok thanks.
I guess it's because you use the exact same toolchain to compile the kernel and the module. But when I compile the kernel myself, I don't have the exact same toolchain that you used to compile the .ko.
Well, for now, I'm not going to spend too much time working around this, as the other repository seems to work for me.