Closed e6e6 closed 3 years ago
This problem has been tormenting me for several days. The upstream kernel discussion is here:
https://bugzilla.kernel.org/show_bug.cgi?id=198931
[35767.782967] ------------[ cut here ]------------
[35767.783403] NETDEV WATCHDOG: eth1 (r8152): transmit queue 0 timed out
[35767.783475] WARNING: CPU: 2 PID: 20 at net/sched/sch_generic.c:448 dev_watchdog+0x2f4/0x300
[35767.784207] Modules linked in: fast_classifier xt_FULLCONENAT pppoe ppp_async pppox ppp_generic lzo ipt_REJECT xt_time xt_tcpudp xt_tcpmss xt_statistic xt_state xt_socket xt_recent xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TPROXY xt_TCPMSS xt_REDIRECT xt_NETMAP xt_MASQUERADE xt_LOG xt_HL xt_FLOWOFFLOAD xt_DSCP xt_CT xt_CLASSIFY usbnet usblp slhc sch_cake rtl8150 r8152 nf_tproxy_ipv6 nf_tproxy_ipv4 nf_socket_ipv6 nf_socket_ipv4 nf_reject_ipv4 nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_conntrack_rtcache nf_conntrack_netlink nf_conncount lzo_decompress lzo_compress iptable_raw iptable_nat iptable_mangle iptable_filter ipt_ECN ip_tables crc_ccitt tcp_bbr sch_tbf sch_ingress sch_htb sch_hfsc em_u32 cls_u32 cls_tcindex cls_route cls_matchall cls_fw cls_flow cls_basic act_skbedit act_mirred cryptodev xt_set ip_set_list_set ip_set_hash_netportnet ip_set_hash_netport ip_set_hash_netnet
[35767.784381] ip_set_hash_netiface ip_set_hash_net ip_set_hash_mac ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_hash_ipport ip_set_hash_ipmark ip_set_hash_ip ip_set_bitmap_port ip_set_bitmap_ipmac ip_set_bitmap_ip ip_set nfnetlink ip6table_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6t_NPT nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 ifb tun zram zsmalloc shortcut_fe_ipv6 shortcut_fe sha256_generic libsha256 seqiv jitterentropy_rng drbg md5 hmac ghash_generic gcm des_generic libdes ctr cbc authenc crypto_acompress gpio_button_hotplug
[35767.796697] CPU: 2 PID: 20 Comm: ksoftirqd/2 Not tainted 5.4.65 #0
[35767.797240] Hardware name: FriendlyARM NanoPi R2S (DT)
[35767.797697] pstate: 40000005 (nZcv daif -PAN -UAO)
[35767.798122] pc : dev_watchdog+0x2f4/0x300
[35767.798477] lr : dev_watchdog+0x2f4/0x300
[35767.798832] sp : ffff800010d4bc50
[35767.799124] x29: ffff800010d4bc50 x28: ffff00003bfca080
[35767.799594] x27: 0000000000000004 x26: 0000000000000140
[35767.800065] x25: 00000000ffffffff x24: 0000000000000002
[35767.800536] x23: ffff00003bf1945c x22: ffff00003bf19000
[35767.801007] x21: ffff00003bf19480 x20: ffff800010bb6000
[35767.801478] x19: 0000000000000000 x18: 0000000000000000
[35767.801949] x17: 0000000000000000 x16: 0000000000000000
[35767.802419] x15: 0000000000000000 x14: ffff800010c45112
[35767.802890] x13: 0000000000000000 x12: ffff800010c44000
[35767.803360] x11: ffff800010bd0000 x10: ffff800010c44758
[35767.803831] x9 : 0000000000000000 x8 : 0000000000000000
[35767.804301] x7 : 0000000000000005 x6 : 0000000000000124
[35767.804772] x5 : 0000000000000001 x4 : ffff00003f59f708
[35767.805242] x3 : 0000000000000006 x2 : 0000000000000007
[35767.805713] x1 : ffff00003d930000 x0 : 0000000000000039
[35767.806186] Call trace:
[35767.806410] dev_watchdog+0x2f4/0x300
[35767.806742] call_timer_fn.isra.35+0x20/0x78
[35767.807122] run_timer_softirq+0x378/0x388
[35767.807487] __do_softirq+0x124/0x260
[35767.807816] run_ksoftirqd+0x3c/0x50
[35767.808141] smpboot_thread_fn+0x124/0x260
[35767.808507] kthread+0x14c/0x150
[35767.808800] ret_from_fork+0x10/0x1c
[35767.809119] ---[ end trace 3c979587923ce4cc ]---
[35767.809653] r8152 4-1:1.0 eth0: Tx timeout
[35768.326926] r8152 4-1:1.0 eth0: get_registers -110
[35768.838794] r8152 4-1:1.0 eth0: set_registers -110
[35769.350812] r8152 4-1:1.0 eth0: get_registers -110
[35769.862780] r8152 4-1:1.0 eth0: get_registers -110
[35769.863332] r8152 4-1:1.0 eth0: set_registers -71
[35769.863975] r8152 4-1:1.0 eth0: Tx status -2
[35769.864414] r8152 4-1:1.0 eth0: Tx status -2
[35769.864859] r8152 4-1:1.0 eth0: Tx status -2
[35769.865308] r8152 4-1:1.0 eth0: Tx status -2
[35769.865825] r8152 4-1:1.0 eth0: get_registers -71
[35769.866330] r8152 4-1:1.0 eth0: set_registers -71
[35769.866892] r8152 4-1:1.0 eth0: get_registers -71
The system log shows:
Thu Sep 17 05:46:46 2020 kern.err kernel: [35770.789374] r8152 4-1:1.0 eth0: set_registers -71
Thu Sep 17 05:46:46 2020 kern.err kernel: [35770.789881] r8152 4-1:1.0 eth0: set_registers -71
Thu Sep 17 05:46:46 2020 kern.err kernel: [35770.790398] r8152 4-1:1.0 eth0: get_registers -71
Thu Sep 17 05:46:46 2020 kern.err kernel: [35770.790908] r8152 4-1:1.0 eth0: set_registers -71
Thu Sep 17 05:46:46 2020 kern.info kernel: [35770.930290] usb 4-1: reset SuperSpeed Gen 1 USB device number 2 using xhci-hcd
Thu Sep 17 05:46:46 2020 kern.err kernel: [35770.962362] r8152 4-1:1.0 eth0: Invalid ether addr 00:00:00:00:00:00
Thu Sep 17 05:46:46 2020 kern.info kernel: [35770.962962] r8152 4-1:1.0 eth0: Random ether addr 2a:ca:1a:26:dc:c9
Thu Sep 17 05:46:46 2020 kern.notice kernel: [35770.967381] r8152 4-1:1.0 eth0: Promiscuous mode enabled
Thu Sep 17 05:46:46 2020 kern.info kernel: [35771.106295] usb 4-1: reset SuperSpeed Gen 1 USB device number 2 using xhci-hcd
Thu Sep 17 05:46:46 2020 kern.err kernel: [35771.138377] r8152 4-1:1.0 eth0: Invalid ether addr 00:00:00:00:00:00
Thu Sep 17 05:46:46 2020 kern.info kernel: [35771.138987] r8152 4-1:1.0 eth0: Random ether addr 62:9c:d2:dc:55:c2
Thu Sep 17 05:46:46 2020 kern.notice kernel: [35771.143333] r8152 4-1:1.0 eth0: Promiscuous mode enabled
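This bug has already been fixed in FriendlyWrt.
Hmm, I took a look, and it seems they also just crudely disable rx/tx offloading on eth0. As far as I remember, FriendlyWrt does not swap the WAN and LAN ports, so what gets disabled should be the WAN port's offloading: https://github.com/friendlyarm/friendlywrt/blob/f0fc45e9f8bb9b7f3bc4b7ac1f521545d77ffaae/target/linux/rockchip-rk3328/base-files/etc/hotplug.d/iface/12-disable-rk3328-eth-offloading#L25-L29
Or did I look in the wrong place?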
It's not the commit you mentioned. I think the commit ID is this one: 2601053cafb4d682a18706621bde206b3a3c7254
Could you give a more specific link? Thanks.
Using the latest driver from Realtek's official website seems to fix the problem: https://github.com/openwrt/openwrt/pull/3178
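OK, I forgot to mention that I am already using the new driver version... On top of that, I ran ethtool on the USB NIC with rx off tx off, and there have been no problems for several days.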
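For readers who want to try the same workaround, here is a minimal sketch of the ethtool call. The interface name `eth1` is an assumption (use whichever name the r8152 USB NIC has on your board), and the `offload_cmd` helper is hypothetical, added only so the command can be reviewed before it is run. `ethtool -K <iface> rx off tx off` turns off RX and TX checksum offloading.

```shell
#!/bin/sh
# Sketch: build the ethtool command that disables RX/TX checksum offloading.
# Assumption: "eth1" is only a default guess for the USB NIC's interface name.
# offload_cmd is a hypothetical helper; it prints the command, it does not run it.

offload_cmd() {
    iface="${1:-eth1}"
    printf 'ethtool -K %s rx off tx off\n' "$iface"
}

# Print the command for review. To apply it, run the printed line as root,
# for example: offload_cmd eth1 | sh
offload_cmd eth1
```

The setting does not persist across reboots or USB resets, so it would have to be reapplied, for instance from a hotplug script like the FriendlyWrt one linked earlier in the thread.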
@aueu Isn't eth0 the onboard NIC (rtl8211e)?
@fanck0605 I forget which upstream commit it was, but after it the problem was fixed.
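@aueu
So no extra patch is needed now, and the official OpenWrt source works as it is?
I just made a patch specifically to disable offloading on eth0:
https://github.com/fanck0605/openwrt/commit/f621b92c22a727180e89f1008ca5a20025025b01
If there is no problem, I will revert it.
I build directly from the official master branch and cherry-pick a few patches from coolsnowwolf. I did not disable offloading on eth0.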
pushd target/linux/rockchip/patches-5.4
wget https://raw.githubusercontent.com/coolsnowwolf/lede/master/target/linux/rockchip/patches-5.4/002-rockchip-add-hwmon-support-for-SoCs-and-GPUs.patch
wget https://raw.githubusercontent.com/coolsnowwolf/lede/master/target/linux/rockchip/patches-5.4/003-arm64-dts-rockchip-add-more-cpu-operating-points-for.patch
wget https://raw.githubusercontent.com/coolsnowwolf/lede/master/target/linux/rockchip/patches-5.4/005-arm64-dts-rockchip-Add-RK3328-idle-state.patch
wget https://raw.githubusercontent.com/coolsnowwolf/lede/master/target/linux/rockchip/patches-5.4/104-rockchip-rk3328-add-i2c0-controller-for-nanopi-r2s.patch
wget https://raw.githubusercontent.com/coolsnowwolf/lede/master/target/linux/rockchip/patches-5.4/105-char-add-support-for-rockchip-hardware-random-number.patch
wget https://raw.githubusercontent.com/coolsnowwolf/lede/master/target/linux/rockchip/patches-5.4/106-arm64-dts-rockchip-add-hardware-random-number-genera.patch
popd
mkdir -p target/linux/rockchip/files/drivers/char/hw_random
wget -P target/linux/rockchip/files/drivers/char/hw_random/ https://raw.githubusercontent.com/coolsnowwolf/lede/master/target/linux/rockchip/files/drivers/char/hw_random/rockchip-rng.c
# model name patch for aarch64
wget -P target/linux/generic/hack-5.4/ https://raw.githubusercontent.com/immortalwrt/immortalwrt/master/target/linux/generic/hack-5.4/312-arm64-cpuinfo-Add-model-name-in-proc-cpuinfo-for-64bit-ta.patch
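A quick question: does the built-in sysupgrade in your OpenWrt firmware work? I have spent a long time on it without finding the cause, and I cannot get any logs either. It is quite a headache.
I have been using squashfs all along... and it works too.
I finally found the cause. I hooked up a TTL debug serial console specifically to read the log. gzip is probably to blame, and I have reported it to OpenWrt. Uninstalling gzip fixes it.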
root@OpenWrt:/# sysupgrade -v -n /tmp/openwrt-rockchip-armv8-friendlyarm_nanopi-r2s-squashfs-sysupgrade.img.gz
Thu Jan 21 17:10:37 CST 2016 upgrade: Reading partition table from bootdisk...
gzip: stdout: Broken pipe
Thu Jan 21 17:10:37 CST 2016 upgrade: Reading partition table from image...
Thu Jan 21 17:10:37 CST 2016 upgrade: Commencing upgrade. Closing all shell sessions.
killall: telnetd: no process killed
Thu Jan 21 17:10:38 CST 2016 upgrade: Sending TERM to remaining processes ... uhttpd vsftpd dbus-daemon avahi-daemon thd miniupnpd dnsmasq ubusd crond urngd smbd nmbd ntpd netdata netdata ttyd wsdd2 logd rpcd netifd odhcpd
Thu Jan 21 17:10:41 CST 2016 upgrade: Sending KILL to remaining processes ...
[ 105.137201] sh (4300): drop_caches: 3
Thu Jan 21 17:10:43 CST 2016 upgrade: Switching to ramdisk...
[ 106.938611] EXT4-fs (mmcblk0p2): re-mounted. Opts: (null)
Thu Jan 21 09:10:44 UTC 2016 upgrade: Performing system upgrade...
Thu Jan 21 09:10:45 UTC 2016 upgrade: Reading partition table from bootdisk...
/bin/zcat: exec: line 51: gzip: not found
0+0 records in
0+0 records out
Thu Jan 21 09:10:45 UTC 2016 upgrade: Reading partition table from image...
Thu Jan 21 09:10:45 UTC 2016 upgrade: Invalid partition table on /tmp/image.bs
[ 107.231703] reboot: Restarting system
U-Boot TPL 2021.01 (Feb 25 2021 - 14:13:13)
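The line `/bin/zcat: exec: line 51: gzip: not found` in the log above suggests that `/bin/zcat` on this image is a shell wrapper that execs `gzip`, and that `gzip` is not available after sysupgrade switches to the ramdisk. That reading is an inference from the log, not something confirmed in this thread. A rough way to check what `zcat` is on a given system is sketched below; the `zcat_kind` helper and its output labels are hypothetical.

```shell
#!/bin/sh
# Sketch: classify a zcat path as a BusyBox applet symlink, a wrapper script,
# a plain binary, or missing. zcat_kind and its labels are hypothetical.

zcat_kind() {
    path="$1"
    if [ -L "$path" ]; then
        # BusyBox applets are normally symlinks that point at the busybox binary.
        case "$(readlink "$path")" in
            *busybox) echo "busybox-applet" ;;
            *) echo "symlink" ;;
        esac
    elif [ -f "$path" ] && head -n 1 "$path" | grep -q '^#!'; then
        # A file starting with a shebang is a script, e.g. a wrapper that execs gzip.
        echo "wrapper-script"
    elif [ -e "$path" ]; then
        echo "binary"
    else
        echo "missing"
    fi
}

zcat_kind /bin/zcat
```

If this reports a wrapper script while the gzip package is installed, that would be consistent with the failure shown above.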
I am using the latest upstream source, and the eth1 get_registers -71 error has appeared again.
I reproduced it even without this package installed.