Open screenshot-zz opened 3 years ago
Hardware fault; check your PCIe configuration.
I have a similar problem, and I have observed that in my case the network drops once exactly every 6 hours. My x86 host has one onboard network port, and the other two ports are PCIe NIC ports. Initially I used the onboard port as LAN. After the problem appeared, I swapped the ports and moved LAN to one of the PCIe NIC ports. The result was the same problem. The log is as follows:
Mon Nov 15 10:15:53 2021 user.warn ddns-scripts[16877]: ipv4_preset: Transfer failed - retry 2751/0 in 60 seconds
Mon Nov 15 10:16:54 2021 user.err ddns-scripts[16877]: ipv4_preset: GNU Wget Error: '8'
Mon Nov 15 10:16:54 2021 user.warn ddns-scripts[16877]: ipv4_preset: Transfer failed - retry 2752/0 in 60 seconds
Mon Nov 15 10:17:54 2021 user.err ddns-scripts[16877]: ipv4_preset: GNU Wget Error: '8'
Mon Nov 15 10:17:54 2021 user.warn ddns-scripts[16877]: ipv4_preset: Transfer failed - retry 2753/0 in 60 seconds
Mon Nov 15 10:47:54 2021 authpriv.notice beardropper[9115]: loadState() loaded 0 entries
Mon Nov 15 11:17:55 2021 authpriv.notice beardropper[9115]: loadState() loaded 0 entries
Mon Nov 15 11:47:55 2021 authpriv.notice beardropper[9115]: loadState() loaded 0 entries
Mon Nov 15 12:17:55 2021 authpriv.notice beardropper[9115]: loadState() loaded 0 entries
Mon Nov 15 12:17:57 2021 kern.info kernel: [173625.670359] r8168: eth2: link down
Mon Nov 15 12:17:57 2021 daemon.notice netifd: Network device 'eth2' link is down
Mon Nov 15 12:17:57 2021 kern.info kernel: [173625.696397] br-lan: port 1(eth2) entered disabled state
Mon Nov 15 12:17:58 2021 daemon.notice netifd: bridge 'br-lan' link is down
Mon Nov 15 12:17:58 2021 daemon.notice netifd: Interface 'lan' has link connectivity loss
Mon Nov 15 12:18:00 2021 kern.info kernel: [173628.789052] r8168: eth2: link up
Mon Nov 15 12:18:00 2021 kern.info kernel: [173628.789299] br-lan: port 1(eth2) entered blocking state
Mon Nov 15 12:18:00 2021 daemon.notice netifd: Network device 'eth2' link is up
Mon Nov 15 12:18:00 2021 daemon.notice netifd: bridge 'br-lan' link is up
Mon Nov 15 12:18:00 2021 daemon.notice netifd: Interface 'lan' has link connectivity
Mon Nov 15 12:18:00 2021 kern.info kernel: [173628.789487] br-lan: port 1(eth2) entered forwarding state
Mon Nov 15 12:18:10 2021 kern.info kernel: [173639.014369] r8168: eth2: link down
Mon Nov 15 12:18:10 2021 kern.info kernel: [173639.040205] br-lan: port 1(eth2) entered disabled state
Mon Nov 15 12:18:10 2021 daemon.notice netifd: Network device 'eth2' link is down
Mon Nov 15 12:18:11 2021 daemon.notice netifd: bridge 'br-lan' link is down
Mon Nov 15 12:18:11 2021 daemon.notice netifd: Interface 'lan' has link connectivity loss
Mon Nov 15 12:18:13 2021 kern.info kernel: [173642.132972] r8168: eth2: link up
Mon Nov 15 12:18:13 2021 kern.info kernel: [173642.133202] br-lan: port 1(eth2) entered blocking state
Mon Nov 15 12:18:13 2021 daemon.notice netifd: Network device 'eth2' link is up
Mon Nov 15 12:18:13 2021 daemon.notice netifd: bridge 'br-lan' link is up
Mon Nov 15 12:18:13 2021 daemon.notice netifd: Interface 'lan' has link connectivity
Mon Nov 15 12:18:13 2021 kern.info kernel: [173642.133390] br-lan: port 1(eth2) entered forwarding state
Mon Nov 15 12:48:13 2021 authpriv.notice beardropper[9115]: loadState() loaded 0 entries
Mon Nov 15 13:18:13 2021 authpriv.notice beardropper[9115]: loadState() loaded 0 entries
Mon Nov 15 13:48:14 2021 authpriv.notice beardropper[9115]: loadState() loaded 0 entries
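To check a claim such as "it drops every 6 hours" against a log, one option is to pull the kernel's link events out of the log and measure the gaps between them. The sketch below is only an illustration, assuming the `r8168: ethN: link up/down` message format seen in the log above; the function names `link_events` and `down_intervals` are hypothetical, and the sample lines are copied from that log.

```python
import re

# Matches kernel lines such as "[173625.670359] r8168: eth2: link down".
# The bracketed number is the kernel uptime in seconds.
LINK_EVENT = re.compile(r"\[\s*(\d+\.\d+)\]\s+r8168: (\w+): link (up|down)")


def link_events(log_text):
    """Return (uptime_seconds, interface, state) for each r8168 link event."""
    return [(float(t), dev, state) for t, dev, state in LINK_EVENT.findall(log_text)]


def down_intervals(log_text, dev):
    """Return the seconds between consecutive 'link down' events on one interface."""
    downs = [t for t, d, s in link_events(log_text) if d == dev and s == "down"]
    return [round(later - earlier, 3) for earlier, later in zip(downs, downs[1:])]


SAMPLE = """
Mon Nov 15 12:17:57 2021 kern.info kernel: [173625.670359] r8168: eth2: link down
Mon Nov 15 12:18:00 2021 kern.info kernel: [173628.789052] r8168: eth2: link up
Mon Nov 15 12:18:10 2021 kern.info kernel: [173639.014369] r8168: eth2: link down
Mon Nov 15 12:18:13 2021 kern.info kernel: [173642.132972] r8168: eth2: link up
"""

if __name__ == "__main__":
    for event in link_events(SAMPLE):
        print(event)
    print(down_intervals(SAMPLE, "eth2"))
```

The excerpt above only covers a single double flap about 13 seconds apart, so confirming a 6-hour period would need a log spanning several outages.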
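I have mwan configured; I am not sure whether that has any effect.
I ran some more tests, and it appears to be a driver problem with the RTL8168 NIC. I disabled the onboard RTL8168 NIC in the BIOS and now use only the Intel PCIe NIC; it has been running stably for 24 hours so far. Previously the connection dropped every 6 hours without fail.
HP T620 with the onboard RTL8168 used as the WAN port; I have not seen this problem on recent versions.
Mine is an HP T620 Plus, with an Intel 82576 dual-port PCIe NIC added.
Similar problem. x86_64, kernel version 5.15.31, one onboard gigabit NIC (eth1, RTL8111G) and one PCIe NIC, a UGREEN US230 with the same RTL8111G chip, used as the LAN port (eth0). The kernel reported errors; this is the second or third time. The earlier logs were too short and were overwritten by the boot log, so I have no record of them.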
Tue Mar 29 10:43:53 2022 kern.err kernel: [57607.282725] eth0: cmd = 0x00
Tue Mar 29 10:43:53 2022 kern.err kernel: [57607.282725] .
Tue Mar 29 10:43:53 2022 kern.err kernel: [57607.282741] eth0: io_base_l = 0x0001
Tue Mar 29 10:43:53 2022 kern.err kernel: [57607.282741] .
Tue Mar 29 10:43:53 2022 kern.err kernel: [57607.283108] eth0: mem_base_h = 0x0000
Tue Mar 29 10:43:53 2022 kern.err kernel: [57607.283108] .
Tue Mar 29 10:43:53 2022 kern.err kernel: [57607.283507] eth0: resv_0x20_h = 0x0000
Tue Mar 29 10:43:53 2022 kern.err kernel: [57607.283507] .
Tue Mar 29 10:43:53 2022 kern.err kernel: [57607.283908] eth0: ilr = 0xff
Tue Mar 29 10:43:53 2022 kern.err kernel: [57607.283908] .
Tue Mar 29 10:43:53 2022 kern.err kernel: [57607.284564] eth0: esd_flag = 0x048b
Tue Mar 29 10:43:53 2022 kern.err kernel: [57607.284564] .
Tue Mar 29 10:43:53 2022 kern.info kernel: [57607.333788] br-lan: port 1(eth0) entered disabled state
The firmware was built about 6 days ago. The custom build configuration includes:
CONFIG_TARGET_x86=y
CONFIG_TARGET_x86_64=y
CONFIG_TARGET_x86_64_DEVICE_generic=y
CONFIG_TARGET_ROOTFS_EXT4FS=n
CONFIG_TARGET_ROOTFS_SQUASHFS=y
CONFIG_VMDK_IMAGES=n
CONFIG_TARGET_IMAGES_GZIP=y
CONFIG_TARGET_ROOTFS_PARTSIZE=800
CONFIG_PACKAGE_ipv6helper=y
CONFIG_PACKAGE_adguardhome=y
CONFIG_PACKAGE_luci-app-uhttpd=y
CONFIG_PACKAGE_iperf3=y
CONFIG_PACKAGE_luci-app-usb-printer=y
CONFIG_PACKAGE_luci-app-diskman=y
CONFIG_PACKAGE_sfdisk=y
CONFIG_PACKAGE_resize2fs=y
CONFIG_PACKAGE_sysfsutils=y
CONFIG_PACKAGE_luci-app-aria2=y
CONFIG_PACKAGE_luci-app-passwall=y
CONFIG_PACKAGE_xray-core=y
CONFIG_PACKAGE_xray-geodata=y
CONFIG_PACKAGE_xray-plugin=y
CONFIG_PACKAGE_kmod-usb3=y
CONFIG_PACKAGE_kmod-mt76x2u=y
CONFIG_PACKAGE_luci-theme-argonne=y
CONFIG_PACKAGE_luci-app-argonne-config=y
CONFIG_PACKAGE_luci-app-serverchan=y
CONFIG_PACKAGE_luci-app-netdata=y
CONFIG_PACKAGE_luci-app-statistics=y
CONFIG_PACKAGE_collectd=y
CONFIG_PACKAGE_collectd-mod-wireless=y
CONFIG_PACKAGE_collectd-mod-ethstat=y
CONFIG_PACKAGE_collectd-mod-thermal=y
CONFIG_PACKAGE_collectd-mod-uptime=y
CONFIG_PACKAGE_collectd-mod-disk=y
CONFIG_PACKAGE_luci-app-vlmcsd=n
CONFIG_PACKAGE_luci-app-xlnetacc=n
CONFIG_PACKAGE_luci-app-wireguard=n
CONFIG_PACKAGE_luci-app-adbyby-plus=n
CONFIG_PACKAGE_luci-app-ttyd=n
CONFIG_DEVEL=y
CONFIG_CCACHE=y
Over SSH I collected the output of ip link, lspci, and dmesg | grep -i eth:
ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: sit0@NONE:
lspci
00:00.0 Host bridge: Intel Corporation 4th Gen Core Processor DRAM Controller (rev 06)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller (rev 06)
00:02.0 VGA compatible controller: Intel Corporation 4th Generation Core Processor Family Integrated Graphics Controller (rev 06)
00:03.0 Audio device: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller (rev 06)
00:14.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB xHCI (rev 05)
00:16.0 Communication controller: Intel Corporation 8 Series/C220 Series Chipset Family MEI Controller #1 (rev 04)
00:1a.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #2 (rev 05)
00:1b.0 Audio device: Intel Corporation 8 Series/C220 Series Chipset High Definition Audio Controller (rev 05)
00:1c.0 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #1 (rev d5)
00:1c.1 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #2 (rev d5)
00:1c.2 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #3 (rev d5)
00:1d.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #1 (rev 05)
00:1f.0 ISA bridge: Intel Corporation H81 Express LPC Controller (rev 05)
00:1f.3 SMBus: Intel Corporation 8 Series/C220 Series Chipset Family SMBus Controller (rev 05)
01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
03:00.0 USB controller: ASMedia Technology Inc. ASM1042A USB 3.0 Host Controller
04:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
dmesg | grep -i eth
[ 3.091712] igb: Intel(R) Gigabit Ethernet Network Driver
[ 9.988917] 8139too: 8139too Fast Ethernet driver 0.9.28
[ 9.994946] i40e: Intel(R) Ethernet Connection XL710 Network Driver
[ 10.066994] iavf: Intel(R) Ethernet Adaptive Virtual Function Network Driver
[ 10.068239] Intel(R) 2.5G Ethernet Linux Driver
[ 10.084480] r8168 Gigabit Ethernet driver 8.049.02-NAPI loaded
[ 10.104665] r8168 Gigabit Ethernet driver 8.049.02-NAPI loaded
[ 12.864746] eth0: 0xffffc90000055000, 00:e0:4c:8b:a4:48, IRQ 35
[ 12.919681] eth1: 0xffffc9000005d000, ac:22:0b:50:fd:20, IRQ 36
[ 16.009754] r8168: eth0: link up
[ 16.012874] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 16.037767] r8168: eth1: link up
[ 16.038096] IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
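Hi, did you ever solve this problem? I am seeing it too: the network drops after running for a while, and only a reboot fixes it. x86, PCIe NIC.
In my case, after repeated attempts, I found that the RJ45 connector was faulty: accidentally bumping it immediately triggered the hardware error, and only a reboot recovered it. After finding the cause I replaced the network cable, and everything has been normal and stable since. You might want to look for the problem along these lines too.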
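I have an RTL8111F and ran into the same problem. I looked at the r8168 source code: the latest LEDE uses version 8.052.01, which indeed does not list support for the RTL8111F, whereas 8.053.00 is said to support all RTL PCIe chips. I will try it tonight, and if it works I will submit a PR.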
https://github.com/mtorromeo/r8168/commit/503086686ea7b08b8b9b323ab52991987dfd9f6a
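My RTL8125 NIC has the same problem: as soon as the bandwidth is saturated, the link goes down, every single time. Browsing web pages is fine, but downloading PVE kills it instantly.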
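You can try adding r8168 to the blacklist in /etc/modules.conf first, unloading the module with rmmod r8168, then installing kmod-phy-realtek, r8169-firmware, and kmod-r8169, and finally loading the module with modprobe r8169.
If everything works, you can select those three packages into the firmware in future builds.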
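Bug/issue report template (delete this if you are making a suggestion)
1. About the issue you are submitting
Q: Have you searched the existing issues? (use "x" to select)
2. Detailed description
(1) The specific problem
A: The network drops after running for a while. This also happens under PVE and iKuai after they have been running for a long time, but a reboot fixes it. I would like to ask whether the NIC in my machine is broken.
(2) Router model and firmware version
A: x86
(3) Detailed logs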
A: Once the network drops, OpenWrt repeatedly prints:
[87098.829933] r8168: eth0: link up
[87101.193750] eth0: cmd = 0xff, should be 0x07
[87101.193750] .
[87101.193759] eth0: io_base_l = 0xffff, should be 0xe001
[87101.193759] .
[87101.194227] eth0: mem_base_l = 0xffff, should be 0x0004
[87101.194227] .
[87101.194744] eth0: mem_base_h = 0xffff, should be 0x8130
[87101.194744] .
[87101.195266] eth0: resv_0x1c_l = 0xffff, should be 0x0000
[87101.195266] .
[87101.195787] eth0: resv_0x1c_h = 0xffff, should be 0x0000
[87101.195787] .
[87101.196316] eth0: resv_0x20_l = 0xffff, should be 0x000c
[87101.196316] .
[87101.196843] eth0: resv_0x20_h = 0xffff, should be 0xa010
[87101.196843] .
[87101.197371] eth0: resv_0x24_l = 0xffff, should be 0x0000
[87101.197371] .
[87101.197898] eth0: resv_0x24_h = 0xffff, should be 0x0000
[87101.197898] .
[87101.198427] eth0: ilr = 0xff, should be 0x0b
[87101.198427] .
[87101.198957] eth0: resv_0x2c_l = 0xffff, should be 0x10ec
[87101.198957] .
[87101.199408] eth0: resv_0x2c_h = 0xffff, should be 0x0123
[87101.199408] .
[87101.200059] eth0: pci_sn_l = 0xffffffff, should be 0x684ce000
[87101.200059] .
[87101.201740] eth0: pci_sn_h = 0xffffffff, should be 0x01000000
[87101.201740] .
[87101.203330] eth0: esd_flag = 0x7fff
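In the dump above, every register that has an expected value reads back as all ones (0xff, 0xffff, 0xffffffff). Reads from a PCI device that has stopped responding typically return all ones, so this pattern is often a hint that the NIC dropped off the bus rather than that individual registers were corrupted. The sketch below is a rough heuristic for spotting that pattern in a log, not a diagnosis; the function names are hypothetical, and it assumes the `name = 0x..., should be 0x...` line format shown in this log.

```python
import re

# Matches e.g. "eth0: io_base_l = 0xffff, should be 0xe001".
REG_LINE = re.compile(
    r"(\w+): (\w+) = 0x([0-9a-fA-F]+), should be 0x([0-9a-fA-F]+)"
)


def parse_mismatches(log_text):
    """Return (device, register, read_hex, expected_hex) for each mismatch line."""
    return REG_LINE.findall(log_text)


def all_ones(hex_digits):
    """True when every hex digit is 'f', i.e. every bit read back as 1."""
    return set(hex_digits.lower()) == {"f"}


def looks_like_dead_bus(log_text):
    """Heuristic: at least one mismatch, and every read value is all ones."""
    rows = parse_mismatches(log_text)
    return bool(rows) and all(all_ones(read) for _, _, read, _ in rows)


SAMPLE = """
[87101.193750] eth0: cmd = 0xff, should be 0x07
[87101.193759] eth0: io_base_l = 0xffff, should be 0xe001
[87101.200059] eth0: pci_sn_l = 0xffffffff, should be 0x684ce000
"""

if __name__ == "__main__":
    for dev, reg, read, expected in parse_mismatches(SAMPLE):
        print(f"{dev} {reg}: read 0x{read}, expected 0x{expected}")
    print("all reads are all-ones:", looks_like_dead_bus(SAMPLE))
```

If the heuristic fires, the physical side (slot seating, cable and connector, power) is worth checking alongside the driver, which matches the RJ45 experience reported earlier in the thread.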