aparcar / openwrt


FS#1328 - VGV7510KW22 / o2 Box 6431: Spurious reboots and ethernet link loss #1142

Open aparcar opened 6 years ago

aparcar commented 6 years ago

hailfinger:

Device: VGV7510KW22 / o2 Box 6431

Software version: OpenWrt SNAPSHOT r5944-ad4232e
Linux version 4.9.77 (buildbot@slashdirt-03) (gcc version 5.5.0 (OpenWrt GCC 5.5.0 r5944-ad4232e) ) #0 SMP Thu Jan 25 10:00:48 2018

Environment: Additional packages: asterisk13-chan-lantiq, strongswan

Steps to reproduce: From time to time, the box reboots spuriously. Uptimes vary between 12 hours and 2 weeks. After such a reboot, some of the ethernet ports no longer detect a link. Unplugging and replugging the ethernet cable into the same port does not help; the kernel only notices a link if I plug the cable into another port. Sometimes an ethernet port loses link even without a reboot. Syslog is sent to an external server via UDP and does not show any error message prior to the spurious reboot.

Workaround: Performing a manual reboot after the spurious reboot fixes the ethernet link loss.
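To tell whether a boot ended up in the state that needs the manual reboot, one option is to scan the kernel log for the PHY probe failures that the dmesg comparison in this report shows only after a spurious reboot. The following is a minimal sketch, not a tested tool for this device: the message pattern is copied from that dmesg output, while the function name `failed_phy_ports` and the idea of running it on a saved log (for example on the external syslog server, if kernel boot messages reach it) are assumptions.

```python
import re

# Message format taken from the dmesg comparison in this report, e.g.
#   "xrx200-mdio: probing phy of port 2 failed"
PROBE_FAILED = re.compile(r"xrx200-mdio: probing phy of port (\d+) failed")


def failed_phy_ports(log_text):
    """Return the sorted, de-duplicated port numbers whose PHY probe failed.

    `failed_phy_ports` is a hypothetical helper name; `log_text` is the
    full text of a dmesg capture or syslog excerpt.
    """
    return sorted({int(m.group(1)) for m in PROBE_FAILED.finditer(log_text)})
```

An empty result means no probe failure was logged; a non-empty list names the switch ports that would presumably stay without link until the next manual reboot.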

Possibly relevant difference between dmesg on a spurious and a normal reboot:

--- dmesg-normal-reboot.txt.notimestamps 2018-02-05 11:42:10.656203272 +0100
+++ dmesg-spurious-reboot.txt.notimestamps 2018-02-05 11:40:47.264201277 +0100
@@ -30,8 +30,8 @@
 PID hash table entries: 256 (order: -2, 1024 bytes)
 Dentry cache hash table entries: 8192 (order: 3, 32768 bytes)
 Inode-cache hash table entries: 4096 (order: 2, 16384 bytes)
-Writing ErrCtl register=00014098
-Readback ErrCtl register=00014098
+Writing ErrCtl register=00014090
+Readback ErrCtl register=00014090
 Memory: 55392K/63488K available (4334K kernel code, 178K rwdata, 1324K rodata, 1252K init, 244K bss, 8096K reserved, 0K cma-reserved)
 SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
 Hierarchical RCU implementation.
@@ -123,12 +123,14 @@
 libphy: Fixed MDIO Bus: probed
 libphy: lantiq,xrx200-mdio: probed
 ICPlus IP101A/G 0:01: attached PHY driver [ICPlus IP101A/G] (mii_bus:phy_addr=0:01, irq=-1)
-Intel XWAY PHY22F (xRX integrated) 0:11: attached PHY driver [Intel XWAY PHY22F (xRX integrated)] (mii_bus:phy_addr=0:11, irq=-1)
-Intel XWAY PHY22F (xRX integrated) 0:12: attached PHY driver [Intel XWAY PHY22F (xRX integrated)] (mii_bus:phy_addr=0:12, irq=-1)
-random: fast init done
+lantiq,xrx200-net 1e108000.eth eth0 (uninitialized): no PHY found
+xrx200-mdio: probing phy of port 2 failed
+lantiq,xrx200-net 1e108000.eth eth0 (uninitialized): no PHY found
+xrx200-mdio: probing phy of port 3 failed
 Intel XWAY PHY22F (xRX integrated) 0:13: attached PHY driver [Intel XWAY PHY22F (xRX integrated)] (mii_bus:phy_addr=0:13, irq=-1)
+random: fast init done
 Intel XWAY PHY22F (xRX integrated) 0:14: attached PHY driver [Intel XWAY PHY22F (xRX integrated)] (mii_bus:phy_addr=0:14, irq=-1)
-ltq-cputemp cputemp@0: Current CPU die temperature: 54.0 °C
+ltq-cputemp cputemp@0: Current CPU die temperature: 52.5 °C
 wdt 1f8803f0.watchdog: Init done
 NET: Registered protocol family 10
 NET: Registered protocol family 17
@@ -139,7 +141,6 @@
 This architecture does not have kernel memory protection.
 init: Console is alive
 init: - watchdog -
-lantiq,xrx200-net 1e108000.eth eth0: port 2 got link
 kmodloader: loading kernel modules from /etc/modules-boot.d/
 dwc2 1e101000.ifxhcd: requested GPIO 509
 dwc2 1e101000.ifxhcd: DWC OTG Controller
@@ -151,12 +152,12 @@
 hub 1-0:1.0: 1 port detected
 kmodloader: done loading kernel modules from /etc/modules-boot.d/
 init: - preinit -
-jffs2: notice: (440) jffs2_build_xattr_subsystem: complete building xattr subsystem, 0 of xdatum (0 unchecked, 0 orphan) and 0 of xref (0 dead, 0 orphan) found.
+IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
+jffs2: notice: (436) jffs2_build_xattr_subsystem: complete building xattr subsystem, 0 of xdatum (0 unchecked, 0 orphan) and 0 of xref (0 dead, 0 orphan) found.
 mount_root: switching to jffs2 overlay
 urandom-seed: Seeding with /etc/urandom.seed
 procd: - early -
 procd: - watchdog -
-lantiq,xrx200-net 1e108000.eth eth0: port 2 lost link
 procd: - watchdog -
 procd: - ubus -
 procd: - init -
@@ -207,19 +208,19 @@
 device eth0 entered promiscuous mode
 IPv6: ADDRCONF(NETDEV_UP): br-lan: link is not ready
 IPv6: ADDRCONF(NETDEV_UP): eth0.13: link is not ready
-lantiq,xrx200-net 1e108000.eth eth0: port 2 got link
-IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
-br-lan: port 1(eth0.1) entered blocking state
-br-lan: port 1(eth0.1) entered forwarding state
-IPv6: ADDRCONF(NETDEV_CHANGE): eth0.13: link becomes ready
-IPv6: ADDRCONF(NETDEV_CHANGE): br-lan: link becomes ready
+random: crng init done
 PTM 1.0.27 PTM (E1) firmware version 0.30
 ifxmips_ptm: PTM init succeed
 IPv6: ADDRCONF(NETDEV_UP): dsl0: link is not ready
 IPv6: ADDRCONF(NETDEV_UP): dsl0.7: link is not ready
-random: crng init done
 enter showtime
 IPv6: ADDRCONF(NETDEV_CHANGE): dsl0: link becomes ready
 IPv6: ADDRCONF(NETDEV_CHANGE): dsl0.7: link becomes ready
 enter showtime
 pppoe-wan: renamed from ppp0
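The `.notimestamps` suffix on the compared files suggests the dmesg timestamps were stripped before diffing, so that lines differing only in timing do not show up as changes. For anyone wanting to repeat the comparison on their own captures, here is a minimal sketch of that preprocessing step. It assumes the default `[    1.234567]` timestamp prefix; the function names and the `fromfile`/`tofile` labels are illustrative and not taken from the reporter's actual procedure.

```python
import difflib
import re

# Matches the leading "[   12.345678] " prefix that dmesg prints by default.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s?")


def strip_timestamps(text):
    """Return the log lines with any leading dmesg timestamp removed."""
    return [TIMESTAMP.sub("", line) for line in text.splitlines()]


def diff_logs(normal, spurious, context=3):
    """Return a unified diff (list of lines) of two dmesg captures,
    ignoring timestamps. Both arguments are the full log text."""
    return list(
        difflib.unified_diff(
            strip_timestamps(normal),
            strip_timestamps(spurious),
            fromfile="dmesg-normal-reboot",
            tofile="dmesg-spurious-reboot",
            n=context,
            lineterm="",
        )
    )
```

Reading two saved captures and printing `"\n".join(diff_logs(a, b))` should give output in the same unified format as the comparison above. Lines that merely change order between boots, such as `random: fast init done`, will still appear as differences.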

aparcar commented 6 years ago

hailfinger:

dmesg from a normal and a spurious reboot

aparcar commented 6 years ago

hailfinger:

This might be related to the link loss: https://forum.openwrt.org/viewtopic.php?id=63225

aparcar commented 6 years ago

hailfinger:

Additional info: The bootloader is U-Boot, but a roughly 12-month-old version from LEDE trunk:

U-Boot 2013.10-openwrt4 (Feb 16 2017 - 12:32:56) VGV7510KW22
Board: Arcadyan VGV7510KW22
SoC: Lantiq VRX288 v1.2