Closed: grische closed this issue 10 months ago
Does this also happen with the very similar WDR3600?
Probably. We had a few isolated cases where a WDR3600 needed a power cycle after an upgrade, but it is not clear whether this is related at all to the problem described here. We don't have enough (failing) devices to give a confident answer.
It might be worth mentioning that the special symbol at the end of the log is also printed during a boot. I'm not sure whether it is printed before or after the bootloader has loaded, though.
EDIT: I was able to reproduce the hang without the special symbol appearing, as if it got stuck during reboot:
[ 226.555153] br-client: port 4(client0) entered disabled state
Watchdog handover: fd=3
- watchdog -
Watchdog did not previously reset the system
[ 226.596451] device client1 left promiscuous mode
[ 226.601297] br-client: port 5(client1) entered disabled state
Wed Jul 12 21:19:50 CEST 2023 upgrade: Sending TERM to remaining processes ...
Wed Jul 12 21:19:51 CEST 2023 upgrade: Sending signal TERM to sse-multiplexd (2340)
Wed Jul 12 21:19:51 CEST 2023 upgrade: Sending signal TERM to dnsmasq (2782)
Wed Jul 12 21:19:55 CEST 2023 upgrade: Sending KILL to remaining processes ...
[ 237.363401] stage2 (5324): drop_caches: 3
Wed Jul 12 21:20:01 CEST 2023 upgrade: Switching to ramdisk...
mount: mounting /dev/mtdblock4 on /overlay failed: Resource busy
[ 241.391167] VFS: Busy inodes after unmount of jffs2. Self-destruct in 5 seconds. Have a nice day...
Wed Jul 12 19:20:05 UTC 2023 upgrade: Performing system upgrade...
[ 241.489244] do_stage2 (5324): drop_caches: 3
Unlocking firmware ...
Writing from <stdin> to firmware ...
Wed Jul 12 19:20:23 UTC 2023 upgrade: Upgrade completed
Wed Jul 12 19:20:24 UTC 2023 upgrade: Rebooting system...
umount: can't unmount /dev: Resource busy
umount: can't unmount /tmp: Resource [ 261.048575] reboot: Restarting system
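When comparing several captures like the one above, it can help to check mechanically how far the shutdown got. The following is a minimal sketch in Python, assuming the serial output was saved as plain text. The function name `summarize_serial_log` and the result keys are made up for illustration and are not part of Gluon or OpenWrt.

```python
import re

# Kernel messages carry an uptime stamp, e.g. "[ 261.048575] reboot: ...".
KERNEL_TS = re.compile(r"\[\s*(\d+\.\d+)\]\s*(.*)")
RESTART_MARKER = "reboot: Restarting system"


def summarize_serial_log(text):
    """Summarize a serial capture of an upgrade followed by a reboot."""
    lines = text.splitlines()
    last_ts = None
    restart_index = None
    for i, line in enumerate(lines):
        # search() rather than match(): kernel output can start mid-line
        # when it interleaves with userspace output on the console.
        m = KERNEL_TS.search(line)
        if m is None:
            continue
        last_ts = float(m.group(1))
        if RESTART_MARKER in m.group(2):
            restart_index = i
    if restart_index is None:
        trailing = 0
    else:
        # Non-empty lines after the restart message would be bootloader or
        # early kernel output, i.e. a sign that the device did come back.
        trailing = sum(1 for l in lines[restart_index + 1:] if l.strip())
    return {
        "last_kernel_ts": last_ts,
        "reached_restart": restart_index is not None,
        "lines_after_restart": trailing,
    }
```

A capture that reaches the restart message but has no lines after it would match the hang shown in the log above, where nothing is printed after `reboot: Restarting system`.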
We also had reports in our community when I rolled out 2022.1, but we thought it was random, and we didn't have proper logs or anything else. #2655
We observed this when transitioning from 2022.1.2 to 2022.1.4 on WDR4300 and, more frequently, on Ubiquiti AC Lite. In our observation, the update went fine when the device was rebooted just prior to the update, which may suggest an out-of-memory issue.
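One rough way to test the out-of-memory hypothesis would be to save the contents of `/proc/meminfo` right before each upgrade and compare the values between devices that hang and devices that do not. The sketch below parses such a dump in Python on the analysis machine, not on the router. The function names are hypothetical, and the fallback to `MemFree` is an assumption for kernels that do not report `MemAvailable`.

```python
def parse_meminfo(text):
    """Map field names to integer values for lines like 'MemFree:  12345 kB'."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            # Most fields are reported in kB; the unit suffix is dropped here.
            values[key.strip()] = int(fields[0])
    return values


def available_kib(text):
    """Return MemAvailable if present, otherwise MemFree, otherwise None."""
    info = parse_meminfo(text)
    if "MemAvailable" in info:
        return info["MemAvailable"]
    return info.get("MemFree")
```

Collecting this value for both failing and successful upgrades would show whether low memory correlates with the hang at all.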
@smoe Just to clarify, we were able to reproduce the issue on a freshly booted device as well. I assume the WDRs and the AC Lite are different issues here.
@grische
One thing that comes to mind is the use of the newer ar934x SPI controller driver; at least, no device reported in this issue uses the older ar71xx driver.
This driver was first shipped with OpenWrt 21.02, which matches the observation that the problem does not occur with releases based on OpenWrt 19.07 and older.
https://github.com/openwrt/openwrt/commit/ebf0d8dadeca443121f4f597c51bf6591e341caf
If you are still able to reproduce this issue, you can modify the ar934x DTSI to use the compatible string for the ar71xx SPI controller. Ping me if I should provide you with a patch. If this fixes the reboot issue, we have a better idea of where to look next.
@blocktrron Thank you for looking into this. To avoid misunderstandings: are you suggesting making this change in OpenWrt?
diff --git a/target/linux/ath79/dts/ar934x.dtsi b/target/linux/ath79/dts/ar934x.dtsi
index d88c7bfabc..15201b197e 100644
--- a/target/linux/ath79/dts/ar934x.dtsi
+++ b/target/linux/ath79/dts/ar934x.dtsi
@@ -199,15 +199,17 @@
 		};
 		spi: spi@1f000000 {
-			compatible = "qca,ar934x-spi";
-			reg = <0x1f000000 0x1c>;
+			compatible = "qca,ar7240-spi",
+				"qca,ar7100-spi";
+			reg = <0x1f000000 0x10>;
 			clocks = <&pll ATH79_CLK_AHB>;
+			clock-names = "ahb";
+
+			status = "disabled";
 			#address-cells = <1>;
 			#size-cells = <0>;
-
-			status = "disabled";
 		};
 	};
@grische Almost. Just revert this commit in the file:
@blocktrron I was able to reproduce a hang after reboot even with the above commit reverted using Gluon v2023.1: https://gist.github.com/grische/27e4e780530f9a0795d96afaf749a4ed
Here is the respective branch: https://github.com/grische/site-ffm/commits/test/revert-ath79-add-new-ar934x-spi-driver/
@grische Are these hangs only reproducible after writing an upgrade image, or does a regular reboot invocation also trigger a spurious hang?
I have a test WDR4300 device where I can reproduce the hangs during a reboot every other time. Surprisingly often actually. This device has a manually installed serial port and a serial cable attached to the port.
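A reboot loop like this could be automated so that the hang rate is counted rather than estimated. The following Python sketch runs on a machine attached to the device. It assumes key-based root SSH access and the Linux iputils `ping` flags `-c` and `-W`. All function names and the timeout values are illustrative assumptions, not something taken from this issue.

```python
import subprocess
import time


def ssh_reboot(host):
    """Ask the device to reboot. Assumes key-based root SSH access."""
    try:
        subprocess.run(["ssh", f"root@{host}", "reboot"], check=False, timeout=30)
    except subprocess.TimeoutExpired:
        pass  # the connection may stall while the device goes down


def ping_once(host):
    """Return True if a single ping gets a reply (Linux iputils flags)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def wait_for(predicate, timeout, poll, sleep, clock):
    """Poll predicate() until it returns True or timeout seconds pass."""
    deadline = clock() + timeout
    while clock() < deadline:
        if predicate():
            return True
        sleep(poll)
    return False


def reboot_until_hang(host, max_cycles, reboot=ssh_reboot, is_up=ping_once,
                      down_timeout=60.0, up_timeout=180.0, poll=2.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Return the 1-based cycle in which the device stayed down, else None."""
    for cycle in range(1, max_cycles + 1):
        reboot(host)
        # Wait for the device to drop off the network first, so a reply from
        # the still-running system is not mistaken for a finished reboot.
        wait_for(lambda: not is_up(host), down_timeout, poll, sleep, clock)
        if not wait_for(lambda: is_up(host), up_timeout, poll, sleep, clock):
            return cycle  # stop: a hung device needs a manual power cycle
    return None
```

For example, `reboot_until_hang("192.168.1.1", 20)` (the address is a placeholder) would reboot the device up to 20 times and return the number of the first cycle after which it stopped answering pings, or `None` if every reboot succeeded. The `reboot` and `is_up` parameters are injectable so the loop can be dry-run without hardware.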
On the exact same setup, I tested it with
Bug report
What is the problem? Occasionally, WDR4300 devices (>10% of all devices) hang after an autoupdate and need a manual power cycle to reboot.
I managed to reproduce this while a serial cable was attached:
I am not sure if this is related to #185, but we were not able to reproduce it (yet) with a reboot.
What is the expected behaviour? That the WDR4300 comes back up after an update.
Gluon Version: v2022.1.2 and v2022.1.3, probably also earlier v2022.x
We experienced similar behaviour during the initial v2022.1 deployment, but dismissed it as "random". It was more severe with the v2022.1.3 deployment (probably just by chance), and I was able to reproduce it with a serial cable attached when upgrading from v2022.1.3 to v2022.1.4.
Site Configuration: https://github.com/freifunkMUC/site-ffm/blob/833829e68f97e4781f175bdd688d7f498a7efe53/site.conf
Custom patches: https://github.com/freifunkMUC/site-ffm/tree/833829e68f97e4781f175bdd688d7f498a7efe53/patches