freifunk-gluon / gluon

a modular framework for creating OpenWrt-based firmwares for wireless mesh nodes
https://gluon.readthedocs.io

Ubiquiti UniFi AC Mesh always reboots into clean configuration mode, no data preserved (Reason: Insufficient space for rootfs_data partition after migration to ath79 target) #2473

Closed dzzinstant closed 2 months ago

dzzinstant commented 2 years ago

Bug report

What is the problem? My node runs on the Freifunk Darmstadt variant of gluon, 'testing' branch. When the branch was recently updated from ffda 2.3~20210608 / gluon-v2020.2-263-g3f59fdc6 to ffda 2.5~20220330 / gluon-v2021.1-338-g55da2a7, my node no longer responded, and apparently was stuck in config mode.

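Other updating methods led to the same result:

  • revert to Ubiquiti's stock firmware (both 2017-05-08 and 2022-03-12), then install 2.5.* using dd
  • revert to stock firmware, install 2.4.1 (stable firmware from ffda). Then install 2.5.* (using both autoupdater and sysupgrade -n)
  • (I set the first byte of the MTD partition bs to 0x00, I also tried with setting it to 0x01)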

I configured the node using the web interface. After pressing the "save & reboot" button, the node always reboots into config mode. All configuration data, including the node name, is deleted and reset to the first-installation state. This problem only appeared on a UniFi AC Mesh; my other devices accepted the rollout of 2.5.* without noticeable problems.

What is the expected behaviour? Using the stable branch ffda 2.4.1 / gluon-v2021.1.1, the node works & reboots without problems. All configuration settings are preserved.

Node in question: 64287-gnat, older data of the same device: 64287-gnat (old)

Another node in the ffda network went down at about the same time, also a UniFi AC Mesh device on the 'testing' branch.

Possibly related issues

neocturne commented 2 years ago

Please provide the output of the "logread" command after booting into config mode of an affected image.

@AiyionPrime Do you still have access to the working UniFi AC Mesh that you tested? Can you get a boot log from that device as well?

mweinelt commented 2 years ago

Can confirm we lost the AC-Mesh in our hackspace as well. The date it went missing coincides with being readded to the ath79-generic target.

https://meshviewer.darmstadt.freifunk.net/#!/en/map/f09fc2dec4c5

> Another node in the ffda network went down at about the same time, also a Unifi AC Mesh device on the 'testing' branch.

Oh yeah, that's the one. Hi there! Thanks for reporting this issue.

mweinelt commented 2 years ago

Grabbed logread and this one is sitting in config mode as well.

https://gist.github.com/mweinelt/35d85f5803573396e805887ef72c367c

neocturne commented 2 years ago

> Grabbed logread and this one is sitting in config mode as well.
>
> https://gist.github.com/mweinelt/35d85f5803573396e805887ef72c367c

Thanks. It seems that OpenWrt is missing an image size check: there are only three 64K blocks free for the overlay.

dzzinstant commented 2 years ago

from 64287-gnat: logread-unifi_ac_mesh-reverts_to_first_boot.log

mweinelt commented 2 years ago

@dzzinstant Today's testing firmware reduces the number of packages we install on the device, which works around this problem.

What is interesting to note is that, with the new target(?), the nodes set different primary MAC addresses.

https://meshviewer.darmstadt.freifunk.net/#!/788a20f21ff6 https://meshviewer.darmstadt.freifunk.net/#!/788a20f01ff6

I think we have to recheck the MAC address on the device, if any.

dzzinstant commented 2 years ago

> @dzzinstant Today's testing firmware reduces the number of packages we install on the device, which works around this problem.

Works on my node. Thanks a lot!

> What is interesting to note is that, with the new target(?), the nodes set different primary MAC addresses.
>
> https://meshviewer.darmstadt.freifunk.net/#!/788a20f21ff6 https://meshviewer.darmstadt.freifunk.net/#!/788a20f01ff6
>
> I think we have to recheck the MAC address on the device, if any.

I guess that only became apparent because I went back to the original firmware. It might also be caused by a change in gluon/openwrt that happened a long time ago.

mweinelt commented 2 years ago

Not sure how well accessible yours is, but could you check what MAC address is on the label inside the lower compartment, where the LAN cable is connected? We need to get this right.

dzzinstant commented 2 years ago

MAC address on the device label: 788A20F01FF6 (matching the new address)
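
For anyone comparing the two addresses from the meshviewer links by eye, here is a minimal Python sketch that normalizes both and reports which octet differs. The helper names `parse_mac` and `differing_octets` are made up for illustration; nothing here comes from gluon or OpenWrt.

```python
def parse_mac(text: str) -> bytes:
    """Parse a MAC address written with or without ':' or '-' separators."""
    digits = text.replace(":", "").replace("-", "").lower()
    if len(digits) != 12:
        raise ValueError(f"expected 12 hex digits, got {text!r}")
    return bytes.fromhex(digits)


def differing_octets(a: str, b: str) -> list[tuple[int, int, int]]:
    """Return (index, octet_a, octet_b) for every octet that differs."""
    return [
        (i, x, y)
        for i, (x, y) in enumerate(zip(parse_mac(a), parse_mac(b)))
        if x != y
    ]


label = "788A20F01FF6"  # address printed on the device label
other = "788a20f21ff6"  # the second address from the meshviewer links

for index, x, y in differing_octets(label, other):
    print(f"octet {index}: {x:02x} vs {y:02x} (xor {x ^ y:02x})")
```

The two addresses differ only in the fourth octet (index 3, counting from zero), by a single bit.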

dzzinstant commented 2 years ago

Just leaving a note, I guess the problem might reappear for other communities or devices.

Summary

rotanid commented 2 years ago

@dzzinstant Please retry with the current master branch; optimizations for the needed flash size were merged in the last few days. In my case this helped. See #2501.

blocktrron commented 2 years ago

We reduced the image size a while ago by excluding USB packages from the boards in question. I'll close this issue.

neocturne commented 2 years ago

As noted by @dzzinstant, there should be an image size check in place. Reopening to keep track of fixing the check.

dzzinstant commented 2 years ago

Sorry about the confusion, I also wasn't sure about whether this ticket should remain open. I have submitted a more specific bug report here: https://github.com/openwrt/openwrt/issues/9862

IMHO this ticket should remain open for now, because gluon's functionality is also affected. In particular, the bug may break the autoupdating procedure.

mweinelt commented 2 years ago

There have been unanswered questions upstream: https://github.com/openwrt/openwrt/issues/9862#issuecomment-1125833592

Can someone take a look?

yunhai20082008 commented 2 years ago

> Bug report
>
> What is the problem? My node runs on the Freifunk Darmstadt variant of gluon, 'testing' branch. When the branch was recently updated from ffda 2.3~20210608 / gluon-v2020.2-263-g3f59fdc6 to ffda 2.5~20220330 / gluon-v2021.1-338-g55da2a7, my node no longer responded, and apparently was stuck in config mode.
>
> Other updating methods led to the same result:
>
>   • revert to Ubiquiti's stock firmware (both 2017-05-08 and 2022-03-12), then install 2.5.* using dd
>   • revert to stock firmware, install 2.4.1 (stable firmware from ffda). Then install 2.5.* (using both autoupdater and sysupgrade -n)
>   • (I set the first byte of the MTD partition bs to 0x00, I also tried with setting it to 0x01)
>
> I configured the node using the web interface. After pressing the "save & reboot" button, the node always reboots to config mode. All configuration data - including the node name - is deleted/reset to first installation state. This problem only appeared on a UniFi AC Mesh, my other devices accepted the rollout of 2.5.* without noticable problems.
>
> What is the expected behaviour? Using the stable branch ffda 2.4.1 / gluon-v2021.1.1, the node works & reboots without problems. All configuration settings are preserved.
>
> Node in question: 64287-gnat, older data of the same device: 64287-gnat (old)
>
> Another node in the ffda network went down at about the same time, also a Unifi AC Mesh device on the 'testing' branch.
>
> Possibly related issues

Could you do me a favor regarding this? https://forum.openwrt.org/t/need-a-copy-content-of-full-flash-about-uap-ac-m/140438

Djfe commented 1 year ago

I think this was solved (OpenWrt requires 3 erase blocks to be available for rootfs on build afaik, which is 192 KiB). Also, we got a patch in master now that doubles the space used by Gluon: https://github.com/freifunk-gluon/gluon/commit/cc854594b0ba677760b844f5e92f411658ba13d8

dzzinstant commented 2 months ago

Apparently, this problem did not reappear for the UniFi AC Mesh. To solve the problem in general (for all devices), it would be possible to reserve a minimum space needed for erase blocks during the image building process, e.g. in the targets' individual Makefiles openwrt/target/linux/<..>/image/<..>.mk or by modifying the check-size command in openwrt/include/image-commands.mk.
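
To make the idea concrete, here is a rough Python sketch of what such a check could compute. It is not OpenWrt's actual check-size implementation; the function names, the partition size, and the image size are hypothetical, and the minimum number of free erase blocks is deliberately left as a parameter.

```python
KIB = 1024


def free_erase_blocks(partition_size: int, image_size: int, erase_size: int) -> int:
    """Whole erase blocks left in the partition after the image.

    The image is assumed to occupy full erase blocks, so its size is
    rounded up to the next block boundary.
    """
    used_blocks = -(-image_size // erase_size)  # ceiling division
    return partition_size // erase_size - used_blocks


def check_size(partition_size: int, image_size: int, erase_size: int,
               min_free_blocks: int) -> bool:
    """True if at least min_free_blocks erase blocks remain for the overlay."""
    return free_erase_blocks(partition_size, image_size, erase_size) >= min_free_blocks


# Hypothetical numbers: a 246-block partition with 64 KiB erase blocks and
# an image that ends just inside its 243rd block, leaving 3 blocks free.
erase = 64 * KIB
partition = 246 * erase
image = 243 * erase - 100

print(free_erase_blocks(partition, image, erase))            # 3
print(check_size(partition, image, erase, min_free_blocks=3))  # True
print(check_size(partition, image, erase, min_free_blocks=5))  # False
```

With three blocks free, the image passes a threshold of 3 and fails a threshold of 5, so the outcome depends entirely on which minimum the build system chooses to enforce.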

Both options seem somewhat arbitrary and disproportionate to me (because the bug appears only very rarely and can easily be cured), and might also introduce new obscure bugs. So I would rather close the corresponding bug reports for gluon and openwrt.

Djfe commented 2 months ago

If it's only related to free erase blocks, then I ran into the same issue with an OpenMesh device. They have lots of flash but use A/B partitioning, and JFFS2 requires 1.25 MiB of free erase blocks, which is quite a lot (erase size is 256k).

I don't think this is quite done, but we will see.
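
As a quick sanity check on the figures quoted in this thread, the following Python sketch converts between a reserved size and a number of erase blocks. The helper names are invented for illustration; only the numbers (3 blocks of 64 KiB, 1.25 MiB at a 256 KiB erase size) come from the comments above.

```python
KIB = 1024


def blocks_needed(reserve_bytes: int, erase_bytes: int) -> int:
    """Erase blocks required to cover reserve_bytes (rounded up)."""
    return -(-reserve_bytes // erase_bytes)


def reserve_kib(blocks: int, erase_kib: int) -> int:
    """Flash consumed, in KiB, by a reserve of the given number of blocks."""
    return blocks * erase_kib


# UniFi AC Mesh figures from the thread: 3 blocks of 64 KiB.
print(reserve_kib(3, 64))                    # 192

# OpenMesh figures from the thread: 1.25 MiB (1280 KiB) at 256 KiB per block.
print(blocks_needed(1280 * KIB, 256 * KIB))  # 5
```

The same number of blocks costs four times as much flash at a 256 KiB erase size as at 64 KiB, which is why the reserve feels large on the OpenMesh device.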