freifunk-gluon / gluon

a modular framework for creating OpenWrt-based firmwares for wireless mesh nodes
https://gluon.readthedocs.io

Update scripts don't run after sysupgrade/autoupdater on NanoPi R2S #2318

Closed by goligo 2 years ago

goligo commented 2 years ago

Bug report

What is the problem? We noticed that one of our update scripts included in a new firmware version was not executed on NanoPi R2S. On closer look it appears that update scripts are not called at all on NanoPi R2S after sysupgrade. A manual call of gluon-reconfigure is required to trigger them.

As far as I understand, update scripts are triggered after an upgrade by a file called /etc/uci-defaults/zzz-gluon-upgrade. When I create this file manually and reboot, the scripts are executed as expected, so this mechanism seems to work. So it appears that this file is not updated/created properly when updating the NanoPi R2S?
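The manual test described above relies on general OpenWrt behaviour: scripts in /etc/uci-defaults are run once at boot and deleted when they exit successfully. A minimal sketch for checking where such a script stands is shown here. The function name and its three states are illustrative, and it assumes the script ships in the read-only image, which OpenWrt exposes under /rom when an overlay is mounted.

```sh
#!/bin/sh
# uci_default_state ROOT NAME
# Hypothetical helper (not part of Gluon or OpenWrt). Prints one of:
#   pending   the script exists in ROOT/etc/uci-defaults and would run at next boot
#   consumed  the script exists only in the read-only image under ROOT/rom
#   absent    the script is in neither place
uci_default_state() {
    root="${1%/}"    # strip a trailing slash so "/" becomes an empty prefix
    name="$2"
    if [ -e "$root/etc/uci-defaults/$name" ]; then
        echo pending
    elif [ -e "$root/rom/etc/uci-defaults/$name" ]; then
        echo consumed
    else
        echo absent
    fi
}
```

On a node this would be called as `uci_default_state / zzz-gluon-upgrade`. Note that `consumed` is also the normal state after a healthy first boot, so on its own it does not show whether the upgrade scripts actually ran.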

What is the expected behaviour? Update scripts should be executed after sysupgrade.

Gluon Version: master

Site Configuration: https://github.com/freifunkMUC/site-ffm/tree/next

Custom patches: https://github.com/freifunkMUC/site-ffm/tree/next/patches

goligo commented 2 years ago

To be specific, our update script synchronizes gluon.mesh_vpn.enabled and wireguard.mesh_vpn.enabled. We noticed that the first option was disabled on some nodes due to a historic misconfiguration, which caused the second one to be disabled as well, so those nodes lost their VPN connection after updating.

https://github.com/freifunkMUC/community-packages/blob/0e71213b5af9be482f1ab82e2d3cb62d1d43bafa/ffmuc-gluon-mesh-vpn-wireguard-vxlan/luasrc/lib/gluon/upgrade/400-mesh-vpn-wireguard#L22

This was working as expected on all nodes, except for the NanoPi R2S.
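For readers who do not want to open the Lua file linked above, the idea of such a synchronization can be sketched with the `uci` command-line tool. This is not the linked script: the function name is made up, and the sketch assumes that the WireGuard setting is treated as authoritative and copied to gluon.mesh_vpn.enabled, a direction this thread does not spell out.

```sh
#!/bin/sh
# sync_mesh_vpn_enabled
# Hypothetical helper: copy wireguard.mesh_vpn.enabled to gluon.mesh_vpn.enabled.
# ASSUMPTION: the WireGuard value is the one to keep; the real upgrade script
# is written in Lua and may behave differently.
sync_mesh_vpn_enabled() {
    # "uci -q get" exits non-zero when the option does not exist.
    wg="$(uci -q get wireguard.mesh_vpn.enabled)" || return 0
    cur="$(uci -q get gluon.mesh_vpn.enabled)"

    # Nothing to do when both options already agree.
    [ "$cur" = "$wg" ] && return 0

    uci set gluon.mesh_vpn.enabled="$wg"
    uci commit gluon
}
```

Because the function only writes when the two values differ, running it repeatedly leaves an already consistent configuration untouched.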

neocturne commented 2 years ago

How was the upgrade installed? Autoupdater or manual sysupgrade? If manual, what exact command?

goligo commented 2 years ago

sysupgrade as well as autoupdater show the same behaviour. Exact command for sysupgrade was:

sysupgrade https://firmware.ffmuc.net/v2021.8.1-next3/sysupgrade/gluon-ffmuc-v2021.8.1-next3-friendlyelec-nanopi-r2s-sysupgrade.img.gz

neocturne commented 2 years ago

Please provide the output of logread and dmesg for the first boot after an upgrade.

Weirdly, I'm having trouble building that target for the current Gluon master due to an obscure kernel build error... I'll try downgrading OpenWrt to the version before the last kernel bump.

goligo commented 2 years ago

dmesg.txt logread.txt

neocturne commented 2 years ago

The issue is that the overlay is not reset during the upgrade, but I'm not sure about the reason...

On x86 I see a message "mount_root: rootdisk overlay filesystem has not been formatted yet" after "init: - preinit -". In your log a lot more is happening - in particular modules are first loaded from /etc/modules-boot.d/* and then from /tmp/overlay/upper/etc/modules-boot.d/*. Were any packages installed on the system using opkg?

There are also a few lines which I believe are from the "block-mount" tool:

block: attempting to load /etc/config/fstab
block: unable to load configuration (fstab: Entry not found)
block: no usable configuration

Is this package installed on the device? If it is, how? I don't think it is a default package of that target, but I might be wrong.

blocktrron commented 2 years ago

There appears to be an issue with the squashfs images: they are considerably smaller than the ext4 images.

-rw-r--r-- 1 dbauer users  67M Sep 23 20:50 openwrt-21.02.0-rockchip-armv8-friendlyarm_nanopi-r2s-squashfs-sysupgrade.img.gz
-rw-r--r-- 1 dbauer users 168M Sep 23 20:50 openwrt-21.02.0-rockchip-armv8-friendlyarm_nanopi-r2s-ext4-sysupgrade.img.gz

Don't be fooled by the .gz extension; the files are uncompressed and do not have metadata appended.

Because of this, it seems the squashfs is written without the rest of the partition on the block device being overwritten, so the old overlay filesystem (ext4 / f2fs) still exists after the upgrade.
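The effect described here can be reproduced in miniature. The sketch below is only an illustration: it uses plain files instead of block devices and marker strings instead of real squashfs or overlay data, and all names and sizes in it are made up.

```sh
#!/bin/sh
# Simulate writing a short rootfs image onto a larger partition.
# Everything here is a stand-in: regular files play the role of the block
# device and the image, marker strings play the role of filesystem data.
stale_overlay_demo() {
    work=$(mktemp -d) || return 1
    part="$work/partition.bin"
    img="$work/rootfs.bin"

    # Old partition contents: 8 KiB with an "overlay" marker at offset 4 KiB.
    dd if=/dev/zero of="$part" bs=1024 count=8 2>/dev/null
    printf 'OLD-OVERLAY' | dd of="$part" bs=1 seek=4096 conv=notrunc 2>/dev/null

    # New image: only 2 KiB long, so it ends before the overlay marker.
    dd if=/dev/zero of="$img" bs=1024 count=2 2>/dev/null
    printf 'NEW-ROOTFS' | dd of="$img" bs=1 conv=notrunc 2>/dev/null

    # "Flash" the image: only the first 2 KiB of the partition are rewritten.
    dd if="$img" of="$part" conv=notrunc 2>/dev/null

    echo "offset 0:     $(dd if="$part" bs=1 count=10 2>/dev/null)"
    echo "offset 4 KiB: $(dd if="$part" bs=1 skip=4096 count=11 2>/dev/null)"
    rm -rf "$work"
}

stale_overlay_demo
```

The first line of output shows the new marker, while the second still shows the old one, because the write stopped well before that offset.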

blocktrron commented 2 years ago

It looks like PADDING=1 has to be set for the sdcard image generation to extend the input rootfs to the size of the actual partition.

--- a/target/linux/rockchip/image/Makefile
+++ b/target/linux/rockchip/image/Makefile
@@ -34,7 +34,7 @@ define Build/pine64-img
    # http://opensource.rock-chips.com/wiki_Boot_option#Boot_flow
    #
    # U-Boot SPL expects the U-Boot ITB to be located at sector 0x4000 (8 MiB) on the MMC storage
-   $(SCRIPT_DIR)/gen_image_generic.sh \
+   PADDING=1 $(SCRIPT_DIR)/gen_image_generic.sh \
        $@ \
        $(CONFIG_TARGET_KERNEL_PARTSIZE) $@.boot \
        $(CONFIG_TARGET_ROOTFS_PARTSIZE) $(IMAGE_ROOTFS) \
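
What padding achieves can be sketched outside the OpenWrt build system. The helper below is hypothetical and is not taken from gen_image_generic.sh; it zero-extends a rootfs image to a given partition size, so that writing the image touches the whole partition rather than only its beginning.

```sh
#!/bin/sh
# pad_image FILE SIZE
# Hypothetical helper (not part of OpenWrt): zero-extend FILE to SIZE bytes.
# Refuses to touch an image that is already larger than SIZE.
pad_image() {
    file="$1"
    size="$2"
    cur=$(($(wc -c < "$file")))    # arithmetic expansion strips padding from wc output
    if [ "$cur" -gt "$size" ]; then
        echo "pad_image: $file ($cur bytes) exceeds $size bytes" >&2
        return 1
    fi
    # Append the missing number of zero bytes; existing content is preserved.
    head -c "$((size - cur))" /dev/zero >> "$file"
}
```

A call such as `pad_image rootfs.img 4096` leaves the original bytes in place and fills the remainder with zeros; the file name and size in this call are placeholders.
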

goligo commented 2 years ago

"Were any packages installed on the system using opkg?"

I may have installed iperf3 to run some performance tests; that is all I can think of.

blocktrron commented 2 years ago

@goligo This should resolve your issue; please try https://github.com/freifunk-gluon/gluon/pull/2319

goligo commented 2 years ago

I have built an image with the patch provided above and I can confirm it fixes the issue.