Closed nielsavonds closed 3 years ago
Thanks @nielsavonds
However, in the fake fw_printenv script, this will return nvbootctrl get-current-slot, essentially toggling between 0 or 1. The delta update then fails to apply because the checksum mismatches.
I would expect fw_printev
and nvbootctrl get-current-slot
to return the current slot you are booting/running from. So for the example in https://hub.mender.io/t/delta-updates-checksum-mismatch/3312/8
where /dev/mmcblk0p11 is the currently active partition
{
"RootfsPartA": "/dev/mmcblk0p1",
"RootfsPartB": "/dev/mmcblk0p11"
}
I would have expected nvbootctrl get-current-slot
to return 1 in this case. So it seems like you've gotten out of sync with your rootfs/boot slot combo. We've had issues with this on uboot based builds, see https://github.com/OE4T/meta-mender-community/pull/8 for instance, but I haven't noticed this on cboot with with https://github.com/OE4T/tegra-demo-distro since we added the test script script at https://github.com/mendersoftware/meta-mender-community/tree/dunfell/meta-mender-tegra/scripts/test.
The script at https://github.com/mendersoftware/meta-mender-community/tree/dunfell/meta-mender-tegra/scripts/test is supposed to catch these issues although I'm not sure of the last time I ran it on the Xavier NX config. It was probably the last time I did a PR to meta-mender-community. See logic at https://github.com/mendersoftware/meta-mender-community/blob/dunfell/meta-mender-tegra/scripts/test/mender_tegra_test.py#L57. Based on what you've written here I'm guessing this script will fail with the combo you are using. Can you confirm?
Also note that the implementation for these scripts is changing from dunfell to later branches as we move to https://github.com/OE4T/tegra-boot-tools. See https://github.com/OE4T/tegra-demo-distro/commit/99c60c6c91bccb5ad855ecf042786ef526e84746 for instance. So we'll want to take this into account with any fix/PR for dunfell. We might want to start with a PR into https://github.com/OE4T/tegra-demo-distro and https://github.com/OE4T/meta-tegra as needed and make sure we have a combination which is working there with the torture test script.
Tagging @madisongh in case he has comments or other suggestions.
It sounds like the delta updater is relying on a sequence where the mender client or the updater itself is using fw_setenv mender_boot_part
to set the new boot partition before the delta updater does its checks. If so, then we need to rework our fake utilities to allow that to work (while, hopefully, not breaking anything else). Right now, the only variable that the fake fw_setenv
does anything about is upgrade_available
, since in the normal (non-delta) sequence, that's the only variable that needed it.
Hi guys, thanks for your quick comments.
I think there's a slight misunderstanding here. The result of nvbootctrl get-current-slot was correct in all cases. My comments in the linked forum discussion came from different test runs, where the last one was running on /dev/mmcblk0p1 and not /dev/mmcblk0p11. So I would expect nvbootctrl get-current-slot to return 0 in that case.
The issue, however, is that fw_printenv mender_boot_part then also returns '0', the number of the slot, instead of '1', the partition number that Mender expects.
Meanwhile, I've been able to get delta upgrades working on our side by overriding the fw_printenv script in our layer. The new script looks like this (where the two variables at the top get replaced using sed commands in do_install):
#!/bin/sh
BOOT_PART_0=@MENDER_ROOTFS_PART_A_NUMBER@
BOOT_PART_1=@MENDER_ROOTFS_PART_B_NUMBER@
quiet=
if [ "$1" = "-n" ]; then
quiet="yes"
shift
if [ -z "$1" ]; then
echo "ERR: missing var name with -n" >&2
exit 1
fi
fi
if [ -z "$1" ]; then
boot_slot=`nvbootctrl get-current-slot`
[[ $boot_slot = 0 ]] && boot_part="$BOOT_PART_0" || boot_part="$BOOT_PART_1"
echo "mender_boot_part=$boot_part"
exit 0
fi
while [ -n "$1" ]; do
case "$1" in
mender_boot_part|mender_boot_part_hex)
boot_slot=`nvbootctrl get-current-slot`
[[ $boot_slot = 0 ]] && boot_part="$BOOT_PART_0" || boot_part="$BOOT_PART_1"
[ -n "$quiet" ] || echo "$1=$boot_part"
;;
mender_uboot_separator)
[ -n "$quiet" ] || echo -n "$1="
echo "something other than just 1"
;;
upgrade_available)
[ -n "$quiet" ] || echo -n "$1="
if [ -e "/var/lib/mender/upgrade_available" ]; then
echo "1"
else
echo "0"
fi
;;
*)
echo "ERR: no such variable: $1" >&2
exit 1
esac
shift
done
This way, fw_printenv mender_boot_part will return either 1 or 11, depending on the active boot slot.
Thanks for explaining @nielsavonds.
The new script looks like this (where the two variables at the top get replaced using sed commands in do_install):
Should the variables be set based on the content of the mender.conf file instead as done in https://github.com/mendersoftware/meta-mender-community/blob/dunfell/meta-mender-tegra/recipes-mender/tegra-state-scripts/files/redundant-boot-install-script#L35? That way the configuration changes will match whatever was used when the device was manufactured, assuming /var/lib/mender
is mapped to the /data
partition.
Thanks Dan, you're right, that would be a better approach. Here's my second attempt. I've also fix mender_boot_part_hex in this one. I've also done some fixes in the other case statements to make sure the "quiet" parameter works properly.
#!/bin/sh
LABELCHARS="AB"
get_bootpart() {
current_slot=`nvbootctrl get-current-slot`
cfglbl="\"RootfsPart${LABELCHARS:$current_slot:1}\""
devnam=`grep -h "$cfglbl:" /etc/mender/mender.conf /var/lib/mender/mender.conf | tr -d '" ,' | grep -o '[0-9]\+$'`
echo "$devnam"
}
quiet=
if [ "$1" = "-n" ]; then
quiet="yes"
shift
if [ -z "$1" ]; then
echo "ERR: missing var name with -n" >&2
exit 1
fi
fi
if [ -z "$1" ]; then
bootpart=`get_bootpart`
echo "mender_boot_part=$bootpart"
exit 0
fi
while [ -n "$1" ]; do
case "$1" in
mender_boot_part)
bootpart=`get_bootpart`
[ -n "$quiet" ] || echo "$1=$bootpart"
;;
mender_boot_part_hex)
bootpart=`get_bootpart`
bootpart_hex=`echo 16o${bootpart}p | dc`
[ -n "$quiet" ] || echo "$1=$bootpart_hex"
;;
mender_uboot_separator)
[ -n "$quiet" ] || echo "$1=something other than just 1"
;;
upgrade_available)
if [ -e "/var/lib/mender/upgrade_available" ]; then
upgrade_available="1"
else
upgrade_available="0"
fi
[ -n "$quiet" ] || echo "$1=$upgrade_available"
;;
*)
echo "ERR: no such variable: $1" >&2
exit 1
esac
shift
done
The issue, however, is that fw_printenv mender_boot_part then also returns '0', the number of the slot, instead of '1', the partition number that Mender expects.
OK, I see. That's certainly simpler to emulate!
Thanks @nielsavonds looks good to me. Could you prepare a PR for this and, if possible, could you check to see if it still works with https://github.com/OE4T/tegra-demo-distro/ dunfell-l4t-r32.4.3
branch and the test script at https://github.com/mendersoftware/meta-mender-community/blob/dunfell/meta-mender-tegra/scripts/test/mender_tegra_test.py#L57 ?
I've created the pull request here: https://github.com/OE4T/meta-mender-community/pull/11 Unfortunately, I'm not able to test with the tegra-demo-distro at this time.
Upstreamed with https://github.com/mendersoftware/meta-mender-community/pull/221
Hi,
We recently started working on a new version of our device based on the Xavier NX module. By default, the meta-tegra layer configures this module to use cboot instead of UBoot.
To make this work with Mender, there is the custom libubootenv-fake scripts that are installed on the rootfs. However, when using delta updates, these are not sufficient. For more context, see https://hub.mender.io/t/delta-updates-checksum-mismatch/3312
Basically, what happens, is that the binary delta installer looks at mender_boot_part to determine which partition it should use as its base for applying the delta update. However, in the fake fw_printenv script, this will return
nvbootctrl get-current-slot
, essentially toggling between 0 or 1. The delta update then fails to apply because the checksum mismatches.Would it be possible to modify this script to return the correct rootfs partition based on the slot? Or do you think this would trigger other issues? If you see no problems with this, I can try to make a pull request.
Thanks in advance, Best regards, Niels Avonds