OE4T / tegra-demo-distro

Reference/demonstration distro for meta-tegra
MIT License

Mender update fails on Xavier NX #52

Closed brgl closed 3 years ago

brgl commented 3 years ago

Hi!

I'm trying to integrate Mender with our system but I can't get the device to update correctly. I first noticed this error on a custom system, but I can reproduce it with the vanilla demo-image-base image and hosted Mender.

The update fails when the ArtifactInstall_Leave_80_bl-update state script runs and exits with an error. The offending line is:

if ! chroot "${mnt}" /usr/sbin/nv_update_engine --install no-reboot; then

And this is what happens inside:

5399  openat(AT_FDCWD, "/opt/ota_package/bl_update_payload", O_RDONLY) = 3
5399  fstat(3, {st_mode=S_IFREG|0644, st_size=47626452, ...}) = 0
5399  read(3, "NVIDIA__BLOB__V2\0\0\2\0\324\270\326\0020\0\0\0\33\0\0\0\0\0\0\0\324\270\326\2\0\0\0\0\0\0\0\0spe-fw\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\330\f\0\0\360r\1\0\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0mb2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\310\177\1\0000\232\2\0\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0cpu-bootloader\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\370\31\4\0\200\260\6\0\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0secure-os\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0x\312\n\0\300\263\5\0\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
5399  write(1, "HEADER: MAGIC NVIDIA__BLOB__V2\n", 31) = 31
5399  write(1, "HEX_VALUE 131072\n", 17) = 17
5399  write(1, "BLOB_SIZE 47626452\n", 19) = 19
5399  write(1, "HEADER_SIZE 48\n", 15)  = 15
5399  write(1, "NUMBER_OF_ELEMENTS 27\n", 22) = 22
5399  write(1, "HEADER_TYPE 0\n", 14)   = 14
5399  write(1, "UNCOMP_SIZE 47626452\n", 21) = 21
5399  write(1, "MB1_RATCHET_LV 0\n", 17) = 17
5399  write(1, "MTS_RATCHET_LV 0\n", 17) = 17
5399  write(1, "ROLLBACK_FUSE_LV 0\n", 19) = 19
5399  lseek(3, 0, SEEK_SET)             = 0
5399  read(3, "NVIDIA__BLOB__V2\0\0\2\0\324\270\326\0020\0\0\0\33\0\0\0\0\0\0\0\324\270\326\2\0\0\0\0\0\0\0\0spe-fw\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\330\f\0\0\360r\1\0\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0mb2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\310\177\1\0000\232\2\0\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0cpu-bootloader\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\370\31\4\0\200\260\6\0\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0secure-os\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0x\312\n\0\300\263\5\0\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
5399  openat(AT_FDCWD, "/etc/nv_boot_control.conf", O_RDONLY) = 4
5399  fstat(4, {st_mode=S_IFREG|0644, st_size=154, ...}) = 0
5399  read(4, "TNSPEC 3668-300-0000-B.0-1-2-jetson-xavier-nx-devkit-mmcblk0p1\nTEGRA_CHIPID 0x19\nTEGRA_OTA_BOOT_DEVICE /dev/mtdblock0\nTEGRA_OTA_GPT_DEVICE /dev/mtdblock0\n", 4096) = 154
5399  read(4, "", 4096)                 = 0
5399  close(4)                          = 0
5399  openat(AT_FDCWD, "/etc/nv_boot_control.conf", O_RDONLY) = 4
5399  fstat(4, {st_mode=S_IFREG|0644, st_size=154, ...}) = 0
5399  read(4, "TNSPEC 3668-300-0000-B.0-1-2-jetson-xavier-nx-devkit-mmcblk0p1\nTEGRA_CHIPID 0x19\nTEGRA_OTA_BOOT_DEVICE /dev/mtdblock0\nTEGRA_OTA_GPT_DEVICE /dev/mtdblock0\n", 4096) = 154
5399  close(4)                          = 0
5399  write(1, "config COMPATIBLE_SPEC not found in /etc/nv_boot_control.conf\nDevice TN Spec: 3668-300-0000-B.0-1-2-jetson-xavier-nx-devkit-mmcblk0p1\n", 134) = 134
5399  write(1, "Can't find matching TN Spec in OTA Blob!\n", 41) = 41
5399  close(3)                          = 0
5399  write(1, "OTA Blob update failed. Status: 3\n", 34) = 34
5399  write(1, "/usr/sbin/nv_bootloader_payload_updater --no-dependent-partition failed.\n", 73) = 73
5399  exit_group(3)     

Unfortunately I have no idea what this program is trying to achieve. Please advise on where to look next.

EDIT

This is the complete strace output.
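For anyone reading the trace: the updater reads /etc/nv_boot_control.conf as plain "KEY value" lines, finds a TNSPEC entry, and reports that COMPATIBLE_SPEC is absent. A minimal Python sketch for inspecting the same keys follows; the function name `parse_boot_control` is hypothetical, and the sample text is copied from the `read()` call in the trace.

```python
def parse_boot_control(text):
    """Parse 'KEY value' lines into a dict (format as seen in the trace)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first space only; the rest of the line is the value.
        key, _, value = line.partition(" ")
        settings[key] = value.strip()
    return settings


# Contents copied from the read() of /etc/nv_boot_control.conf in the trace.
CONF_TEXT = (
    "TNSPEC 3668-300-0000-B.0-1-2-jetson-xavier-nx-devkit-mmcblk0p1\n"
    "TEGRA_CHIPID 0x19\n"
    "TEGRA_OTA_BOOT_DEVICE /dev/mtdblock0\n"
    "TEGRA_OTA_GPT_DEVICE /dev/mtdblock0\n"
)

settings = parse_boot_control(CONF_TEXT)
print("TNSPEC:", settings["TNSPEC"])
print("COMPATIBLE_SPEC present:", "COMPATIBLE_SPEC" in settings)
```

On a device, the same function could be fed the real file contents instead of `CONF_TEXT`.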

ichergui commented 3 years ago

Hey @brgl, could you please give us more detail about the repo you are using? Are you using tegra-demo-distro? Which branch? I believe I have tested the A/B system update and it was working fine, so please share the details and I will take a look and get back to you quickly.

brgl commented 3 years ago

@ichergui Duh, sorry for not posting that right away. Below is my build info:

Build Configuration:
BB_VERSION           = "1.46.0"
BUILD_SYS            = "x86_64-linux"
NATIVELSBSTRING      = "universal"
TARGET_SYS           = "aarch64-oe4t-linux"
MACHINE              = "jetson-xavier-nx-devkit"
DISTRO               = "tegrademo-mender"
DISTRO_VERSION       = "3.1+snapshot"
TUNE_FEATURES        = "aarch64 armv8a crc"
TARGET_FPU           = ""
meta                 = "HEAD:bb7747497adbc7c99f6fc9b48b643eecb4cb1408"
meta-tegra           
contrib              = "HEAD:4dffc9c71438d9dc78a29f0b2444fb949419a9ef"
meta-oe              
meta-python          
meta-networking      
meta-filesystems     = "dunfell:5bba79488b7d393d2258d6e917f7bf7b0d7c4073"
meta-virtualization  = "dunfell:92cd3467502bd27b98a76862ca6525ce425a8479"
meta-mender-core     = "dunfell:f5864bfb32906b87cc24d5b84e74805994f0ef3e"
meta-mender-tegra    = "dunfell:43b69147328ec501e902dd0832eb286bbe501c15"
meta-tegra-support   
meta-demo-ci         
meta-tegrademo       = "dunfell-l4t-r32.4.3:fe5af105aed8867b41421d59bafd330ed174c13e"

It's dunfell branch of tegra-demo-distro and I'm using the demo-image-base image. Let me know if you need more info.

ichergui commented 3 years ago

@brgl No worries, I will try it and let you know.

madisongh commented 3 years ago

The TNSPEC that was mentioned in the trace was 3668-300-0000-B.0-1-2-jetson-xavier-nx-devkit-mmcblk0p1. That spec is derived from module version information stored in an EEPROM and is used to determine which flavors of the various bootloader binaries are compatible with the machine. On some Jetsons, at least, different module versions require different versions of some of the binaries.

Looks like NVIDIA has issued a new version (aka FAB) of the NX dev kit module (that's the 300 above). Current BUP payload builds only handle the 100 and 200 FABs (and even the 200 one I ended up adding myself because the l4t_generate_soc_bup.sh script in the BSP only mentions 100, even in R32.5.0).
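To make the field positions concrete, here is a minimal Python sketch that splits a TNSPEC string into named parts. Only the FAB position (the `300`) is confirmed in this thread. The names `board_id`, `board_sku` and `board_rev` are inferred from the spec keys used for BUP generation, and `fuse_level`, `chip_rev`, `machine` and `boot_device` are assumptions about the remaining fields. `TnSpec` and `parse_tnspec` are hypothetical names.

```python
from typing import NamedTuple


class TnSpec(NamedTuple):
    board_id: str
    fab: str          # the module version discussed above
    board_sku: str
    board_rev: str
    fuse_level: str   # assumed meaning of the fifth field
    chip_rev: str     # assumed meaning of the sixth field
    machine: str      # assumed: build MACHINE name
    boot_device: str  # assumed: root/boot device suffix


def parse_tnspec(spec: str) -> TnSpec:
    # The first six fields contain no hyphens, so split them off from the left.
    parts = spec.split("-", 6)
    if len(parts) != 7:
        raise ValueError(f"unexpected TNSPEC layout: {spec!r}")
    *fixed, rest = parts
    # The machine name itself contains hyphens, so take the last field from the right.
    machine, sep, boot_device = rest.rpartition("-")
    if not sep:
        raise ValueError(f"unexpected TNSPEC layout: {spec!r}")
    return TnSpec(*fixed, machine, boot_device)


spec = parse_tnspec("3668-300-0000-B.0-1-2-jetson-xavier-nx-devkit-mmcblk0p1")
print(spec.fab, spec.board_sku, spec.board_rev)
```

Splitting from both ends avoids guessing how many hyphens the machine name contains.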

Sure enough, I just fired up a new NX dev kit I just got, and it too is FAB 300. I'll update the machine config to add that one to the list. In the meantime, you should be able to work around the issue by adding

TEGRA_BUPGEN_SPECS_append_jetson-xavier-nx-devkit = " fab=300;boardsku=0000;boardrev="

to your local.conf file. It looks like the current bootloaders/configs work OK with the new version, so that should be safe to do.
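As a rough way to sanity-check a spec list before rebuilding, the Python sketch below parses a value in the `fab=...;boardsku=...;boardrev=` form and tests whether a given FAB/SKU pair is listed. This is only a local check, not the matching logic nv_update_engine uses. The `existing` value is a hypothetical starting point (the thread says only that FABs 100 and 200 were handled), and `parse_bup_specs` and `covers` are hypothetical names.

```python
def parse_bup_specs(value):
    """Split a space-separated list of 'key=val;key=val' entries into dicts."""
    specs = []
    for entry in value.split():
        fields = {}
        for pair in entry.split(";"):
            key, _, val = pair.partition("=")
            fields[key] = val  # an empty value (e.g. 'boardrev=') stays as ''
        specs.append(fields)
    return specs


def covers(specs, fab, boardsku):
    """Return True if any entry lists this fab and boardsku."""
    return any(
        s.get("fab") == fab and s.get("boardsku") == boardsku for s in specs
    )


# Hypothetical existing value, for illustration only.
existing = "fab=100;boardsku=0000;boardrev= fab=200;boardsku=0000;boardrev="
# The entry appended by the workaround above.
appended = existing + " fab=300;boardsku=0000;boardrev="

print(covers(parse_bup_specs(existing), "300", "0000"))
print(covers(parse_bup_specs(appended), "300", "0000"))
```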

ichergui commented 3 years ago

Thanks @madisongh, that's right. I just finished testing here, all good.

madisongh commented 3 years ago

OK, I've merged the change into the relevant branches in meta-tegra and have updated the corresponding branches of the distro with the latest, so next time you update (and run git submodule update to update the layers), it should work better.

brgl commented 3 years ago

Thanks a lot @madisongh, everything works now!

arrow53 commented 3 years ago

@madisongh I'm on gatesgarth with a custom image, but I'm wondering if this is the same as #42? Would this fix impact the gatesgarth branch, as I'm still having problems getting updates to stick?

madisongh commented 3 years ago

@arrow53 The symptom here is a failure during the mender install, consistently on a device with the newer version. In #42 it sounded like you were saying that the install completes without error, but the device doesn't reboot into the new partition, some of the time.

arrow53 commented 3 years ago

@madisongh OK, got it. Yeah, I'm still having issues. I'll go back to #42 if I can get some insight into what to try/show. Thanks.