OE4T / meta-tegra

BSP layer for NVIDIA Jetson platforms, based on L4T
MIT License

REDUNDANT_FLASH with jetson-xavier-nx-devkit-emmc, nvme and custom rootfs size #1439

Closed: aurelien-enchanted-tools closed this issue 5 months ago

aurelien-enchanted-tools commented 7 months ago

Hello,

Branch: mickledore

I hit a bug when I set a custom ROOTFSPART_SIZE or a custom ROOTFSPART_SIZE_DEFAULT in my MACHINE conf file.

require conf/machine/jetson-xavier-nx-devkit-emmc.conf
MACHINEOVERRIDES = "cuda:tegra:tegra194:xavier-nx:jetson-xavier-nx-devkit-emmc:${MACHINE}"
PACKAGE_EXTRA_ARCHS:append = " jetson-xavier-nx-devkit-emmc"
USE_REDUNDANT_FLASH_LAYOUT = "1"
TNSPEC_BOOTDEV = "nvme0n1p1"
TEGRA_EXTERNAL_DEVICE_SECTORS = "488397168"

=> tegra-minimal-initramfs-1.0-r0 do_image_cpio builds correctly.

require conf/machine/jetson-xavier-nx-devkit-emmc.conf
MACHINEOVERRIDES = "cuda:tegra:tegra194:xavier-nx:jetson-xavier-nx-devkit-emmc:${MACHINE}"
PACKAGE_EXTRA_ARCHS:append = " jetson-xavier-nx-devkit-emmc"
USE_REDUNDANT_FLASH_LAYOUT = "1"
TNSPEC_BOOTDEV = "nvme0n1p1"
TEGRA_EXTERNAL_DEVICE_SECTORS = "488397168"
ROOTFSPART_SIZE = "59055800320"

=> tegra-minimal-initramfs-1.0-r0 do_image_cpio does not build.
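A quick way to sanity-check the two sizes in the failing config is sketched below. It assumes that TEGRA_EXTERNAL_DEVICE_SECTORS counts 512-byte sectors; the thread does not state the sector size, so treat that as an assumption.

```python
# Sanity-check the sizes from the failing machine config.
# Assumption: TEGRA_EXTERNAL_DEVICE_SECTORS counts 512-byte sectors.
SECTOR_SIZE = 512
GIB = 1024 ** 3

TEGRA_EXTERNAL_DEVICE_SECTORS = 488397168
ROOTFSPART_SIZE = 59055800320  # bytes, as set in the config above


def to_gib(nbytes: int) -> float:
    """Convert a byte count to GiB."""
    return nbytes / GIB


nvme_bytes = TEGRA_EXTERNAL_DEVICE_SECTORS * SECTOR_SIZE
print(f"NVMe capacity:   {nvme_bytes} bytes ({to_gib(nvme_bytes):.1f} GiB)")
print(f"ROOTFSPART_SIZE: {ROOTFSPART_SIZE} bytes ({to_gib(ROOTFSPART_SIZE):.1f} GiB)")
# Even two full-size rootfs copies would fit on the NVMe drive.
print("Two copies fit on NVMe:", 2 * ROOTFSPART_SIZE <= nvme_bytes)
```

This only shows that the NVMe drive has room for the requested rootfs; it says nothing about which device the partition layout actually assigns each partition to.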

bug_log.txt

Thank you in advance.

dwalkes commented 7 months ago

The error is:

| [   5.2574 ] tegrabct_v2 --chip 0x19 --mb1bct mb1_cold_boot_bct_MB1.bct --updatefwinfo flash.xml.bin
| [   5.2626 ] Start sector for secure-os_b, expected >= 115542016, actual 0
| Error: Return value 4

I think this happens when you run out of space.

My guess is that the PARTITION_LAYOUT_TEMPLATE_DEFAULT you are using applies the eMMC size to a partition layout that contains the rootfs you want on NVMe, but I'd need to dig in more to prove this. I'd start by looking at the flash.xml.in referenced in the failure logs.
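To illustrate the "run out of space" explanation, here is a toy model (not tegrabct's actual algorithm) that lays partitions out back to back on a device and reports the first one that no longer fits. The function name first_overflow, the 16 GB device size, and the 8 GiB and 4 MiB partition sizes are hypothetical illustrative values; only the 29527900160-byte APP_b size comes from this thread.

```python
from typing import List, Optional, Tuple

SECTOR_SIZE = 512  # assumption: 512-byte sectors
MIB = 1024 ** 2
GIB = 1024 ** 3


def first_overflow(device_bytes: int,
                   partitions: List[Tuple[str, int]]) -> Optional[Tuple[str, int]]:
    """Place partitions back to back starting at sector 0.

    Returns (name, start_sector) for the first partition whose end would
    run past the end of the device, or None if everything fits.
    Toy model only: real layouts add alignment and reserved regions.
    """
    device_sectors = device_bytes // SECTOR_SIZE
    next_sector = 0
    for name, size_bytes in partitions:
        size_sectors = -(-size_bytes // SECTOR_SIZE)  # round up to whole sectors
        if next_sector + size_sectors > device_sectors:
            return name, next_sector
        next_sector += size_sectors
    return None


EMMC_BYTES = 16 * 1000 ** 3  # hypothetical 16 GB eMMC

small = [("APP_b", 8 * GIB), ("secure-os_b", 4 * MIB)]      # hypothetical sizes
large = [("APP_b", 29527900160), ("secure-os_b", 4 * MIB)]  # APP_b size from this thread

print(first_overflow(EMMC_BYTES, small))  # None
print(first_overflow(EMMC_BYTES, large))  # ('APP_b', 0)
```

In a real layout more partitions precede APP_b, so an oversized APP_b tends to push the partitions after it (such as secure-os_b) off the end of the device, which matches the shape of the error above.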

aurelien-enchanted-tools commented 6 months ago

The bug comes from tegraflash_custom_sign_bup() (from oe_make_bup_payload() from create_bup_payload_image()) in meta-tegra/classes/image_types_tegra.bbclass and happens in the ./doflash.sh ${TEGRA_SIGNING_ARGS} call.

Do you need the build_mickledore/tmp/work/custom_xavier-poky-linux/tegra-minimal-initramfs/1.0-r0/bup-payload/flash.xml.in?

dwalkes commented 6 months ago

Do you need the build_mickledore/tmp/work/custom_xavier-poky-linux/tegra-minimal-initramfs/1.0-r0/bup-payload/flash.xml.in

Yes. In general, look at the ones referenced here, which attempt to fill in partition sizes using the variable replacement logic here.
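The general idea of filling partition sizes into a flash.xml.in template by variable replacement can be pictured with the sketch below. The placeholder token APPSIZE, the helper name fill_template, and the template fragment are assumptions for illustration; check the actual template and bbclass for the real token names and substitution code.

```python
import re
from typing import Dict


def fill_template(template: str, values: Dict[str, str]) -> str:
    """Replace whole-word placeholder tokens in a layout template.

    Hypothetical helper. Tokens are matched as whole words, so APPSIZE
    does not also match inside a longer token such as APPSIZE_B.
    """
    pattern = r"\b(" + "|".join(map(re.escape, values)) + r")\b"
    return re.sub(pattern, lambda m: values[m.group(0)], template)


# Hypothetical template fragment; real flash.xml.in files are much larger.
TEMPLATE = """<partition name="APP" type="data">
    <size> APPSIZE </size>
</partition>"""

print(fill_template(TEMPLATE, {"APPSIZE": "59055800320"}))
```

If a size placeholder resolves to a value sized for a different device than the one the partition ends up on, the filled-in layout can be valid XML yet impossible to flash.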

aurelien-enchanted-tools commented 5 months ago

Could you fix this regression? Or is NVMe definitively incompatible with A/B partitioning? [if it is impossible to configure nvme size, the use of nvme globally becomes useless]

joekale commented 5 months ago

The flash.xml you provided places APP and APP_b on the eMMC. If you made your rootfs larger, you will run out of space with the layout in that file.

The partition layout file is provided by NVIDIA for the supported boards in JetPack releases. I'd have to look to confirm, but I do not believe they provide an NVMe layout option for the Xavier NX, even though it does support NVMe boot. You may need to provide a custom flash layout file that places partitions on the NVMe drive.
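One way to check which device each partition lands on is to list the `<device>` element each `<partition>` sits under in the generated flash.xml. This sketch assumes the NVIDIA layout structure of a `<partition_layout>` root containing `<device type=...>` elements, which contain `<partition name=...>` elements with a `<size>` child. The helper partition_devices and the sample XML (including the `nvme` device type) are hypothetical and trimmed, not the file from this issue.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple


def partition_devices(xml_text: str) -> Dict[str, Tuple[str, int]]:
    """Map each partition name to (device type, size in bytes)."""
    root = ET.fromstring(xml_text)
    result: Dict[str, Tuple[str, int]] = {}
    for device in root.iter("device"):
        dev_type = device.get("type", "?")
        for part in device.iter("partition"):
            size_text = part.findtext("size", default="0").strip()
            result[part.get("name")] = (dev_type, int(size_text))
    return result


# Hypothetical, heavily trimmed layout for illustration only.
SAMPLE = """<partition_layout>
  <device type="sdmmc_user" instance="3">
    <partition name="APP_b" type="data"><size> 29527900160 </size></partition>
    <partition name="secure-os_b" type="data"><size> 4194304 </size></partition>
  </device>
  <device type="nvme" instance="0">
    <partition name="APP" type="data"><size> 29527900160 </size></partition>
  </device>
</partition_layout>"""

for name, (dev, size) in partition_devices(SAMPLE).items():
    print(f"{name:12} {dev:12} {size}")
```

Run it on the generated flash.xml rather than the .in template: the template may still contain non-numeric placeholder tokens in `<size>`, which `int()` would reject.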

aurelien-enchanted-tools commented 5 months ago

With a machine config such as:

require conf/machine/jetson-xavier-nx-devkit-emmc.conf
MACHINEOVERRIDES = "cuda:tegra:tegra194:xavier-nx:jetson-xavier-nx-devkit-emmc:${MACHINE}"
PACKAGE_EXTRA_ARCHS:append = " jetson-xavier-nx-devkit-emmc"

TNSPEC_BOOTDEV = "nvme0n1p1"
TEGRA_EXTERNAL_DEVICE_SECTORS = "488397168"
USE_REDUNDANT_FLASH_LAYOUT_DEFAULT = "1"

APP and APP_b are not on the eMMC, but their size is the eMMC size.

df -h / | sed "s/^/#/"
#Filesystem      Size  Used Avail Use% Mounted on
#/dev/nvme0n1p1  6.5G  806M  5.4G  13% /

I expect that adding

ROOTFSPART_SIZE = "59055800320"

to configure the NVMe size would change the rootfs partition size.

Without USE_REDUNDANT_FLASH_LAYOUT_DEFAULT = "1", the rootfs partition size changes as expected.

joekale commented 5 months ago

My apologies, I had read the flash file on mobile. Looking at it on my laptop, APP_b is still being placed on the eMMC, but APP is not. The APP_b size is listed as 29527900160 bytes in the eMMC layout flash.xml.in you provided, which I believe works out to about 27 GB (27.5 GiB). That would exceed the eMMC size, or at the very least push the following partitions close to the end of the eMMC, which is why it fails to write secure-os_b.
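The numbers in this thread line up: the APP_b size is exactly half of the configured ROOTFSPART_SIZE, and it comes to 27.5 GiB. A short check of that arithmetic follows. The 16 GB eMMC capacity is an assumption about the Xavier NX module, not something stated in this thread, and the halving is only an observation from these two values, not a confirmed description of how the redundant layout computes slot sizes.

```python
GIB = 1024 ** 3

ROOTFSPART_SIZE = 59055800320  # value from the machine config in this issue
APP_B_SIZE = 29527900160       # APP_b <size> in the flash.xml.in from this issue
EMMC_BYTES = 16 * 1000 ** 3    # assumption: nominal 16 GB (decimal) eMMC

print(APP_B_SIZE * 2 == ROOTFSPART_SIZE)            # True
print(f"APP_b = {APP_B_SIZE / GIB} GiB")            # APP_b = 27.5 GiB
print("Fits on eMMC:", APP_B_SIZE <= EMMC_BYTES)   # Fits on eMMC: False
```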

aurelien-enchanted-tools commented 5 months ago

With a machine config such as:

require conf/machine/jetson-xavier-nx-devkit-emmc.conf
MACHINEOVERRIDES = "cuda:tegra:tegra194:xavier-nx:jetson-xavier-nx-devkit-emmc:${MACHINE}"
PACKAGE_EXTRA_ARCHS:append = " jetson-xavier-nx-devkit-emmc"

TNSPEC_BOOTDEV = "nvme0n1p1"
TEGRA_EXTERNAL_DEVICE_SECTORS = "488397168"
USE_REDUNDANT_FLASH_LAYOUT_DEFAULT = "1"

Neither APP nor APP_b is on the eMMC, but their size is the eMMC size.

sudo nvbootctrl dump-slots-info | sed "s/^/#/"
#Current version: 35.4.1
#Capsule update status: 0
#Current bootloader slot: A
#Active bootloader slot: A
#num_slots: 2
#slot: 0,             status: normal
#slot: 1,             status: normal

mount | grep nvme0n1p | sed "s/^/#/"
#/dev/nvme0n1p1 on / type ext4 (rw,relatime)
#/dev/nvme0n1p12 on /boot/efi type vfat (rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro)
#/dev/nvme0n1p12 on /opt/nvidia/esp type vfat (rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro)

df -h / | sed "s/^/#/"
#Filesystem      Size  Used Avail Use% Mounted on
#/dev/nvme0n1p1  6.5G  806M  5.4G  13% /

sudo nvbootctrl set-active-boot-slot 1

sudo nvbootctrl dump-slots-info | sed "s/^/#/"
#Current version: 35.4.1
#Capsule update status: 0
#Current bootloader slot: A
#Active bootloader slot: B
#num_slots: 2
#slot: 0,             status: normal
#slot: 1,             status: normal

#reboot

sudo nvbootctrl dump-slots-info | sed "s/^/#/"
#Current version: 35.4.1
#Capsule update status: 0
#Current bootloader slot: B
#Active bootloader slot: B
#num_slots: 2
#slot: 0,             status: normal
#slot: 1,             status: normal

mount | grep nvme0n1p | sed "s/^/#/"
#/dev/nvme0n1p2 on / type ext4 (rw,relatime)
#/dev/nvme0n1p12 on /boot/efi type vfat (rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro)
#/dev/nvme0n1p12 on /opt/nvidia/esp type vfat (rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro)

sudo nvbootctrl set-active-boot-slot 0

sudo nvbootctrl dump-slots-info | sed "s/^/#/"
#Current version: 35.4.1
#Capsule update status: 0
#Current bootloader slot: B
#Active bootloader slot: A
#num_slots: 2
#slot: 0,             status: normal
#slot: 1,             status: normal

#reboot

sudo nvbootctrl dump-slots-info | sed "s/^/#/"
#Current version: 35.4.1
#Capsule update status: 0
#Current bootloader slot: A
#Active bootloader slot: A
#num_slots: 2
#slot: 0,             status: normal
#slot: 1,             status: normal

mount | grep nvme0n1p | sed "s/^/#/"
#/dev/nvme0n1p1 on / type ext4 (rw,relatime)
#/dev/nvme0n1p12 on /boot/efi type vfat (rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro)
#/dev/nvme0n1p12 on /opt/nvidia/esp type vfat (rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro)
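If you script the slot-switch test above, a small parser for the `nvbootctrl dump-slots-info` output helps. This sketch assumes the plain "key: value" line format shown in this thread (without the `#` prefix added by sed). parse_slots_info is a hypothetical helper, and the slot-to-rootfs mapping (slot A to nvme0n1p1, slot B to nvme0n1p2) is taken from the mount output observed here, not from any documented rule.

```python
from typing import Dict

# Observed in this thread only: which partition is mounted as / for each slot.
ROOTFS_FOR_SLOT = {"A": "/dev/nvme0n1p1", "B": "/dev/nvme0n1p2"}


def parse_slots_info(text: str) -> Dict[str, str]:
    """Parse the 'key: value' header lines of `nvbootctrl dump-slots-info`.

    Per-slot lines ('slot: 0, status: normal') are skipped.
    """
    info: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("slot:"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


SAMPLE = """Current version: 35.4.1
Capsule update status: 0
Current bootloader slot: B
Active bootloader slot: B
num_slots: 2
slot: 0,             status: normal
slot: 1,             status: normal"""

info = parse_slots_info(SAMPLE)
current = info["Current bootloader slot"]
print(current, ROOTFS_FOR_SLOT[current])  # B /dev/nvme0n1p2
```

Comparing the expected rootfs device with the one reported by `mount` after each reboot gives a simple pass/fail check for the A/B switch.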

The bug occurs during the image build, in the create_bup_payload_image() function of the tegra-minimal-initramfs-1.0-r0 do_image_cpio step, which uses that step's flash.xml.in.

dwalkes commented 5 months ago

I think I've got a fix for this in https://github.com/OE4T/meta-tegra/pull/1466 - please verify

dwalkes commented 5 months ago

[if it is impossible to configure nvme size, the use of nvme globally becomes useless]

I'm definitely using NVMe now on Xavier NX and I think others are as well. My guess is most folks haven't noticed this because they aren't using huge rootfs partitions on NVMe, which could be unwieldy to update. If you are using a large rootfs for content added at runtime instead of build time you may want to consider moving runtime content to a separate data partition instead.

aurelien-enchanted-tools commented 5 months ago

Thank you for the fix; it seems to work well for this machine config.

madisongh commented 5 months ago

Thanks @dwalkes . I've taken care of back-porting this to mickledore and kirkstone (and the cherry-pick to nanbield).