OE4T / meta-tegra

BSP layer for NVIDIA Jetson platforms, based on L4T
MIT License

ROOTFSPART_SIZE of 6GB produces a tegradevflash_v2 error #1104

Closed deribaucourt closed 1 year ago

deribaucourt commented 1 year ago

Hello,

When specifying ROOTFSPART_SIZE=6442450944 (6GB) with a machine based on jetson-xavier-nx-devkit-emmc.conf, the flashing script tegra194-flash-helper.sh fails to complete. The command tegradevflash_v2 --pt flash.xml.bin --create proceeds until it starts flashing the APP partition, then stays at 0%. Meanwhile, the Xavier NX's console displays the following error:

[0381.074] I> Writing bootloader-dtb_b partition.
[0385.613] I> Writing VER_b partition.
[0385.622] I> Writing VER partition.
[0385.631] I> Writing device 1: 3.
[0385.639] I> Writing APP partition.
[0385.674] E> NV3P_SERVER: Could not write 1048576 bytes.

To Reproduce

Steps to reproduce the behavior:

  1. Build meta-tegra branch 'kirkstone' (3311bc2e4d6) with MACHINE based on 'jetson-xavier-nx-devkit-emmc.conf'
  2. Build with bitbake argument 'core-image-minimal'
  3. Deploy to hardware with the script oe4t-tegraflash-deploy
  4. See error during flashing

Additional context

I posted this issue on the NVIDIA developer forum, but they won't provide help for Yocto-based distributions. This issue was not present with L4T 32.6.1 on the meta-tegra honister branch. I recently upgraded to kirkstone with L4T 35.1 and discovered it. I need to change ROOTFSPART_SIZE in order to add a separate data partition, but found that this 6GB size specifically triggers the failure. A size of 7GB works fine.

Any help would be greatly appreciated!

madisongh commented 1 year ago

I just tried setting a 6GiB ROOTFSPART_SIZE (although on master and using the demo-image-base image from our demo distro), and it worked fine.

Are you using a custom flash layout XML file? The flashing tools in R35.1 now log the GPT partition map before trying to write to the device. Here's what mine looked like:

[   6.2230 ] gpt_secondary_3_0.bin:
[   6.2233 ] partition_id   partition_name                StartingLba   EndingLba
[   6.2235 ]            1   BCT                                            0         255
[   6.2237 ]            2   mb1                                          256         767
[   6.2239 ]            3   mb1_b                                        768        1279
[   6.2241 ]            4   MB1_BCT                                     1280        1407
[   6.2243 ]            5   MEM_BCT                                     1408        1919
[   6.2245 ]            6   spe-fw                                      1920        2431
[   6.2248 ]            7   mb2                                         2432        2943
[   6.2248 ]            8   mts-preboot                                 2944        3071
[   6.2248 ]            9   mts-mce                                     3072        3455
[   6.2248 ]           10   mts-proper                                  3456       11647
[   6.2248 ]           11   sc7                                        11648       11903
[   6.2248 ]           12   xusb-fw                                    11904       12287
[   6.2248 ]           13   cpu-bootloader                             12288       20479
[   6.2248 ]           14   bootloader-dtb                             20480       21375
[   6.2248 ]           15   secure-os                                  21376       26495
[   6.2248 ]           16   eks                                        26496       26623
[   6.2248 ]           17   adsp-fw                                    26624       28671
[   6.2248 ]           18   rce-fw                                     28672       30719
[   6.2248 ]           19   sce-fw                                     30720       32767
[   6.2248 ]           20   bpmp-fw                                    32768       35839
[   6.2248 ]           21   bpmp-fw-dtb                                35840       37887
[   6.2248 ]           22   reserved_for_chain_A_boot                  37888       41983
[   6.2248 ]           23   MB1_BCT_b                                  41984       42111
[   6.2248 ]           24   MEM_BCT_b                                  42112       42623
[   6.2248 ]           25   spe-fw_b                                   42624       43135
[   6.2248 ]           26   mb2_b                                      43136       43647
[   6.2248 ]           27   mts-preboot_b                              43648       43775
[   6.2248 ]           28   mts-mce_b                                  43776       44159
[   6.2248 ]           29   mts-proper_b                               44160       52351
[   6.2248 ]           30   sc7_b                                      52352       52607
[   6.2248 ]           31   xusb-fw_b                                  52608       52991
[   6.2248 ]           32   cpu-bootloader_b                           52992       61183
[   6.2248 ]           33   bootloader-dtb_b                           61184       62079
[   6.2248 ]           34   reserved_for_chain_B_boot                  62080       63359
[   6.2248 ]           35   uefi_variables                             63360       63615
[   6.2248 ]           36   uefi_ftw                                   63616       63999
[   6.2248 ]           37   BCT-boot-chain_backup                      64768       64895
[   6.2248 ]           38   reserved_partition                         64896       65023
[   6.2248 ]           39   secondary_gpt_backup                       65024       65151
[   6.2248 ]           40   VER_b                                      65152       65279
[   6.2248 ]           41   VER                                        65280       65407
[   6.2248 ] gpt_primary_1_3.bin:
[   6.2248 ] partition_id   partition_name                StartingLba   EndingLba
[   6.2248 ]            1   APP                                           40    12582951
[   6.2248 ]            2   kernel                                  12582952    12714023
[   6.2248 ]            3   kernel-dtb                              12714024    12714919
[   6.2248 ]            4   reserved_for_chain_A_user               12714920    12781567
[   6.2248 ]            5   secure-os_b                             29558784    29563903
[   6.2248 ]            6   eks_b                                   29563904    29564031
[   6.2248 ]            7   adsp-fw_b                               29564032    29566079
[   6.2248 ]            8   rce-fw_b                                29566080    29568127
[   6.2248 ]            9   sce-fw_b                                29568128    29570175
[   6.2248 ]           10   bpmp-fw_b                               29570176    29573247
[   6.2248 ]           11   bpmp-fw-dtb_b                           29573248    29575295
[   6.2248 ]           12   kernel_b                                29575296    29706367
[   6.2248 ]           13   kernel-dtb_b                            29706368    29707263
[   6.2248 ]           14   reserved_for_chain_B_user               29707264    29773823
[   6.2248 ]           15   recovery                                29773824    29902847
[   6.2248 ]           16   recovery-dtb                            29902848    29903871
[   6.2248 ]           17   RECROOTFS                               29903872    30518271
[   6.2248 ]           18   esp                                     30518272    30649343
[   6.2248 ]           19   UDA                                     30649344    30777310
[   6.2248 ] gpt_secondary_1_3.bin:
[   6.2248 ] partition_id   partition_name                StartingLba   EndingLba
[   6.2248 ]            1   APP                                           40    12582951
[   6.2248 ]            2   kernel                                  12582952    12714023
[   6.2248 ]            3   kernel-dtb                              12714024    12714919
[   6.2248 ]            4   reserved_for_chain_A_user               12714920    12781567
[   6.2248 ]            5   secure-os_b                             29558784    29563903
[   6.2248 ]            6   eks_b                                   29563904    29564031
[   6.2248 ]            7   adsp-fw_b                               29564032    29566079
[   6.2248 ]            8   rce-fw_b                                29566080    29568127
[   6.2248 ]            9   sce-fw_b                                29568128    29570175
[   6.2248 ]           10   bpmp-fw_b                               29570176    29573247
[   6.2248 ]           11   bpmp-fw-dtb_b                           29573248    29575295
[   6.2248 ]           12   kernel_b                                29575296    29706367
[   6.2249 ]           13   kernel-dtb_b                            29706368    29707263
[   6.2249 ]           14   reserved_for_chain_B_user               29707264    29773823
[   6.2249 ]           15   recovery                                29773824    29902847
[   6.2249 ]           16   recovery-dtb                            29902848    29903871
[   6.2249 ]           17   RECROOTFS                               29903872    30518271
[   6.2249 ]           18   esp                                     30518272    30649343
[   6.2249 ]           19   UDA                                     30649344    30777310

As you can see, the APP partition is (12582951 - 40 + 1) * 512 bytes, which matches the 6442450944 I set for ROOTFSPART_SIZE. If you've got a custom layout, make sure the logged partition table looks sane.
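As a quick cross-check of that arithmetic, here is a minimal Python sketch. It assumes 512-byte sectors and the row format of the partition table logged above; the helper names are hypothetical and are not part of tegraflash or meta-tegra.

```python
SECTOR_SIZE = 512  # bytes per LBA, as used in the calculation above


def partition_size_bytes(starting_lba, ending_lba, sector_size=SECTOR_SIZE):
    """Size of a partition whose LBA range is inclusive on both ends."""
    return (ending_lba - starting_lba + 1) * sector_size


def parse_partition_line(line):
    """Parse one logged table row into (name, start, end).

    Returns None for headers, file names, and other non-row lines.
    """
    # Drop the "[   6.2248 ]" timestamp prefix if there is one.
    fields = line.split("]", 1)[-1].split()
    if len(fields) != 4 or not fields[0].isdigit():
        return None
    _, name, start, end = fields
    if not (start.isdigit() and end.isdigit()):
        return None
    return name, int(start), int(end)


row = parse_partition_line(
    "[   6.2248 ]            1   APP                                           40    12582951"
)
name, start, end = row
app_size = partition_size_bytes(start, end)
print(name, app_size, app_size == 6 * 1024**3)  # APP 6442450944 True
```

Running each logged row through the same helper is one way to confirm that a custom layout produces the partition sizes you expect.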

deribaucourt commented 1 year ago

Hello and thank you very much for testing the 6GB ROOTFSPART_SIZE with the demo distribution.

I tried building and flashing the same image as you and also found that it works fine. Here are the differences between both images:

For the image that doesn't flash properly, I had the same GPT partition map as you. However, I noticed USB-related errors in dmesg. I tried flashing with a more recent host kernel, which resulted in different but similar errors. I have yet to try flashing from another computer.

[91009.495733] usb 4-2.3: new high-speed USB device number 24 using xhci_hcd
[91009.598272] usb 4-2.3: New USB device found, idVendor=0955, idProduct=7e19, bcdDevice= 1.02
[91009.598276] usb 4-2.3: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[91009.598279] usb 4-2.3: Product: APX
[91009.598280] usb 4-2.3: Manufacturer: NVIDIA Corp.
[91591.732116] INFO: task tegradevflash_v:3284323 blocked for more than 120 seconds.
[91591.732124]       Tainted: G           O      5.8.0-63-generic #71~20.04.1-Ubuntu
[91591.732126] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[91591.732129] tegradevflash_v D    0 3284323 3283891 0x20020000
[91591.732133] Call Trace:
[91591.732147]  __schedule+0x394/0xa60
[91591.732153]  ? __internal_add_timer+0x2d/0x40
[91591.732156]  schedule+0x55/0xc0
[91591.732158]  schedule_timeout+0x8d/0x160
[91591.732161]  ? __next_timer_interrupt+0xe0/0xe0
[91591.732165]  wait_for_completion_timeout+0x8d/0x100
[91591.732170]  usb_start_wait_urb+0x8f/0x180
[91591.732173]  usb_bulk_msg+0xbb/0x170
[91591.732177]  proc_bulk+0x2ba/0x320
[91591.732180]  usbdev_do_ioctl+0x275/0x1010
[91591.732184]  usbdev_ioctl+0xe/0x20
[91591.732188]  compat_ptr_ioctl+0x1d/0x30
[91591.732191]  __ia32_compat_sys_ioctl+0x14e/0x170
[91591.732197]  do_syscall_32_irqs_on+0x4a/0x70
[91591.732201]  do_int80_syscall_32+0x10/0x20
[91591.732205]  entry_INT80_compat+0x88/0x8d
[91591.732208] RIP: 0023:0x8066d54
[91591.732210] RSP: 002b:00000000ffa2e778 EFLAGS: 00000283 ORIG_RAX: 0000000000000036
[91591.732213] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00000000c0105502
[91591.732214] RDX: 00000000ffa2e7a0 RSI: 0000000000000010 RDI: 000000000917af70
[91591.732215] RBP: 00000000ffa2e800 R08: 0000000000000000 R09: 0000000000000000
[91591.732216] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[91591.732217] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000

I'll keep trying different settings to find out why my image can't be flashed, and I'll keep updating this ticket for other users who might encounter the same behaviour.

deribaucourt commented 1 year ago

I finally found the problem. Thanks for pointing me to the demo image, which allowed me to figure out the differences. I don't use the .ext4 rootfs generated by Yocto because I have a separate data partition. Hence I use WIC to split the rootfs into two .ext4 images. However, the resulting files are not exactly the size they should have according to IMAGE_ROOTFS_SIZE.

It turns out the newer tegraflash silently fails if the partition files are not exactly the size declared in the XML.

I have yet to put back my data partition together, but I'm unstuck now.
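For anyone hitting the same thing, the size mismatch described above can be checked on the host before flashing. The following is only a sketch: the function name and the demo file are hypothetical, and the declared size has to be taken from your own flash layout or ROOTFSPART_SIZE setting.

```python
import os
import tempfile


def size_mismatch(image_path, declared_size):
    """Return actual file size minus declared size, in bytes.

    0 means the image file is exactly the declared partition size.
    """
    return os.path.getsize(image_path) - declared_size


# Demo with a throwaway file standing in for a WIC-produced .ext4 image.
with tempfile.TemporaryDirectory() as tmpdir:
    demo_image = os.path.join(tmpdir, "rootfs-demo.ext4")  # hypothetical name
    with open(demo_image, "wb") as f:
        f.truncate(4096)  # create a 4096-byte file of zeros

    demo_delta = size_mismatch(demo_image, 8192)
    exact_delta = size_mismatch(demo_image, 4096)

print(demo_delta)   # negative: the file is smaller than declared
print(exact_delta)  # 0: the file matches the declared size
```

A non-zero result for the real rootfs image would point to the same condition that caused the hang at 0% here.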