OE4T / meta-mender-community

Community supported integration layers for Mender on various boards
Apache License 2.0

fix: Tegra, Jetpack 5: add Jetson as working target #19

Closed mwest90 closed 2 months ago

mwest90 commented 8 months ago

Minimal working configuration to make Orin and Jetson both work for JetPack 5

dwalkes commented 8 months ago

Also your commit needs a signoff before it could be upstreamed - see https://github.com/mendersoftware/mender/blob/master/CONTRIBUTING.md#sign-your-work
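
As a rough sketch of what the sign-off involves: Git records it as a `Signed-off-by:` trailer in the commit message. The helper name `has_signoff` below is made up for illustration, and `<base>` is a placeholder for whatever commit the branch starts from; the `git` options shown in the comments are standard ones.

```shell
#!/bin/sh
# has_signoff: hypothetical helper. Reads a commit message on stdin and
# succeeds if it contains a "Signed-off-by: Name <email>" trailer line.
has_signoff() {
    grep -Eq '^Signed-off-by: .+ <.+@.+>$'
}

# Typical usage inside a repository (not executed here):
#   git log -1 --format=%B | has_signoff || echo "missing sign-off"
#
# Adding the trailer after the fact:
#   git commit --amend --signoff --no-edit    # most recent commit only
#   git rebase --signoff <base>               # every commit since <base>
```

After rewriting commits this way the branch has to be force-pushed, since the commit hashes change.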

Islam-Hussein-11 commented 5 months ago

Hi,

I think the line `RDEPENDS:${PN}:append = " tegra-boot-tools-updater"` in `tegra-bup-payload_%.bbappend` has to be removed, because the kirkstone branch no longer provides tegra-boot-tools-updater. It was only present on kirkstone-l4t-32.7.x, in tegra-boot-tools.inc.

Secondly, I tested this PR on jetson-xavier-nx-devkit-emmc and got an error in tegra-minimal-initramfs.bb:do_image_cpio:
`[ 5.0388 ] tegrahost_v2 --chip 0x19 --align mce_c10_prod_cr_aligned.bin
[ 5.0441 ] tegrahost_v2 --chip 0x19 0 --magicid MTSM --appendsigheader mce_c10_prod_cr_aligned.bin zerosbk
[ 5.0461 ] Header already present for mce_c10_prod_cr_aligned.bin
[ 5.0564 ] tegrasign_v3.py --key None --list mce_c10_prod_cr_aligned_sigheader.bin_list.xml --pubkeyhash pub_key.key
[ 5.0570 ] Assuming zero filled SBK key
[ 5.0606 ] Warning: pub_key.key is not found
[ 5.0590 ] tegrahost_v2 --chip 0x19 0 --updatesigheader mce_c10_prod_cr_aligned_sigheader.bin.encrypt mce_c10_prod_cr_aligned_sigheader.bin.hash zerosbk
[ 5.0702 ] tegrahost_v2 --chip 0x19 --align mts_c10_prod_cr_aligned.bin
[ 5.0755 ] tegrahost_v2 --chip 0x19 0 --magicid MTSB --appendsigheader mts_c10_prod_cr_aligned.bin zerosbk
[ 5.0776 ] adding BCH for mts_c10_prod_cr_aligned.bin
[ 5.1568 ] tegrasign_v3.py --key None --list mts_c10_prod_cr_aligned_sigheader.bin_list.xml --pubkeyhash pub_key.key
[ 5.1581 ] Assuming zero filled SBK key
[ 5.1678 ] Warning: pub_key.key is not found
[ 5.1663 ] tegrahost_v2 --chip 0x19 0 --updatesigheader mts_c10_prod_cr_aligned_sigheader.bin.encrypt mts_c10_prod_cr_aligned_sigheader.bin.hash zerosbk
[ 5.2137 ] tegrahost_v2 --chip 0x19 --align bpmp-2_t194_aligned.bin
[ 5.2191 ] tegrahost_v2 --chip 0x19 0 --magicid BPMF --ratchet_blob ratchet_blob.bin --appendsigheader bpmp-2_t194_aligned.bin zerosbk
[ 5.2211 ] adding BCH for bpmp-2_t194_aligned.bin
[ 5.2507 ] tegrasign_v3.py --key None --list bpmp-2_t194_aligned_sigheader.bin_list.xml --pubkeyhash pub_key.key
[ 5.2518 ] Assuming zero filled SBK key
[ 5.2570 ] Warning: pub_key.key is not found
[ 5.2554 ] tegrahost_v2 --chip 0x19 0 --updatesigheader bpmp-2_t194_aligned_sigheader.bin.encrypt bpmp-2_t194_aligned_sigheader.bin.hash zerosbk
[ 5.2739 ] tegrahost_v2 --chip 0x19 --align tegra194-a02-bpmp-p3668-a00_lz4_aligned.dtb
[ 5.2793 ] tegrahost_v2 --chip 0x19 0 --magicid BPMD --ratchet_blob ratchet_blob.bin --appendsigheader tegra194-a02-bpmp-p3668-a00_lz4_aligned.dtb zerosbk
[ 5.2814 ] adding BCH for tegra194-a02-bpmp-p3668-a00_lz4_aligned.dtb
[ 5.2909 ] tegrasign_v3.py --key None --list tegra194-a02-bpmp-p3668-a00_lz4_aligned_sigheader.dtb_list.xml --pubkeyhash pub_key.key
[ 5.2917 ] Assuming zero filled SBK key
[ 5.2950 ] Warning: pub_key.key is not found
[ 5.2933 ] tegrahost_v2 --chip 0x19 0 --updatesigheader tegra194-a02-bpmp-p3668-a00_lz4_aligned_sigheader.dtb.encrypt tegra194-a02-bpmp-p3668-a00_lz4_aligned_sigheader.dtb.hash zerosbk
[ 5.2999 ] tegrahost_v2 --chip 0x19 --align spe_t194_aligned.bin
[ 5.3050 ] tegrahost_v2 --chip 0x19 0 --magicid SPEF --ratchet_blob ratchet_blob.bin --appendsigheader spe_t194_aligned.bin zerosbk
[ 5.3070 ] adding BCH for spe_t194_aligned.bin
[ 5.3177 ] tegrasign_v3.py --key None --list spe_t194_aligned_sigheader.bin_list.xml --pubkeyhash pub_key.key
[ 5.3186 ] Assuming zero filled SBK key
[ 5.3218 ] Warning: pub_key.key is not found
[ 5.3202 ] tegrahost_v2 --chip 0x19 0 --updatesigheader spe_t194_aligned_sigheader.bin.encrypt spe_t194_aligned_sigheader.bin.hash zerosbk
[ 5.3283 ] tegrahost_v2 --chip 0x19 --align tos-optee_t194_aligned.img
[ 5.3335 ] tegrahost_v2 --chip 0x19 0 --magicid TOSB --ratchet_blob ratchet_blob.bin --appendsigheader tos-optee_t194_aligned.img zerosbk
[ 5.3355 ] adding BCH for tos-optee_t194_aligned.img
[ 5.3669 ] tegrasign_v3.py --key None --list tos-optee_t194_aligned_sigheader.img_list.xml --pubkeyhash pub_key.key
[ 5.3679 ] Assuming zero filled SBK key
[ 5.3730 ] Warning: pub_key.key is not found
[ 5.3714 ] tegrahost_v2 --chip 0x19 0 --updatesigheader tos-optee_t194_aligned_sigheader.img.encrypt tos-optee_t194_aligned_sigheader.img.hash zerosbk
[ 5.3911 ] tegrahost_v2 --chip 0x19 --align eks_aligned.img
[ 5.3965 ] tegrahost_v2 --chip 0x19 0 --magicid EKSB --ratchet_blob ratchet_blob.bin --appendsigheader eks_aligned.img zerosbk
[ 5.3985 ] adding BCH for eks_aligned.img
[ 5.4074 ] tegrasign_v3.py --key None --list eks_aligned_sigheader.img_list.xml --pubkeyhash pub_key.key
[ 5.4085 ] Assuming zero filled SBK key
[ 5.4117 ] Warning: pub_key.key is not found
[ 5.4101 ] tegrahost_v2 --chip 0x19 0 --updatesigheader eks_aligned_sigheader.img.encrypt eks_aligned_sigheader.img.hash zerosbk
[ 5.4163 ] tegrahost_v2 --chip 0x19 --align tegra194-p3668-all-p3509-0000_aligned.dtb
[ 5.4215 ] tegrahost_v2 --chip 0x19 0 --magicid CDTB --ratchet_blob ratchet_blob.bin --appendsigheader tegra194-p3668-all-p3509-0000_aligned.dtb zerosbk
[ 5.4234 ] adding BCH for tegra194-p3668-all-p3509-0000_aligned.dtb
[ 5.4396 ] tegrasign_v3.py --key None --list tegra194-p3668-all-p3509-0000_aligned_sigheader.dtb_list.xml --pubkeyhash pub_key.key
[ 5.4407 ] Assuming zero filled SBK key
[ 5.4445 ] Warning: pub_key.key is not found
[ 5.4430 ] tegrahost_v2 --chip 0x19 0 --updatesigheader tegra194-p3668-all-p3509-0000_aligned_sigheader.dtb.encrypt tegra194-p3668-all-p3509-0000_aligned_sigheader.dtb.hash zerosbk
[ 5.4536 ] tegrahost_v2 --chip 0x19 --align nvtboot_recovery_cpu_t194_aligned.bin
[ 5.4589 ] tegrahost_v2 --chip 0x19 0 --magicid CPBL --ratchet_blob ratchet_blob.bin --appendsigheader nvtboot_recovery_cpu_t194_aligned.bin zerosbk
[ 5.4609 ] adding BCH for nvtboot_recovery_cpu_t194_aligned.bin
[ 5.4746 ] tegrasign_v3.py --key None --list nvtboot_recovery_cpu_t194_aligned_sigheader.bin_list.xml --pubkeyhash pub_key.key
[ 5.4755 ] Assuming zero filled SBK key
[ 5.4791 ] Warning: pub_key.key is not found
[ 5.4776 ] tegrahost_v2 --chip 0x19 0 --updatesigheader nvtboot_recovery_cpu_t194_aligned_sigheader.bin.encrypt nvtboot_recovery_cpu_t194_aligned_sigheader.bin.hash zerosbk
[ 5.4829 ] Copying signed file in /home/isalm/Desktop/mender-tegra/tegra-demo-distro/build/tmp/work/jetson_xavier_nx_devkit_emmc-oe4t-linux/tegra-minimal-initramfs/1.0-r0/bup-payload/signed
[ 5.5515 ] Copying br bct for multi chains
[ 5.5518 ] Signed BCT for boot chain A is copied to /home/isalm/Desktop/mender-tegra/tegra-demo-distro/build/tmp/work/jetson_xavier_nx_devkit_emmc-oe4t-linux/tegra-minimal-initramfs/1.0-r0/bup-payload/signed/br_bct_BR.bct
[ 5.5520 ] Signed BCT for boot chain B is copied to /home/isalm/Desktop/mender-tegra/tegra-demo-distro/build/tmp/work/jetson_xavier_nx_devkit_emmc-oe4t-linux/tegra-minimal-initramfs/1.0-r0/bup-payload/signed/br_bct_b_BR.bct
[ 5.5520 ] Copying BCT backup image bct_backup.img to /home/isalm/Desktop/mender-tegra/tegra-demo-distro/build/tmp/work/jetson_xavier_nx_devkit_emmc-oe4t-linux/tegra-minimal-initramfs/1.0-r0/bup-payload/signed/bct_backup.img
[ 5.5672 ] tegraparser_v2 --pt flash.xml.bin --generateflashindex /home/isalm/Desktop/mender-tegra/tegra-demo-distro/build/tmp/work/jetson_xavier_nx_devkit_emmc-oe4t-linux/tegra-minimal-initramfs/1.0-r0/bup-payload/signed/flash.xml.tmp flash.idx
[ 5.5693 ] File SMDFILE open failed
Error: Return value 19
Command tegraparser_v2 --pt flash.xml.bin --generateflashindex /home/isalm/Desktop/mender-tegra/tegra-demo-distro/build/tmp/work/jetson_xavier_nx_devkit_emmc-oe4t-linux/tegra-minimal-initramfs/1.0-r0/bup-payload/signed/flash.xml.tmp flash.idx
WARNING: exit code 1 from a shell command.

ERROR: Task (/home/isalm/Desktop/mender-tegra/tegra-demo-distro/layers/meta-tegra/recipes-core/images/tegra-minimal-initramfs.bb:do_image_cpio) failed with exit code '1'
NOTE: Tasks Summary: Attempted 4927 tasks of which 4926 didn't need to be rerun and 1 failed.`

cakre commented 5 months ago

Rolling back doesn't seem to work currently. I worked on a fix for that, implementing rollback on top of your branch (switching back to the previous chain when rolling back or not committing, and not switching the boot chain when rolling back before a reboot).

I'd open a PR to your repo if that's OK with you?

mwest90 commented 5 months ago

> Rolling back doesn't seem to work currently. I worked on a fix for that, implementing rollback on top of your branch (switching back to the previous chain when rolling back or not committing, and not switching the boot chain when rolling back before a reboot).
>
> I'd open a PR to your repo if that's OK with you?

That would be great!

cakre commented 5 months ago

> That would be great!

I opened a PR

irodriguez-veridas commented 5 months ago

> I opened a PR

Looks good! I think this also enables adding custom checks to validate an upgrade. Just by adding any ArtifactCommit_Leave_XX where XX<50 it's possible to verify an upgrade and make mender roll it back if a failure is returned, right?

cakre commented 5 months ago

> Looks good! I think this also enables adding custom checks to validate an upgrade. Just by adding any ArtifactCommit_Leave_XX where XX<50 it's possible to verify an upgrade and make mender roll it back if a failure is returned, right?

If you want to add custom checks you should do that in ArtifactCommit_Enter or earlier, since it's too late to roll back if we already committed the update. (See mender docs)

The state script the PR added just makes the partition switch persistent once we are sure that the update was successful (ArtifactCommit_Leave is the last "step" that's executed if everything went all right).
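
A minimal sketch of such a custom check, assuming it ships in the artifact as a state script named something like `ArtifactCommit_Enter_10_health-check` (the file name, the `run_checks` helper and the example service are all made up for illustration). The idea is simply that a non-zero exit at this stage makes Mender treat the update as failed and roll back, whereas the same failure in ArtifactCommit_Leave would come too late.

```shell
#!/bin/sh
# Hypothetical body of an ArtifactCommit_Enter_10_health-check state script.

# run_checks: run each argument as a shell command, stop at the first failure.
run_checks() {
    for check in "$@"; do
        if ! sh -c "$check" >/dev/null 2>&1; then
            echo "health check failed: $check" >&2
            return 1
        fi
    done
    return 0
}

# In a real script the last line would be something like this
# (my-app.service is a placeholder, not part of this PR):
#   run_checks "systemctl is-active --quiet my-app.service" || exit 1
```

Keeping each check as a plain command string makes it easy to add or remove checks without touching the control flow.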

irodriguez-veridas commented 5 months ago

> If you want to add custom checks you should do that in ArtifactCommit_Enter or earlier, since it's too late to roll back if we already committed the update. (See mender docs)

Oh, yes, you are totally right. Even if the partition switch is not verified, it would be committed from Mender's point of view. Thanks!

dwalkes commented 3 months ago

Hey everyone thanks for all the work on this. I haven't kept up on the overall status, can someone summarize what is or isn't working and what would need to happen before this could be upstreamed, if anything?

I also noticed an option at https://hub.mender.io/t/swupdate/6953/2 which could be interesting for mender integration, since there is already a working swupdate example at https://github.com/OE4T/tegra-demo-distro?tab=readme-ov-file#update-image-demo

maxekman commented 3 months ago

For me I have successfully compiled and flashed a Syslogic Jetson Xavier box (lots of custom board support etc, so I can’t say anything about the dev kit). I have not yet had time to test a Mender update on it, but it should be fairly easy to do.

I plan to do the same test on a Syslogic Jetson Orin after I’m happy with the Xavier.

That is a small summary from my side at least.

irodriguez-veridas commented 3 months ago

> Hey everyone thanks for all the work on this. I haven't kept up on the overall status, can someone summarize what is or isn't working and what would need to happen before this could be upstreamed, if anything?

Hi Dan! From my side I can confirm that the Orin Nano devkit is working. I don't have an AGX Xavier devkit or an AGX Orin devkit to test with. I was waiting for @mwest90 to do it, but he has been away for some time. Based on the work done by @maxekman, I suppose it should work on the AGX Xavier devkit, but it would be better to confirm it.

Additionally, I think @maxekman used this branch on my repository for the tests. That branch uses the default redundant layout provided by NVIDIA and solves the issue with reserved space for NVIDIA partitions. @maxekman, if you confirm this is the branch you used, I will PR it to @mwest90's branch so that the changes are added to this PR. You can then also PR your changes to correct the partition offsets. BTW, have you discovered why these offsets were so high?

> I also noticed an option at https://hub.mender.io/t/swupdate/6953/2 which could be interesting for mender integration, since there is already a working swupdate example at https://github.com/OE4T/tegra-demo-distro?tab=readme-ov-file#update-image-demo

It may be interesting to test it. Still, I think most of the changes in this PR would be needed anyway, since in the end the Mender client will be responsible for downloading the swupdate package, rebooting the system, verifying that the upgrade completed correctly, and rolling back if it failed.

maxekman commented 3 months ago

@irodriguez-veridas Yes, I have been using your branch + latest meta-tegra kirkstone. Please PR that here!

I’ll try to do a PR with the offset fix. And no, sorry to say, I haven’t yet found much that explains why they are there.

mwest90 commented 3 months ago

Hi all, I have been out for a month or so now, but back from start of this week and now have time to look at this again.

Thanks so much @irodriguez-veridas for all the work! I can test AGX Orin devkit this week, and at least a custom AGX Xavier board (might be able to scrounge up a devkit there as well). Would you recommend testing from the branch in your repo?

irodriguez-veridas commented 3 months ago

> Hi all, I have been out for a month or so now, but back from start of this week and now have time to look at this again.
>
> Thanks so much @irodriguez-veridas for all the work! I can test AGX Orin devkit this week, and at least a custom AGX Xavier board (might be able to scrounge up a devkit there as well). Would you recommend testing from the branch in your repo?

@mwest90 it would be nice if you can test AGX Xavier on that branch. If everything works I will PR it here and then we can test AGX Orin directly on this PR. What do you think?

maxekman commented 3 months ago

Here is the PR with the partition offset fix to this branch: https://github.com/mwest90/meta-mender-community/pull/7

maxekman commented 3 months ago

I would also like to test both Xavier and Orin directly on this PR as soon as all PRs are in.

Let's also try to close any outdated conversations above!

TheYoctoJester commented 3 months ago

@dwalkes and @irodriguez-veridas thanks for keeping this going, please ping me if there is something the Mender side can help with. Concerning the various mentioned points:

cakre commented 3 months ago

I noticed some issues with Xavier NX:

I can open an MR with the fixes for the first two problems (or put them in separate MRs if you'd prefer that). I haven't found the cause of the third issue yet.

mwest90 commented 3 months ago

@irodriguez-veridas @maxekman, I tested now with the PR from @maxekman, and it works fine. I did not need this commit when using that PR: https://github.com/irodriguez-veridas/oe4t-meta-mender-community/commit/6d78bfb9f25b71466f675f864a9f95cab4395934

irodriguez-veridas commented 3 months ago

> did not need this commit when using that PR:

@mwest90 should I push it then or not? It's not clear to me whether @maxekman's PR is enough.

mwest90 commented 3 months ago

@irodriguez-veridas we at least need the other commits from your branch that are ahead of mine, the ones that removed the custom flash layout.

cakre commented 3 months ago

@mwest90 I opened a PR fixing the xavier-nx issues I mentioned in a previous comment

irodriguez-veridas commented 3 months ago

> we at least need the other commits from your branch that are ahead of mine, the ones that removed the custom flash layout.

Perfect. PR opened

mwest90 commented 3 months ago

@irodriguez-veridas @cakre @dwalkes With the two last commits merged all fixes should be in this branch ready for final testing.

maxekman commented 3 months ago

Very cool, I’ll also try it out if I have time.

irodriguez-veridas commented 3 months ago

> With the two last commits merged all fixes should be in this branch ready for final testing.

I have just tested it on the Orin Nano devkit and it works correctly. The Mender update was applied successfully.

TheYoctoJester commented 2 months ago

@dwalkes how should we move ahead with that one?

mwest90 commented 2 months ago

I just tested everything here again as well, and it all works. I had to remove one line in tegra-mender-setup.bbclass: this line is no longer needed after commit "3009c7e4fd037559a88b8c9bb65ed7b42019ea35", and it actually breaks updating devices that were made before that change.

@dwalkes what is needed to merge?

dwalkes commented 2 months ago

> @dwalkes how should we move ahead with that one?
>
> @dwalkes what is needed to merge?

I've only briefly looked at it but I think we need to first rebase OE4T:kirkstone in this repo on the latest kirkstone upstream at https://github.com/mendersoftware/meta-mender-community OR we can just bypass this PR and send a PR directly upstream to https://github.com/mendersoftware/meta-mender-community if that's easier. I think the purpose of this repo was to stage the Jetpack 4/5 split before we were ready to upstream this. Maybe we can skip that now since it appears multiple people have already successfully tested on Jetpack 5.

Has anyone tested that Jetpack 4 builds/deployments aren't broken? We might want to do that before we send this upstream as well.

Also probably a question for @TheYoctoJester but if we send upstream should we be targeting scarthgap first at this point?

TheYoctoJester commented 2 months ago

@dwalkes my take on it is essentially two things.

1) no need to target scarthgap first. Kirkstone is well in support, and the JP4 builds I keep in the loop for it look reasonably well, so I'd like to keep them. Once there is at least one known good build for JP in the upstream layer, it becomes a lot easier to add the others, respectively forward port to scarthgap.

2) the layer split JP4/JP5 in OE4T/meta-mender-community was primarily a staging area, correct. From my side, the question is how you envision it in the long(er) run. If you'd like to view it as the OE4T-owned incarnation and just push to upstream once things have stabilized, I'm okay with that. If you feel that OE4T should focus on the actual board support and not burden itself with the additional layer, then we can also treat upstream as the prime resource. I personally would choose the latter, but I'll help whichever approach you think is most beneficial to users and contributors.

For rebasing the current state on kirkstone, I can do that. Would you prefer me to submit a bump PR to this repo, or merge back a compatible state to https://github.com/mendersoftware/meta-mender-community?

dwalkes commented 2 months ago

> If you'd like to view it as the OE4T-owned incarnation and just push to upstream once things have stabilized, I'm okay with that. If you feel that OE4T should focus on the actual board support and not burden itself with the additional layer, then we can also treat upstream as the prime resource

I think we should definitely use upstream as the prime resource and main merge destination going forward, once we have something people can use with Jetpack 5. We only need the OE4T repo to stage big changes that aren't yet ready for deployment and are likely to break things upstream during the transition. Jetpack 6 comes to mind... we might want/need to do something similar for scarthgap and Jetpack 6. However, once we have something useful, which I believe we have here with this PR, I think we should start sending PRs directly upstream. Does that approach make sense to you?

> For rebasing the current state on kirkstone, I can do that

Thanks that would be great.

> Would you prefer me to submit a bump PR to this repo, or merge back a compatible state to https://github.com/mendersoftware/meta-mender-community?

I don't think the state of the repo before the current PR discussed here will be generally useful in the sense that I don't think it added any features to Jetpack 4 and was likely to break stuff and wasn't complete with Jetpack 5. So my recommendation is to do a bump PR to this repo first and do the upstream merge after the PR discussed here is ready.

I think I didn't really explain my question well previously but it was whether to abandon this PR completely and go directly upstream with a new PR or to finish it, then merge upstream. The more I think about it though, even if we ultimately go directly upstream afterward it probably still makes sense to merge this PR locally so we keep a full record of the work done here and can reference as a merged PR.

I'm still curious whether anyone has done testing on Jetpack 4 with this PR to make sure Jetpack 4 support isn't broken. If not I can take this on. I'd like to go through AGX Xavier, Xavier NX, and Nano with a quick smoke test to make sure build and runtime are still functional before we go upstream. We could possibly defer that until after we merge this one locally into the OE4T repo, however.

I'd also suggest some type of stress test for Jetpack 5 on at least one platform if nobody has done this yet. I have some scripts at https://github.com/OE4T/meta-mender-community/tree/kirkstone/meta-mender-tegra/meta-mender-tegra-jetpack4/scripts/test I used for Jetpack 4 which probably could use some tweaks and then could be shared with Jetpack 5. We could probably wait to do this after the upstream merge too.

TheYoctoJester commented 2 months ago

> I don't think the state of the repo before the current PR discussed here will be generally useful in the sense that I don't think it added any features to Jetpack 4 and was likely to break stuff and wasn't complete with Jetpack 5. So my recommendation is to do a bump PR to this repo first and do the upstream merge after the PR discussed here is ready.

With the naive bump PR ending up very messy, how about a merge (https://github.com/OE4T/meta-mender-community/pull/21) upon which we can then rebase this PR?

TheYoctoJester commented 2 months ago

@mwest90, another thing that needs fixing before this PR can be merged: none of your commits has a Signed-off-by:-line.

mwest90 commented 2 months ago

@TheYoctoJester I have signed off my commits now.

TheYoctoJester commented 2 months ago

Gave this PR a rebase onto the merged state of kirkstone (https://github.com/OE4T/meta-mender-community/pull/21), and the Tegra builds all pass for me: https://github.com/TheYoctoJester/meta-mender-community/tree/mwest90-kirkstone / https://github.com/TheYoctoJester/meta-mender-community/actions/runs/9956929310

@dwalkes for your smoke tests, is there anything I can help with without having the actual hardware? Or anything we still need to address?

cakre commented 2 months ago

Commit 8756a1b removed `edk2-firmware-tegra_%.bbappend` completely, so the updated L4TConfiguration-[...].dtsi is no longer applied. That file includes a change to the retry count which is needed for rollbacks to work properly.

dwalkes commented 2 months ago

> Gave this PR a rebase onto the merged state of kirkstone (https://github.com/OE4T/meta-mender-community/pull/21),

Thanks @TheYoctoJester added a question there about how to proceed with that one.

> anything I can help with without having the actual hardware?

I don't think so

> anything we still need to address?

Probably just the most recent comment from @cakre as well as some testing on Jetpack 4 with hardware.

@cakre, do you mean that the change here: https://github.com/OE4T/meta-mender-community/pull/19/files#diff-94b9a1f2bb0917c8a8df1c2394cdab37f0c8a8a7138aece71a81cb986224d035R29-R32 is needed to make a failed boot after upgrade result in a rollback? Are you testing this with a power cycle during reboot and after upgrade?

dwalkes commented 2 months ago

@TheYoctoJester did you already have work done to handle the conflicts now that we've rebased the kirkstone branch on upstream?

TheYoctoJester commented 2 months ago

Hi @dwalkes, yup, I've prepared a rebased state with the conflicts resolved (to the best of my knowledge) at https://github.com/TheYoctoJester/meta-mender-community/tree/mwest90-working-jetson-orin. The build pipeline has just started, hopefully no breakage emerges. As I can't modify this PR, I guess you or @mwest90 will have to do the force push.

cakre commented 2 months ago

> Do you mean that the change here: https://github.com/OE4T/meta-mender-community/pull/19/files#diff-94b9a1f2bb0917c8a8df1c2394cdab37f0c8a8a7138aece71a81cb986224d035R29-R32 is needed to make a failed boot after upgrade result in a rollback? Are you testing this with a power cycle during reboot and after upgrade?

Yes, this causes the Jetson to fall back to the old boot chain after one try (which is what Mender expects). Power loss is not detected and won't cause the update to "fail", since cold boots don't decrease the retry counter. I don't think there is anything we can do about that.

mwest90 commented 2 months ago

> Yes, this causes the Jetson to fall back to the old boot chain after one try (which is what Mender expects). Power loss is not detected and won't cause the update to "fail", since cold boots don't decrease the retry counter. I don't think there is anything we can do about that.

@cakre @dwalkes I will bring back the edk2-firmware-tegra_%.bbappend with the FILESEXTRAPATHS section then. @cakre do you have a good procedure for testing this?

cakre commented 2 months ago

> @cakre @dwalkes I will bring back the edk2-firmware-tegra_%.bbappend with the FILESEXTRAPATHS section then. @cakre do you have a good procedure for testing this?

`nvbootctrl dump-slots-info -t rootfs` tells you the retry_count for each rootfs slot. It should be at 1 for both slots.

If you want to test the Mender update/rollback: install the Mender file and reboot; you should then be on the other slot. You can check that with the nvbootctrl command above. It should also show retry_count=0 for your current slot. retry_count gets reset to 1 when you commit the update in Mender. If you want to test the rollback feature, just reboot again instead of committing and you should be back on the previous slot. nvbootctrl should now report the other slot as unbootable.
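
The retry_count check described above could be scripted roughly as follows. This is only a sketch: the helper name `slot_retry_count` is invented, and it assumes the tool prints one line per slot shaped like `slot: 0,   retry_count: 1,   status: normal`, which may differ between L4T releases.

```shell
#!/bin/sh
# slot_retry_count SLOT: read slot info on stdin, print retry_count for SLOT.
# Assumed (unverified) input line shape:
#   slot: 0,             retry_count: 1,             status: normal
slot_retry_count() {
    sed -n "s/^slot: $1,[[:space:]]*retry_count: \([0-9][0-9]*\),.*/\1/p"
}

# On a device, using the command as given above (not executed here):
#   nvbootctrl dump-slots-info -t rootfs | slot_retry_count 0
#   nvbootctrl dump-slots-info -t rootfs | slot_retry_count 1
# Before an update both values are expected to be 1; after installing and
# rebooting, the currently booted slot is expected to show 0 until the
# update is committed.
```

Reading from stdin keeps the parsing step separate from the hardware call, so it can be tried against a saved copy of the output first.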

cakre commented 2 months ago

I noticed that mender-client-systemd-machine-id.service fails with mender 4 with this PR. I opened another PR to fix this.

zach-welch-aquabyte commented 2 months ago

I have a WIP branch to upgrade this branch to scarthgap at zach-welch-aquabyte/meta-mender-community:scarthgap-tegra-jetpack5.

With those changes, I can build an image that boots on my custom Orin NX system, and I have successfully performed a couple of upgrades to prove to myself that it works as expected. Thanks to everyone for all the hard work getting this branch into working shape. Once this current PR/branch lands, I can rebase my changes and push a new PR to merge this into the scarthgap branch, unless someone else wants to take the lead on that.

TheYoctoJester commented 2 months ago

> I have a WIP branch to upgrade this branch to scarthgap at zach-welch-aquabyte/meta-mender-community:scarthgap-tegra-jetpack5.
>
> With those changes, I can build an image that boots on my custom Orin NX system, and I have successfully performed a couple of upgrades to prove to myself that it works as expected. Thanks to everyone for all the hard work getting this branch into working shape. Once this current PR/branch lands, I can rebase my changes and push a new PR to merge this into the scarthgap branch, unless someone else wants to take the lead on that.

Very cool, that's awesome news!

The base of meta-mender-tegra/meta-mender-tegra-jetpack5 has already been aligned with upstream on the repository, and I have prepared a rebase of this PR which solves the conflicts. Hopefully I can get that updated to the latest state later today.

For moving forward, we would need somebody with push permissions to this PR to take it in, and unless new issues are found I think we should be good to merge?

For the scarthgap port, I generally would think it should be merged after that, and we need to make sure all of the (relevant) bits and pieces are also there so we don't end up with needlessly diverging feature sets or compatibilities. Thoughts, @mwest90 & @dwalkes?

dwalkes commented 2 months ago

> I have prepared a rebase of this PR which solves the conflicts. Hopefully I can get that updated to the latest state later today.

Great!

> For moving forward, we would need somebody with push permissions to this PR to take it in, and unless new issues are found I think we should be good to merge?

Agreed, or we could just bypass this repo at this point since everything is just upstream and since you are doing the rebase anyway, just close based on your new PR and link here for the history if anyone needs or wants this.

> For the scarthgap port, I generally would think it should be merged after that, and we need to make sure all of the (relevant) bits and pieces are also there so we don't end up with needlessly diverging feature sets or compatibilities. Thoughts @mwest90 & @dwalkes

Agree... if it's ready to go as-is we don't need to target OE4T/meta-mender-community at all, we can just go upstream. The only reason to target here IMHO would be if we need more help to get it ready for all machines or rework on the latest of relevant oe4t/meta-tegra branches.

dwalkes commented 2 months ago

I just reviewed all the discussions, looks like there are three minor ones from @apbr which aren't yet incorporated in the branch from @mwest90. @TheYoctoJester if you have a branch in progress which is rebased you might want to just handle there. See the ones starting at https://github.com/OE4T/meta-mender-community/pull/19#discussion_r1686468568

TheYoctoJester commented 2 months ago

> I just reviewed all the discussions, looks like there are three minor ones from @apbr which aren't yet incorporated in the branch from @mwest90. @TheYoctoJester if you have a branch in progress which is rebased you might want to just handle there. See the ones starting at https://github.com/OE4T/meta-mender-community/pull/19#discussion_r1686468568

Sure, can do that.

TheYoctoJester commented 2 months ago

@dwalkes @mwest90 done. Therefore, https://github.com/OE4T/meta-mender-community/pull/22 technically supersedes this PR.

TheYoctoJester commented 2 months ago

> Agree... if it's ready to go as-is we don't need to target OE4T/meta-mender-community at all, we can just go upstream. The only reason to target here IMHO would be if we need more help to get it ready for all machines or rework on the latest of relevant oe4t/meta-tegra branches.

Sorry, missed that comment. So if that's the preferred way please comment at https://github.com/mendersoftware/meta-mender-community/pull/394 (specifically @mwest90), then we can get it in right away.