OE4T / meta-tegra

BSP layer for NVIDIA Jetson platforms, based on L4T
MIT License

nvidia-kernel-oot: Conflicts with intree kernel modules of the same name #1576

lms-ts closed this issue 3 months ago

lms-ts commented 3 months ago

There seems to be an error when installing in-tree kernel modules that nvidia-kernel-oot also lists in RPROVIDES.

Relevant prerequisites:
- Branch: scarthgap
- PREFERRED_PROVIDER_virtual/dtb = "nvidia-kernel-oot"
- IMAGE_INSTALL += "kernel-modules"
- Only nvidia-kernel-oot-base is explicitly added to the custom image.

When I try to build a custom image the following error occurs:

ERROR: jetson-image-custom-1.0-r0 do_rootfs: \
Unable to find package with name 'kernel-module-tegra-drm-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc' \
in SPDX file /home/tschuster/project-lms/build/tmp/deploy/spdx/by-hash/jetson_agx_orin_orito/ \
sstate:nvidia-kernel-oot:jetson_agx_orin_orito-lms-linux:36.3.0:r0:jetson_agx_orin_orito:12: \
/nv-kernel-module-tegra-drm-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc.spdx.json     

This error occurs for all conflicting modules.

Possible solution: by setting RREPLACES and RCONFLICTS of the OOT module packages to the same values as their RPROVIDES, the in-tree kernel modules of the same name are replaced and the above error is resolved.

nvidia-kernel-oot_36.3.0.bb:

-        d.appendVar('RPROVIDES:' + oot_pkg, ' ' + newprovides + ' ' + newprovides_virt)
+        newprovides_str = ' ' + newprovides + ' ' + newprovides_virt
+        d.appendVar('RPROVIDES:' + oot_pkg, newprovides_str)
+        d.appendVar('RREPLACES:' + oot_pkg, newprovides_str)
+        d.appendVar('RCONFLICTS:' + oot_pkg, newprovides_str)
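
To make the effect of the diff easier to see outside a build, here is a minimal standalone sketch. FakeDataStore is a hypothetical stand-in for BitBake's datastore d that models only getVar and appendVar. Like BitBake's appendVar, it concatenates without inserting a separator, which is why the diff prepends a space. The package name and the unversioned runtime name below are assumptions, and only the versioned name is taken from the error message above.

```python
class FakeDataStore:
    """Hypothetical stand-in for BitBake's datastore `d` (getVar/appendVar only)."""

    def __init__(self):
        self._vars = {}

    def getVar(self, name):
        return self._vars.get(name)

    def appendVar(self, name, value):
        # Concatenate with no separator; the caller must supply the leading space.
        self._vars[name] = (self._vars.get(name) or '') + value


def add_oot_relations(d, oot_pkg, newprovides, newprovides_virt):
    """Apply the proposed change: the same string goes into all three variables."""
    newprovides_str = ' ' + newprovides + ' ' + newprovides_virt
    d.appendVar('RPROVIDES:' + oot_pkg, newprovides_str)
    d.appendVar('RREPLACES:' + oot_pkg, newprovides_str)
    d.appendVar('RCONFLICTS:' + oot_pkg, newprovides_str)


d = FakeDataStore()
pkg = 'nv-kernel-module-tegra-drm'  # assumed OOT package name
add_oot_relations(
    d,
    pkg,
    newprovides='kernel-module-tegra-drm-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc',
    newprovides_virt='kernel-module-tegra-drm',  # assumed unversioned runtime name
)
for var in ('RPROVIDES', 'RREPLACES', 'RCONFLICTS'):
    print(var + ':' + pkg, '=', d.getVar(var + ':' + pkg).split())
```

Note that this applies the relations to every OOT module package, whether or not an in-tree module with the same name exists.
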

madisongh commented 3 months ago

I can't reproduce this in master, and I think that's because there's a fix for dependency handling in create-spdx-2.2.bbclass there that isn't in scarthgap. Unfortunately, the fix depends on some bitbake changes that haven't made their way to scarthgap, so I can't easily test that hypothesis.

This appears only to affect the SPDX report generation, though. Package selection for rootfs construction works fine, because the OOT kernel modules have a higher version number (36.3.0) than the in-tree ones (5.15...), so they get selected. That said, it might be a good idea to add the RREPLACES/RCONFLICTS to the OOT module packages, although doing so for every module, when only 15 out of the 163 OOT modules actually conflict, seems like overkill.
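
The version argument can be illustrated with a deliberately simplified comparison. This sketch is not the algorithm opkg, dpkg or rpm actually uses, since those handle epochs, revisions and non-numeric segments. It only compares the leading dotted-integer part of each string. The in-tree version string is assumed from the package name in the error message.

```python
import re


def numeric_prefix(version):
    """Return the leading dotted-integer part of a version string as a tuple."""
    m = re.match(r'\d+(?:\.\d+)*', version)
    if m is None:
        raise ValueError(f'no leading numeric version in {version!r}')
    return tuple(int(part) for part in m.group(0).split('.'))


oot_version = '36.3.0'
in_tree_version = '5.15.136-l4t-r36.3-1009.9+g46cdb595bebc'

# Numeric comparison: (36, 3, 0) > (5, 15, 136), so the OOT package wins.
print(numeric_prefix(oot_version) > numeric_prefix(in_tree_version))
# A naive string comparison gets it backwards, because '3' sorts before '5'.
print(oot_version > in_tree_version)
```
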

There's also the consideration that some folks may try to use other, newer upstream kernels. I'm not sure the out-of-tree versions should still be favored in that case, so we may also need to add PREFERRED_RPROVIDER settings for each of the conflicting modules, which could be overridden when needed.
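
The following toy model shows how an explicit per-module preference could take priority over the "highest version wins" behavior. It is only an illustration. In real BitBake, PREFERRED_RPROVIDER names a recipe and is applied at build time, while the package manager makes its own choice at do_rootfs time. Here both steps are collapsed into one function, and all package names are hypothetical.

```python
def pick_rprovider(rname, candidates, preferred_rproviders):
    """Pick the package that should satisfy the runtime name `rname`.

    candidates: maps each package that RPROVIDES `rname` to a version tuple.
    preferred_rproviders: maps a runtime name to a preferred package, standing
        in for PREFERRED_RPROVIDER_<rname> settings in this toy model.
    """
    preferred = preferred_rproviders.get(rname)
    if preferred is not None:
        if preferred not in candidates:
            raise LookupError(f'{preferred} does not provide {rname}')
        return preferred
    # No explicit preference: fall back to the highest version.
    return max(candidates, key=candidates.get)


rname = 'kernel-module-tegra-drm'
in_tree_pkg = 'kernel-module-tegra-drm-5.15.136'  # hypothetical, shortened name
oot_pkg = 'nv-kernel-module-tegra-drm'            # hypothetical OOT package name
candidates = {in_tree_pkg: (5, 15, 136), oot_pkg: (36, 3, 0)}

print(pick_rprovider(rname, candidates, {}))                    # OOT wins on version
print(pick_rprovider(rname, candidates, {rname: in_tree_pkg}))  # explicit override wins
```
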

So I'm not sure what the right answer is, especially when it looks like the issue is actually in the SPDX report generation code.

lms-ts commented 3 months ago

The error does not occur if the conflicting modules are explicitly added to the image, e.g. via MACHINE_ESSENTIAL_EXTRA_RDEPENDS += "nvidia-kernel-oot-alsa nvidia-kernel-oot-display". If you leave these packages out of the image, you may be able to reproduce the error on the master branch.

I think do_rootfs gets confused because it wants to install the conflicting kernel module (pulled in explicitly by IMAGE_INSTALL += "kernel-modules") and sees nvidia-kernel-oot as the provider, because it is set as PREFERRED_PROVIDER_virtual/dtb and has a higher version number. It then fails because the OOT version was never explicitly added to the image, so it is missing from the SPDX data, as the error message suggests.

Another option could be to set only RCONFLICTS. I think that would solve the problem while still letting other providers be preferred, e.g. when the kernel itself provides a version of the module.

Other question: I also want to use newer upstream kernels, but the OOT modules do not build against them due to kernel API changes. Are the OOT modules supported on newer kernel versions, or would I have to manually patch the ones I need (especially nvgpu)?

madisongh commented 3 months ago

> I think do_rootfs gets confused because it wants to install the conflicting kernel module (pulled in explicitly by IMAGE_INSTALL += "kernel-modules") and sees nvidia-kernel-oot as the provider, because it is set as PREFERRED_PROVIDER_virtual/dtb and has a higher version number. It then fails because the OOT version was never explicitly added to the image, so it is missing from the SPDX data, as the error message suggests.

The SPDX generator is definitely confused. I'd be surprised if it was due to the virtual/dtb build-time provider preference, but maybe. I haven't studied the code enough, and it's rather complicated. If you don't care about building the SPDX files, you could remove create-spdx from INHERIT as a workaround.

> Are the OOT modules supported on newer kernel versions, or would I have to manually patch the ones I need (especially nvgpu)?

Yes, based on what I've seen in the Jetson Linux documentation (look for "Bring Your Own Kernel" there), at least for certain versions. I've seen some postings in the developer forum from folks that have run into some issues, though. @ichergui has been doing some work with this, I believe, and he may be able to provide more information.

lms-ts commented 3 months ago

> The SPDX generator is definitely confused. I'd be surprised if it was due to the virtual/dtb build-time provider preference, but maybe. I haven't studied the code enough, and it's rather complicated. If you don't care about building the SPDX files, you could remove create-spdx from INHERIT as a workaround.

I do not think that virtual/dtb is the underlying problem, but it would be nice to be able to disable the DTB part of the recipe via a .bbappend. For instance, I always provide my own device tree recipe that inherits devicetree.bbclass and uses upstream device tree sources.

Yes, I can work around it, but since RCONFLICTS/RREPLACES works, would it be an option to add them conditionally via a variable, e.g. PREFER_OOT_MODULES (default "0"), so that users can opt in to preferring the OOT modules?
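
Here is a rough sketch of what such an opt-in switch could look like. PREFER_OOT_MODULES is only the name proposed in this comment, not an existing meta-tegra variable, and the set of accepted "true" spellings is an assumption. RPROVIDES is always set, and RREPLACES/RCONFLICTS are added only when the switch is enabled.

```python
# Accepted "true" spellings (an assumption for this sketch); anything else,
# including an unset value, counts as disabled.
TRUE_VALUES = {'1', 'yes', 'y', 'true', 't'}


def prefer_oot_modules(value):
    """Interpret a PREFER_OOT_MODULES-style string; unset defaults to "0"."""
    return (value or '0').strip().lower() in TRUE_VALUES


def relation_vars(oot_pkg, provides, prefer_oot):
    """RPROVIDES always; RREPLACES/RCONFLICTS only when the switch is enabled."""
    result = {'RPROVIDES:' + oot_pkg: provides}
    if prefer_oot:
        result['RREPLACES:' + oot_pkg] = provides
        result['RCONFLICTS:' + oot_pkg] = provides
    return result


pkg = 'nv-kernel-module-tegra-drm'  # hypothetical package name
print(relation_vars(pkg, 'kernel-module-tegra-drm', prefer_oot_modules(None)))
print(relation_vars(pkg, 'kernel-module-tegra-drm', prefer_oot_modules('1')))
```
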

madisongh commented 3 months ago

> I do not think that virtual/dtb is the underlying problem, but it would be nice to be able to disable the DTB part of the recipe via a .bbappend. For instance, I always provide my own device tree recipe that inherits devicetree.bbclass and uses upstream device tree sources.

The usual way to handle that is to have your recipe also set PROVIDES = "virtual/dtb", then set PREFERRED_PROVIDER_virtual/dtb to point to your recipe in your configuration. (Although I'd be surprised if the upstream device trees would be compatible with the NVIDIA out-of-tree drivers.)

madisongh commented 3 months ago

> Yes, I can work around it, but since RCONFLICTS/RREPLACES works, would it be an option to add them conditionally via a variable, e.g. PREFER_OOT_MODULES (default "0"), so that users can opt in to preferring the OOT modules?

Try the latest on master or scarthgap... I've implemented this a bit more selectively, only adding the RCONFLICTS/RREPLACES for the small number of drivers that actually conflict, and adding PREFERRED_RPROVIDER settings to point to them. That should allow for selective overrides for individual drivers, if something like that is ever needed.
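
Here is a sketch of the selective approach, assuming a simple naming scheme. Only modules that exist both out-of-tree and in-tree get RCONFLICTS/RREPLACES, and each of them also gets a weak PREFERRED_RPROVIDER default that points at the OOT recipe. This is not the actual meta-tegra implementation. The module lists are illustrative and do not reflect the real inventory, and the package naming is assumed.

```python
def selective_relations(oot_modules, in_tree_modules, oot_recipe='nvidia-kernel-oot'):
    """Add RCONFLICTS/RREPLACES and a PREFERRED_RPROVIDER default only for clashes."""
    conflicting = sorted(set(oot_modules) & set(in_tree_modules))
    pkg_vars = {}
    conf_lines = []
    for mod in conflicting:
        rname = 'kernel-module-' + mod
        pkg = 'nv-kernel-module-' + mod  # assumed OOT package naming scheme
        pkg_vars['RCONFLICTS:' + pkg] = rname
        pkg_vars['RREPLACES:' + pkg] = rname
        # Weak default (?=) so a plain assignment elsewhere, e.g. in local.conf, wins.
        conf_lines.append(f'PREFERRED_RPROVIDER_{rname} ?= "{oot_recipe}"')
    return conflicting, pkg_vars, conf_lines


# Illustrative module lists only, not the real meta-tegra inventory.
oot = ['tegra-drm', 'nvgpu', 'host1x']
in_tree = ['tegra-drm', 'host1x', 'ext4']

conflicting, pkg_vars, conf_lines = selective_relations(oot, in_tree)
print(conflicting)
for line in conf_lines:
    print(line)
```

Because the preference is a per-module weak default, a single driver can be switched back to its in-tree provider without affecting the others.
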

lms-ts commented 3 months ago

> Try the latest on master or scarthgap... I've implemented this a bit more selectively, only adding the RCONFLICTS/RREPLACES for the small number of drivers that actually conflict, and adding PREFERRED_RPROVIDER settings to point to them. That should allow for selective overrides for individual drivers, if something like that is ever needed.

Yes, that works, and it is a more fine-grained solution. Thanks.