bottlerocket-os / bottlerocket

An operating system designed for hosting containers
https://bottlerocket.dev

Unable to use lockdown mode with NVIDIA module on bottlerocket #4218

Open db376 opened 1 month ago

db376 commented 1 month ago

When attempting to load the NVIDIA kernel modules on a Bottlerocket AMI with kernel lockdown set to integrity, errors like the following are produced:

[    5.879228] driverdog[2088]: 23:10:41 [ERROR] '/usr/bin/modprobe' failed - stderr: modprobe: ERROR: could not insert 'nvidia': Operation not permitted
[    5.881035] driverdog[2088]: modprobe: ERROR: could not insert 'nvidia_uvm': Operation not permitted
[    5.882187] driverdog[2088]: modprobe: ERROR: could not insert 'nvidia_modeset': Operation not permitted
[FAILED] Failed to start Load additional kernel modules.
See 'systemctl status load-kernel-modules.service' for details.
[DEPEND] Dependency failed for Bottlerocket initial configuration complete.
[DEPEND] Dependency failed for Isolates configured.target.
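For anyone reproducing this, it helps to confirm which lockdown mode the running kernel is actually in. On kernels built with the lockdown LSM, securityfs exposes /sys/kernel/security/lockdown, which lists the available modes with the active one in square brackets (for example "none [integrity] confidentiality"). In integrity mode the kernel refuses to load modules that lack a valid signature, which is consistent with the "Operation not permitted" errors above. The Python sketch below parses that file; it assumes securityfs is mounted at the usual path, and the helper names are hypothetical.

```python
import re
from pathlib import Path

# Usual securityfs path of the lockdown LSM's control file (assumes
# securityfs is mounted at /sys/kernel/security).
LOCKDOWN_PATH = Path("/sys/kernel/security/lockdown")


def parse_lockdown(text: str) -> str:
    """Return the active mode from a line like 'none [integrity] confidentiality'."""
    match = re.search(r"\[(\w+)\]", text)
    if match is None:
        raise ValueError(f"no active lockdown mode in {text!r}")
    return match.group(1)


def read_lockdown(path: Path = LOCKDOWN_PATH) -> str:
    """Read the lockdown file on the running host and return the active mode."""
    return parse_lockdown(path.read_text())


if __name__ == "__main__":
    try:
        print(f"kernel lockdown mode: {read_lockdown()}")
    except OSError as err:
        # Not Linux, securityfs not mounted, or the lockdown LSM is absent.
        print(f"could not read {LOCKDOWN_PATH}: {err}")
```

Parsing is kept separate from reading so the same function can be applied to the file's contents captured from another host.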

This was tested on the following versions in us-east-1: 1.30 (ami-0c2f741e432159b2c), 1.29 (ami-06033e6f46c64c7db), and 1.20 (ami-046b028e6b00a3938).

Self-signing the modules also does not work as a workaround; instead, loading fails with "validation rejected".
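When self-signing fails, it helps to separate two cases: a module that carries no signature at all, and one that carries a signature from a key the kernel does not trust. A signed module ends with a PKCS#7 blob, a 12-byte struct module_signature trailer, and the magic string "~Module signature appended~\n". The Python sketch below checks for that trailer; it only inspects the layout and does not verify the signature cryptographically, and the function and class names are hypothetical.

```python
import struct
from typing import NamedTuple, Optional

# Marker at the very end of a signed module (MODULE_SIG_STRING in the
# kernel's include/linux/module_signature.h).
MAGIC = b"~Module signature appended~\n"

# struct module_signature: algo, hash, id_type, signer_len, key_id_len,
# three padding bytes, then a big-endian 32-bit signature length (12 bytes).
SIG_INFO = struct.Struct(">5B3xI")

PKEY_ID_PKCS7 = 2  # id_type used for PKCS#7 signatures


class ModuleSignature(NamedTuple):
    id_type: int
    sig_len: int


def find_signature(data: bytes) -> Optional[ModuleSignature]:
    """Return trailer metadata if `data` (an uncompressed .ko) ends with a signature."""
    if not data.endswith(MAGIC):
        return None  # no appended signature at all
    info_end = len(data) - len(MAGIC)
    info_start = info_end - SIG_INFO.size
    if info_start < 0:
        raise ValueError("truncated module_signature trailer")
    fields = SIG_INFO.unpack_from(data, info_start)
    id_type, sig_len = fields[2], fields[5]
    if sig_len > info_start:
        raise ValueError("signature length exceeds module size")
    return ModuleSignature(id_type=id_type, sig_len=sig_len)
```

For example, find_signature(Path("nvidia.ko").read_bytes()) (with a hypothetical local path) returns None for an unsigned build; a module shipped compressed, such as .ko.xz, would need lzma.decompress first. A non-None result only shows that a signature is present: under lockdown it must still verify against a key the kernel already trusts, so a signature from a locally generated key is likely to be rejected unless that key has been enrolled.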

bcressey commented 1 month ago

There are two factors at work here:

  1. the NVIDIA kmods aren't linked until runtime (for software licensing reasons) and can't be signed by the same ephemeral key used to sign the kernel's own modules (for policy reasons)
  2. for no very good reason, the build system today has no mechanism for signing kmods at all; we just rely on the kernel to sign its own modules with a throwaway key
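Point (2) is about the missing signing step itself. In the kernel tree that step is performed by scripts/sign-file, which appends a detached PKCS#7 signature, a struct module_signature trailer whose only non-zero fields are id_type and sig_len, and the magic string. The Python sketch below reproduces only that final layout step; it assumes the DER-encoded PKCS#7 signature over the module bytes has already been produced by some signer, and the function name is hypothetical. Managing the signing key, which is the real open question, is not shown.

```python
import struct

# Magic string and trailer layout the kernel looks for at the end of a module.
MAGIC = b"~Module signature appended~\n"
SIG_INFO = struct.Struct(">5B3xI")  # algo, hash, id_type, signer_len, key_id_len, pad, sig_len
PKEY_ID_PKCS7 = 2


def append_signature(module: bytes, pkcs7_der: bytes) -> bytes:
    """Append a detached PKCS#7 signature in the layout the kernel parses.

    For PKCS#7 signatures the algo, hash, signer_len and key_id_len fields
    stay zero; the signer details live inside the PKCS#7 blob itself.
    """
    if module.endswith(MAGIC):
        raise ValueError("module is already signed")
    trailer = SIG_INFO.pack(0, 0, PKEY_ID_PKCS7, 0, 0, len(pkcs7_der))
    return module + pkcs7_der + trailer + MAGIC
```

Because the signature covers the exact module bytes, this step has to run after the NVIDIA objects are linked, which is why the build-time ephemeral key cannot simply be reused for them.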

I've recently been considering ways to address (2), since we have two newly added external kmods (the Neuron driver and the NVIDIA open source driver) that really ought to be signed and trusted.