Closed AntonioCayulao closed 3 months ago
[Solved] All the magic was in passing the arguments to the _install_driver() function:
_install_driver() {
    local install_args=()
    echo "Installing NVIDIA driver kernel modules..."
    cd /usr/src/nvidia-${DRIVER_VERSION}
    if [ -d /lib/modules/${KERNEL_VERSION}/kernel/drivers/video ]; then
        rm -rf /lib/modules/${KERNEL_VERSION}/kernel/drivers/video
    else
        rm -rf /lib/modules/${KERNEL_VERSION}/video
    fi
    if [ "${ACCEPT_LICENSE}" = "yes" ]; then
        install_args+=("--accept-license")
    fi
    nvidia-installer --module-signing-secret-key="${PRIVATE_KEY}" \
        --module-signing-public-key=/drivers/kernel/pubkey.x509 \
        --kernel-module-only --no-drm --ui=none --no-nouveau-check -m=${KERNEL_TYPE} ${install_args[@]+"${install_args[@]}"}
}
btw, I have the argument ACCEPT_LICENSE="".
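In case it helps anyone reading later: the optional arguments reach nvidia-installer through the array expansion at the end of that call. Below is a minimal sketch of that idiom on its own. The names demo_install and count_args are hypothetical stand-ins for illustration, not part of the driver script, and count_args only counts what it receives instead of running the installer.

```shell
#!/usr/bin/env bash
set -u

# Hypothetical helper: prints how many arguments it received.
count_args() {
    echo "$#"
}

# Hypothetical stand-in for _install_driver(): builds the optional
# arguments the same way, then passes them on.
demo_install() {
    local accept_license="$1"
    local install_args=()
    if [ "${accept_license}" = "yes" ]; then
        install_args+=("--accept-license")
    fi
    # Expands to nothing when the array is empty, and to one word per
    # element otherwise, without tripping `set -u` on older bash versions.
    count_args --ui=none ${install_args[@]+"${install_args[@]}"}
}

demo_install ""      # same situation as ACCEPT_LICENSE=""
demo_install "yes"
```

With an empty value only the fixed option is passed, so no empty-string argument ends up on the installer's command line.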
Hi all,
I'm using an RKE2 cluster with GPU-Operator, but compiling the nvidia-driver source code myself to use driver version 535.161.07 and CUDA version 12.3.2 on Ubuntu 24.04 with Secure Boot enabled.
In the past I was able to build the container and have it up and running under all of those conditions, except with Secure Boot enabled.
I enrolled the keys in the BIOS and I'm passing one of them to sign the nvidia module.
I removed the donkey step so I can pass the keys to the script directly, replaced the sign-file binary with kmodsign, and added nvidia.ko so that it is signed directly (just in case):
Afterwards, once the container is running the logs show me that the modules were signed.
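One way to double-check this independently of the logs is to look for the marker string that the kernel's module signing format appends to the end of a signed .ko file. This is only a rough sketch: has_module_signature is a hypothetical helper name, and the module paths have to be supplied by whoever runs it. It only shows that a signature is attached, not that it was made with a key the firmware trusts, and it will not work on compressed modules.

```shell
#!/usr/bin/env bash

# Hypothetical helper: returns 0 if the file ends with the marker that
# is appended after a module signature, non-zero otherwise.
has_module_signature() {
    tail -c 64 "$1" | grep -aq '~Module signature appended~'
}

# Check every module path given on the command line (none by default).
for module in "$@"; do
    if has_module_signature "$module"; then
        echo "$module: signature marker found"
    else
        echo "$module: no signature marker"
    fi
done
```

Saved as a script, it could be run inside the driver container with the paths of the built modules as arguments.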
But I get this error during the compilation process when nvidia tries to load the modules:
In addition, I added the options to both the NVIDIA-Linux-$DRIVER_ARCH-$DRIVER_VERSION.run and the nvidia-installer script:
Following the documentation here:
I have attached the full logs from /var/log/nvidia-installer.log.
I hope you can help me solve this issue :disappointed:
Have a good day,
Antonio.
nvidia-installer.log