Closed lengrongfu closed 3 weeks ago
@cdesiniotis Have you seen this error?
@lengrongfu I am not familiar. It is recommended to blacklist nouveau as it can conflict with the nvidia driver.
I am using gpu-operator to install the driver. Do I need to manually add nouveau to the blacklist before installing gpu-operator?
The k8s-driver-manager pod hits an error when it executes rmmod nouveau:
https://github.com/NVIDIA/k8s-driver-manager/blob/659892aea6af4442e6e63b8a97cadc838c84782c/driver-manager#L494
I am using gpu-operator to install the driver. Do I need to manually add nouveau to the blacklist before installing gpu-operator?
This is not a required prerequisite, but because you are seeing errors from nouveau I recommend that you try blacklisting it. As you pointed out, we do take care of unloading the module.
OK, thanks. After I blacklisted nouveau, k8s-driver-manager ran successfully.
@cdesiniotis Could we discuss adding an option to k8s-driver-manager that blacklists nouveau?
Since blacklisting would require updating the initramfs and rebooting the node, it is not something we would be open to adding to this component. This should be done during infrastructure provisioning.
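For anyone landing here, a minimal sketch of what that provisioning step could look like. This is not part of gpu-operator or k8s-driver-manager. The file name blacklist-nouveau.conf and both helper functions are hypothetical names chosen for illustration. The script only prints the modprobe.d content and reports whether nouveau is currently loaded; writing the file, regenerating the initramfs (on RHEL, typically `dracut --force`), and rebooting are left to your provisioning tooling.

```python
from pathlib import Path

# Hypothetical file name; modprobe reads any *.conf file under /etc/modprobe.d.
BLACKLIST_PATH = Path("/etc/modprobe.d/blacklist-nouveau.conf")


def blacklist_conf(module: str = "nouveau") -> str:
    """Return modprobe.d content that blacklists `module` and turns off its modesetting."""
    return f"blacklist {module}\noptions {module} modeset=0\n"


def is_module_loaded(module: str, proc_modules: str) -> bool:
    """Scan /proc/modules-style text; the first field of each line is a module name."""
    return any(
        line.split()[0] == module
        for line in proc_modules.splitlines()
        if line.strip()
    )


if __name__ == "__main__":
    # Show what would be written; the write itself needs root and is left out here.
    print(f"# content for {BLACKLIST_PATH}")
    print(blacklist_conf(), end="")

    proc = Path("/proc/modules")
    text = proc.read_text() if proc.exists() else ""
    print(f"nouveau currently loaded: {is_module_loaded('nouveau', text)}")
```

The check matches on the first field only, so a module that merely lists nouveau as a dependent is not counted as nouveau itself.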
I use gpu-operator:v23.9.0 to install the NVIDIA GPU driver, but after the nvidia-driver-daemonset pod starts, the machine's kernel crashes. The GPU card is a Tesla P4. OS info: Red Hat 9.2, kernel version 5.14.0-284.11.1.el9_2.x86_64. The machine has the nouveau driver installed, and when I used the dmesg command to look at the kernel log, I found many errors about nouveau: