NVIDIA / gpu-driver-container

The NVIDIA GPU driver container allows the provisioning of the NVIDIA driver through the use of containers.
Apache License 2.0

Add Support For Applying Patches #25

Open fifofonix opened 5 months ago

fifofonix commented 5 months ago

For OS versions without official support, e.g. FedoraCoreOS, it may sometimes (^1) be necessary to apply one or more patches recommended on the NVIDIA forum to overcome compilation issues and deliver a working environment. The current driver container does not expose an easy way to apply patches via the installer's --apply-patch switch.

Having a means to init the driver-container with an --apply-patches switch that searches a specified directory for applicable patches would be a convenience feature for teams running non-supported OS versions. Where the driver container runs directly on the host, one could imagine mounting patches into the container via systemd units etc. However, this method might not extend easily to a driver container scheduled by the gpu-operator. Whether an injection point exists for a) additional command switches and b) additional config files in the k8s/gpu-operator scheduled driver pods needs investigation.
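To make the proposed directory search concrete, here is a minimal sketch in Python. It is not the driver container's actual implementation. The function names, the `.patch` suffix filter, the name-sorted ordering, and the choice of one installer invocation per patch are all assumptions made for illustration. Only the --apply-patch switch itself comes from the request above.

```python
from pathlib import Path


def find_patches(patch_dir):
    """Return the *.patch files directly inside patch_dir, sorted by name.

    Assumption: patches are identified by a ".patch" suffix and applied in
    lexical order (e.g. 01-foo.patch before 02-bar.patch).
    """
    root = Path(patch_dir)
    if not root.is_dir():
        # A missing or unmounted patch directory simply means "no patches".
        return []
    return sorted(p for p in root.iterdir() if p.is_file() and p.suffix == ".patch")


def installer_commands(installer, patches):
    """Build one hypothetical installer command line per patch.

    Assumption: the installer is run through `sh` and takes the patch path
    as the argument to --apply-patch; nothing is executed here.
    """
    return [["sh", str(installer), "--apply-patch", str(p)] for p in patches]
```

A caller could print the result of `installer_commands(...)` for review before running anything, which keeps the search logic separate from however the patches end up being mounted into the container.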

^1 For at least a month it has been known that driver compilation fails for Fedora40 (which was in beta at the time, but is now released), affecting anyone running environments based on FedoraCoreOS's next and now testing streams. Unless a formal new driver version incorporating the patch is released, these failures are expected to reach the FedoraCoreOS stable stream in two weeks.

cdesiniotis commented 5 months ago

@fifofonix thanks for raising this RFE. I agree that it would be helpful if we exposed an interface for passing additional command line options to the nvidia installer. PRs are always welcome if you are interested in contributing this.

Concerning your second request -- mounting arbitrary config files to the GPU Operator managed driver container -- we would need to evaluate further. cc @shivamerla @tariq1890

fifofonix commented 4 months ago

Note that I have a work-in-progress pull request for Fedora that allows a single patch to be applied to the driver container. With this feature it is possible to get a working Fedora40 driver container by applying a specific patch originally referenced on the NVIDIA driver forum. By work-in-progress pull request, I mean that I have completed a merge request on GitLab in the project referenced below.

https://gitlab.com/container-toolkit-fcos/driver/-/issues/11