NVIDIA / ansible-role-nvidia-docker


RHEL8.4: Missing dependency [nvidia-container-toolkit] #16

Open shemi-plgs opened 2 years ago

shemi-plgs commented 2 years ago

When installing nvidia-container-runtime with this role, I still had an issue and couldn't launch any GPU tasks. I got the error: "Error response from daemon: OCI runtime create failed"

I had used the Ansible role on Ubuntu and it worked fine, but on RHEL 8.4 I always got an error after installation.

After investigating, I found that on Ubuntu, installing the nvidia-container-runtime package pulls in nvidia-container-toolkit as a dependency, whereas on RHEL it does not. It is this executable that container runtime platforms use to start GPU tasks.

The nvidia-docker2 package also depends on it, but with your role you only get the script.

I was able to make everything work by installing the missing nvidia-container-toolkit package with yum.
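
For reference, the manual fix could be expressed as an Ansible task along these lines. This is only a sketch: it assumes the NVIDIA package repository is already configured on the host, and the task name and the `ansible_os_family` condition are illustrative choices, not something taken from the role itself.

```yaml
# Hypothetical task, not part of the role: installs the package that was
# missing on RHEL 8.4. Assumes the NVIDIA yum repository is already set up.
- name: Install nvidia-container-toolkit on RedHat-family hosts
  ansible.builtin.yum:
    name: nvidia-container-toolkit
    state: present
  become: true
  # Assumption: restrict to RedHat-family hosts, since on Ubuntu the
  # package is reported to arrive as a dependency.
  when: ansible_os_family == "RedHat"
```

The `when` condition keeps the task from running on Ubuntu hosts, where the dependency was already being installed.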

Is this missing dependency on RedHat platforms normal?

ajdecon commented 2 years ago

Hi there!

RHEL 8.x is not a supported or tested platform for this role. Please see: https://github.com/NVIDIA/ansible-role-nvidia-docker/blob/c5cb5cbfec7739f4ac2c0a4c9737202662e2ea04/meta/main.yml

That said, we would probably be open to a PR to add the necessary logic for this support.