ComputeCanada / puppet-magic_castle

Puppet Environment repo for Magic Castle - https://github.com/ComputeCanada/magic_castle
MIT License

Use dynamic user for nvidia-persistenced service #383

Closed etiennedub closed 1 month ago

etiennedub commented 1 month ago

With the latest NVIDIA driver (>=560), the nvidia-persistenced user is no longer created. This user is only required by nvidia-persistenced.service. This PR removes the need for that user by having the service use a dynamic user instead.
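For readers unfamiliar with systemd dynamic users, the change described above can be sketched as a drop-in for the service. This is a minimal illustration, not the content of the PR: the drop-in path is hypothetical, and the `User=` and `RuntimeDirectory=` values are assumptions about what the daemon needs. Only `DynamicUser=` reflects what the description states.

```ini
# Hypothetical drop-in path, for illustration only:
# /etc/systemd/system/nvidia-persistenced.service.d/dynamic-user.conf
[Service]
# systemd allocates a transient UID/GID when the service starts and
# releases it when the service stops, so no account has to exist in
# /etc/passwd beforehand.
DynamicUser=yes
# Assumption: keep the familiar name for the transient user.
User=nvidia-persistenced
# Assumption: the daemon needs a writable runtime directory under /run;
# systemd creates it and assigns ownership to the dynamic user.
RuntimeDirectory=nvidia-persistenced
```

With a drop-in like this, `systemctl daemon-reload` followed by a restart of the service would apply the settings, and the service would no longer depend on the driver package creating the user.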

cmd-ntrf commented 1 month ago

For a certain range of devices, nvidia-persistenced will try to write to /sys/devices/system/memory/auto_online_blocks, and it will fail if the service is not running as root. This was reported here: https://github.com/NVIDIA/nvidia-persistenced/issues/11.

I am not sure we will see this kind of device when running Magic Castle in the near future. So we will keep using the dynamic user, but I am leaving this note in case nvidia-persistenced fails to start with the following error:

NUMA: Failed to enable NUMA memory Auto-Online
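
If that error does show up, one possible workaround that follows from the note above is to run the service as root again on the affected nodes. The fragment below is only a sketch under that assumption: the drop-in path is hypothetical, and it has not been validated against the devices mentioned in the upstream issue.

```ini
# Hypothetical drop-in path, for illustration only:
# /etc/systemd/system/nvidia-persistenced.service.d/run-as-root.conf
[Service]
# Turn the dynamic user off and run as root, so the daemon is allowed to
# write to /sys/devices/system/memory/auto_online_blocks.
DynamicUser=no
User=root
```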