NVIDIA / nvidia-persistenced

NVIDIA driver persistence daemon
MIT License

persistenced requires root on GH200 #11

Open sclarkson opened 2 months ago

sclarkson commented 2 months ago
Aug 27 21:35:25 ubuntu nvidia-persistenced[1405]: Started (1405)
Aug 27 21:35:25 ubuntu nvidia-persistenced[1405]: NUMA: Enabling NUMA memory Auto-Online due to GPU requirement
Aug 27 21:35:25 ubuntu nvidia-persistenced[1405]: NUMA: Failed to open /sys/devices/system/memory/auto_online_blocks: Permission denied
Aug 27 21:35:25 ubuntu nvidia-persistenced[1405]: NUMA: Failed to enable NUMA memory Auto-Online
Aug 27 21:35:25 ubuntu nvidia-persistenced[1400]: nvidia-persistenced failed to initialize. Check syslog for more details.
Aug 27 21:35:25 ubuntu nvidia-persistenced[1405]: PID file unlocked.
Aug 27 21:35:25 ubuntu systemd[1]: nvidia-persistenced.service: Control process exited, code=exited, status=1/FAILURE
Aug 27 21:35:25 ubuntu nvidia-persistenced[1405]: PID file closed.

The current documentation says the daemon does not need root, and the install script defaults to a non-root user.

Could we get some clarity on whether running as root is intended to be mandatory here? I ask especially because auto_online_blocks is already set properly in my case, and persistenced tries to write it unconditionally.
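To illustrate the "check before writing" behaviour being asked for, here is a minimal sketch in Python. It is not the daemon's code: the function name ensure_auto_online is hypothetical, and the target value online_movable is an assumption, since the log above does not say which value persistenced tries to write. The idea is simply that reading the file does not need root, so a write (and the permission failure) could be skipped when the value is already correct.

```python
import os
import tempfile

# Values the kernel accepts for auto_online_blocks (memory hotplug sysfs interface).
VALID_POLICIES = {"offline", "online", "online_kernel", "online_movable"}


def ensure_auto_online(path, wanted):
    """Write `wanted` to `path` only if the current value differs.

    Returns "unchanged" when no write was needed, "written" otherwise.
    Raises PermissionError if a write is needed but not permitted.
    (Hypothetical helper for illustration; not part of nvidia-persistenced.)
    """
    if wanted not in VALID_POLICIES:
        raise ValueError(f"unknown policy: {wanted!r}")
    with open(path, "r") as f:
        current = f.read().strip()
    if current == wanted:
        return "unchanged"  # nothing to write, so no elevated privileges needed
    with open(path, "w") as f:
        f.write(wanted)
    return "written"


if __name__ == "__main__":
    # Demonstrate on a scratch file rather than the real sysfs path
    # /sys/devices/system/memory/auto_online_blocks.
    with tempfile.TemporaryDirectory() as d:
        demo = os.path.join(d, "auto_online_blocks")
        with open(demo, "w") as f:
            f.write("online_movable\n")  # assumed target value
        print(ensure_auto_online(demo, "online_movable"))
```

With this shape, a non-root daemon would only hit "Permission denied" when the setting actually needs to change, which would be consistent with the documentation's claim that root is not required.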

hv15 commented 1 month ago

We've run into this issue on our installation as well.

We work around this by changing the service file to:

[Unit]
Description=NVIDIA Persistence Daemon (Hopper)

[Service]
Type=forking
ProtectSystem=strict
ReadWritePaths=/var/run/nvidia-persistenced
PIDFile=/run/nvidia-persistenced/nvidia-persistenced.pid
ExecStartPre=/usr/bin/nvidia-smi
ExecStart=/usr/bin/nvidia-persistenced -V
TimeoutSec=300

[Install]
WantedBy=multi-user.target