coraxx / netdata_nv_plugin

NetData plugin for Nvidia GPU stats

pynvml could not be initialized: insufficient permissions #1

Closed: skwerlman closed this issue 7 years ago

skwerlman commented 7 years ago

After setting up the plugin according to the readme, I am left with this error:

2016-12-14 10:19:34: python.d INFO: nv 'nvMemFactor' set to: 1
2016-12-14 10:19:34: python.d ERROR: nv pynvml could not be initialized Insufficient Permissions
2016-12-14 10:19:34: python.d ERROR: nv check() failed - disabling job
2016-12-14 10:19:34: python.d ERROR: DISABLED: nv/None
coraxx commented 7 years ago

It looks like the user that netdata/python.d runs as does not have permission to initialize the NVIDIA NVML library. NVML returns NVML_ERROR_NO_PERMISSION "if the user doesn’t have permission to talk to any device".
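A minimal sketch of how this failure can be told apart from other errors in Python, assuming the `pynvml` bindings the plugin uses: `nvmlInit()` raises `pynvml.NVMLError`, whose `value` attribute holds the raw NVML return code, and `pynvml.NVML_ERROR_NO_PERMISSION` is the constant for this case. The function name `nvml_init_status` is hypothetical, and the `nvml` parameter exists only so the logic can be exercised without a GPU.

```python
import importlib


def nvml_init_status(nvml=None):
    """Try to initialize NVML and report why it failed, if it did.

    Hypothetical helper. `nvml` defaults to the real pynvml module; any object
    exposing nvmlInit, nvmlShutdown, NVMLError and NVML_ERROR_NO_PERMISSION
    can be passed instead (e.g. a stub in tests).
    """
    if nvml is None:
        nvml = importlib.import_module("pynvml")
    try:
        nvml.nvmlInit()
    except nvml.NVMLError as err:
        # err.value carries the raw NVML return code.
        if err.value == nvml.NVML_ERROR_NO_PERMISSION:
            return "no-permission"
        return "error: %s" % err
    # Initialization worked; release the library again.
    nvml.nvmlShutdown()
    return "ok"
```

Calling `print(nvml_init_status())` once as your own user and once as the netdata user should show `ok` versus `no-permission` if permissions are the problem.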

What system is this running on? Can you run netdata's python.d plugin in debug mode from your own user account and see what happens? On my Ubuntu machine that would be:

/usr/libexec/netdata/plugins.d/python.d.plugin debug nv
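The comparison this thread ends up making (your own account versus the netdata account) can be scripted. A minimal sketch, assuming the plugin path from the Ubuntu example above (it may differ per distribution) and `sudo` for switching users; `debug_argv` and `run_debug` are hypothetical helpers, and `sudo -u` only roughly reproduces the service's environment.

```python
import subprocess

# Path taken from the Ubuntu example in this thread; it may differ per distro.
PYTHON_D_PLUGIN = "/usr/libexec/netdata/plugins.d/python.d.plugin"


def debug_argv(module="nv", user=None, plugin=PYTHON_D_PLUGIN):
    """Build the argv for a python.d debug run, optionally as another user.

    With user="netdata" the command is wrapped in `sudo -u netdata`.
    """
    argv = [plugin, "debug", module]
    if user is not None:
        argv = ["sudo", "-u", user] + argv
    return argv


def run_debug(module="nv", user=None):
    """Run the debug command and return its combined stdout/stderr text."""
    result = subprocess.run(
        debug_argv(module, user),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout
```

Diffing `run_debug()` against `run_debug(user="netdata")` isolates anything that depends on who runs the plugin.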

skwerlman commented 7 years ago

Netdata is running on my desktop box, with Sabayon 16.11 installed.

When I run /usr/libexec/netdata/plugins.d/python.d.plugin debug nv from my user account, it seems to run correctly, but it doesn't show up in netdata.

Some of the output:

2016-12-15 09:21:29: python.d DEBUG: nv nv updated in 1 ms
2016-12-15 09:21:29: python.d DEBUG: nv sleeping for 0.997231960297 secs to reach frequency of 1.0 secs, now: 1481811689.0  next: 1481811690.0  penalty: 0
2016-12-15 09:21:30: python.d DEBUG: nv Not Supported
2016-12-15 09:21:30: python.d DEBUG: nv Device 0 : GeForce GTX 980
2016-12-15 09:21:30: python.d DEBUG: nv Brand: GeForce
2016-12-15 09:21:30: python.d DEBUG: nv GeForce GTX 980 Temp      : 55
2016-12-15 09:21:30: python.d DEBUG: nv GeForce GTX 980 Mem total : 4230414336 bytes
2016-12-15 09:21:30: python.d DEBUG: nv GeForce GTX 980 Mem used  : 542900224 bytes
2016-12-15 09:21:30: python.d DEBUG: nv GeForce GTX 980 Mem free  : 3687514112 bytes
2016-12-15 09:21:30: python.d DEBUG: nv GeForce GTX 980 Load GPU  : 0 %
2016-12-15 09:21:30: python.d DEBUG: nv GeForce GTX 980 Load MEM  : 6 %
2016-12-15 09:21:30: python.d DEBUG: nv GeForce GTX 980 Core clock: 135 MHz
2016-12-15 09:21:30: python.d DEBUG: nv GeForce GTX 980 SM clock  : 135 MHz
2016-12-15 09:21:30: python.d DEBUG: nv GeForce GTX 980 Mem clock : 324 MHz
2016-12-15 09:21:30: python.d DEBUG: nv GeForce GTX 980 Fan speed : 1 %
2016-12-15 09:21:30: python.d DEBUG: nv GeForce GTX 980 ECC errors: None
BEGIN nv.load 1000015
SET device_load_gpu_0 = 0
SET device_load_mem_0 = 6
END 
BEGIN nv.memory 1000015
SET device_mem_used_0 = 542900224
SET device_mem_free_0 = 3687514112
END 
BEGIN nv.frequency 1000015
SET device_core_clock_0 = 135
SET device_mem_clock_0 = 324
SET device_sm_clock_0 = 135
END 
BEGIN nv.temperature 1000015
SET device_temp_0 = 55
END 
BEGIN nv.fan 1000015
SET device_fanspeed_0 = 1
END 

BEGIN netdata.plugin_pythond_nv 1000015
SET run_time = 1
END
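The lines from `BEGIN` to `END` are what python.d hands to netdata: a `BEGIN <chart> <microseconds>` line per chart, a `SET <dimension> = <value>` line per dimension, and a closing `END`. A minimal sketch of a parser for exactly the shape seen in this output (not a full implementation of netdata's plugin protocol; `parse_chart_updates` is a hypothetical name) makes it easy to confirm which charts a debug run actually produced:

```python
def parse_chart_updates(text):
    """Collect SET values per chart from BEGIN/SET/END blocks.

    Returns {chart_id: {dimension_id: int_value}}. Lines outside a
    BEGIN...END block (e.g. python.d DEBUG log lines) are ignored.
    """
    charts = {}
    current = None
    for raw in text.splitlines():
        parts = raw.split()
        if not parts:
            continue
        if parts[0] == "BEGIN" and len(parts) >= 2:
            current = charts.setdefault(parts[1], {})
        elif parts[0] == "END":
            current = None
        elif (parts[0] == "SET" and current is not None
              and len(parts) == 4 and parts[2] == "="):
            current[parts[1]] = int(parts[3])
    return charts
```

If a debug run as the netdata user yields no `nv.*` charts at all, the job was disabled before its first update, as in the log at the top of this issue.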
skwerlman commented 7 years ago

Running it via sudo -u netdata gives me the insufficient permissions error, so I suspect the netdata user needs to be added to some group, but I have no idea which.
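One way to find out which group is involved is to look at who owns the NVIDIA device nodes, since NVML talks to the driver through them. A minimal sketch, assuming the nodes match `/dev/nvidia*` (the usual naming, but an assumption here); `device_node_groups` is a hypothetical helper.

```python
import glob
import grp
import os
import stat


def device_node_groups(pattern="/dev/nvidia*"):
    """Map each matching path to (group name, ls-style mode, readable+writable by us)."""
    result = {}
    for path in sorted(glob.glob(pattern)):
        st = os.stat(path)
        try:
            group = grp.getgrgid(st.st_gid).gr_name
        except KeyError:
            # gid without an entry in the group database
            group = str(st.st_gid)
        result[path] = (
            group,
            stat.filemode(st.st_mode),
            os.access(path, os.R_OK | os.W_OK),
        )
    return result


for path, (group, mode, ok) in device_node_groups().items():
    print(path, group, mode, "ok" if ok else "no access")
```

Running the script via `sudo -u netdata python3` shows whether that user can open the nodes; a group-only mode such as `crw-rw----` points at the group to add netdata to.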

coraxx commented 7 years ago

Okay, so now we at least know that it can work; we just have to fix the permissions.

Which user is netdata running as?

It's hard for me to troubleshoot, since I don't have Sabayon or any other Gentoo-based Linux installed anywhere. If your netdata service runs with the default settings as user netdata, it is configured the same way I run it on my Ubuntu installation.
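For reference, a Linux-only sketch of answering that question by scanning `/proc`, assuming the daemon's process name (`comm`) is `netdata`; `euid_from_status` and `process_users` are hypothetical helpers. It reports the effective UID, which is the one used for file permission checks.

```python
import os
import pwd


def euid_from_status(status_text):
    """Return the effective UID from the text of a /proc/<pid>/status file."""
    for line in status_text.splitlines():
        if line.startswith("Uid:"):
            # Format: Uid:<tab>real<tab>effective<tab>saved<tab>filesystem
            return int(line.split()[2])
    raise ValueError("no Uid line found")


def process_users(name="netdata", proc="/proc"):
    """Map PID -> user name for processes whose comm equals `name`."""
    users = {}
    for pid in os.listdir(proc):
        if not pid.isdigit():
            continue
        try:
            with open(os.path.join(proc, pid, "comm")) as f:
                if f.read().strip() != name:
                    continue
            with open(os.path.join(proc, pid, "status")) as f:
                uid = euid_from_status(f.read())
        except (OSError, ValueError):
            continue  # process exited or is unreadable
        try:
            users[int(pid)] = pwd.getpwuid(uid).pw_name
        except KeyError:
            users[int(pid)] = str(uid)  # UID without a passwd entry
    return users
```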

skwerlman commented 7 years ago

netdata is running as user netdata

coraxx commented 7 years ago

Maybe the permissions on the NVIDIA library are wrong?!

Mine are:

~ $ ls -la /usr/lib/libnvidia-ml.so.1
lrwxrwxrwx 1 root root 22 23. Okt 2015  /usr/lib/libnvidia-ml.so.1 -> libnvidia-ml.so.352.39
~ $ ls -la /usr/lib/libnvidia-ml.so.352.39
-rwxr-xr-x 1 root root 958K 23. Okt 2015  /usr/lib/libnvidia-ml.so.352.39
skwerlman commented 7 years ago
┌[skw☮sby-main]-(~)
└> ls -l /usr/lib/libnvidia-ml.so*
lrwxrwxrwx 1 root root      22 Nov 14 05:43 /usr/lib/libnvidia-ml.so -> libnvidia-ml.so.370.28
lrwxrwxrwx 1 root root      22 Nov 14 05:43 /usr/lib/libnvidia-ml.so.1 -> libnvidia-ml.so.370.28
-rwxr-xr-x 1 root root 1156288 Nov  2 18:20 /usr/lib/libnvidia-ml.so.370.28
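The same check can be done programmatically. A minimal sketch: resolve the `libnvidia-ml.so.1` symlink and report the target's mode; the default path is the one from the listings above, and `library_permissions` is a hypothetical helper. A world-readable library (`r` in the last triplet, as in both listings) means the library file itself is not what blocks the netdata user.

```python
import os
import stat


def library_permissions(path="/usr/lib/libnvidia-ml.so.1"):
    """Resolve a (possibly symlinked) library and describe its permissions.

    Returns (real path, ls-style mode string, world-readable flag).
    """
    real = os.path.realpath(path)
    mode = os.stat(real).st_mode
    return real, stat.filemode(mode), bool(mode & stat.S_IROTH)
```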
skwerlman commented 7 years ago

I figured it out! I added netdata to the video group, and the plugin is working now. Thanks for helping narrow down the issue!
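The usual way to make this change is `usermod -a -G video netdata`, followed by a restart of netdata, because a running process keeps the group list it started with. A small sketch for confirming the membership afterwards; it only reads the passwd and group databases (not the groups of an already running process), and `user_in_group` is a hypothetical helper.

```python
import grp
import pwd


def user_in_group(user, group):
    """True if `user` has `group` as primary group or is listed as a member of it."""
    try:
        pw = pwd.getpwnam(user)
        gr = grp.getgrnam(group)
    except KeyError:
        return False  # unknown user or group
    return pw.pw_gid == gr.gr_gid or user in gr.gr_mem
```

After the change, `user_in_group("netdata", "video")` should return `True`.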

coraxx commented 7 years ago

Awesome. Well done. Have fun with it :)