Closed by skwerlman 7 years ago
It looks like the user that netdata/python.d runs as does not have permission to initialize the NVIDIA NVML library. NVML returns NVML_ERROR_NO_PERMISSION if the user doesn't have permission to talk to any device.
What system is this running on? Can you run the netdata python.d plugin in debug mode from your user account and see what happens? For me, under Ubuntu, that means calling:
/usr/libexec/netdata/plugins.d/python.d.plugin debug nv
Netdata is running on my desktop box, with Sabayon 16.11 installed.
When I run /usr/libexec/netdata/plugins.d/python.d.plugin debug nv from my user account, it seems to run correctly, but the module doesn't show up in netdata.
Some of the output:
2016-12-15 09:21:29: python.d DEBUG: nv nv updated in 1 ms
2016-12-15 09:21:29: python.d DEBUG: nv sleeping for 0.997231960297 secs to reach frequency of 1.0 secs, now: 1481811689.0 next: 1481811690.0 penalty: 0
2016-12-15 09:21:30: python.d DEBUG: nv Not Supported
2016-12-15 09:21:30: python.d DEBUG: nv Device 0 : GeForce GTX 980
2016-12-15 09:21:30: python.d DEBUG: nv Brand: GeForce
2016-12-15 09:21:30: python.d DEBUG: nv GeForce GTX 980 Temp : 55
2016-12-15 09:21:30: python.d DEBUG: nv GeForce GTX 980 Mem total : 4230414336 bytes
2016-12-15 09:21:30: python.d DEBUG: nv GeForce GTX 980 Mem used : 542900224 bytes
2016-12-15 09:21:30: python.d DEBUG: nv GeForce GTX 980 Mem free : 3687514112 bytes
2016-12-15 09:21:30: python.d DEBUG: nv GeForce GTX 980 Load GPU : 0 %
2016-12-15 09:21:30: python.d DEBUG: nv GeForce GTX 980 Load MEM : 6 %
2016-12-15 09:21:30: python.d DEBUG: nv GeForce GTX 980 Core clock: 135 MHz
2016-12-15 09:21:30: python.d DEBUG: nv GeForce GTX 980 SM clock : 135 MHz
2016-12-15 09:21:30: python.d DEBUG: nv GeForce GTX 980 Mem clock : 324 MHz
2016-12-15 09:21:30: python.d DEBUG: nv GeForce GTX 980 Fan speed : 1 %
2016-12-15 09:21:30: python.d DEBUG: nv GeForce GTX 980 ECC errors: None
BEGIN nv.load 1000015
SET device_load_gpu_0 = 0
SET device_load_mem_0 = 6
END
BEGIN nv.memory 1000015
SET device_mem_used_0 = 542900224
SET device_mem_free_0 = 3687514112
END
BEGIN nv.frequency 1000015
SET device_core_clock_0 = 135
SET device_mem_clock_0 = 324
SET device_sm_clock_0 = 135
END
BEGIN nv.temperature 1000015
SET device_temp_0 = 55
END
BEGIN nv.fan 1000015
SET device_fanspeed_0 = 1
END
BEGIN netdata.plugin_pythond_nv 1000015
SET run_time = 1
END
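For readers unfamiliar with the BEGIN/SET/END lines in this output: they are what the plugin writes to stdout for netdata to collect, one block per chart, and the number after the chart name is the time since the previous update in microseconds (about one second here). A minimal Python sketch of how such output could be read back into a dictionary, which can help confirm that values are being produced even when no chart appears. The function name parse_plugin_output is hypothetical and not part of netdata.

```python
def parse_plugin_output(text):
    """Collect the SET values of each chart from BEGIN/SET/END blocks.

    Hypothetical helper for inspecting debug output; not part of netdata.
    """
    charts = {}
    current = None  # dimensions of the chart block we are currently inside
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "BEGIN" and len(parts) >= 2:
            current = charts.setdefault(parts[1], {})
        elif (parts[0] == "SET" and current is not None
              and len(parts) >= 4 and parts[2] == "="):
            current[parts[1]] = int(parts[3])
        elif parts[0] == "END":
            current = None
    return charts


# Sample taken from the debug output above.
sample = """BEGIN nv.load 1000015
SET device_load_gpu_0 = 0
SET device_load_mem_0 = 6
END
BEGIN nv.temperature 1000015
SET device_temp_0 = 55
END"""

for chart, dims in parse_plugin_output(sample).items():
    print(chart, dims)
```

Lines that are not part of a chart block, such as the timestamped DEBUG lines, are ignored by this sketch.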
Running it via sudo -u netdata gives me the insufficient permissions error, so I suspect the netdata user needs to be added to some group, but I have no idea which one.
Okay, so now we at least know that it can work; we just have to find a fix for the permissions.
Which user is netdata running as?
It's hard for me to troubleshoot since I don't have Sabayon or any other Gentoo-based Linux installed anywhere. If your netdata service runs with the default settings, as user netdata, it's configured the same way as on my Ubuntu installation.
netdata is running as user netdata
Maybe the permissions on the NVIDIA library are wrong?
Mine are:
~ $ ls -la /usr/lib/libnvidia-ml.so.1
lrwxrwxrwx 1 root root 22 23. Okt 2015 /usr/lib/libnvidia-ml.so.1 -> libnvidia-ml.so.352.39
~ $ ls -la /usr/lib/libnvidia-ml.so.352.39
-rwxr-xr-x 1 root root 958K 23. Okt 2015 /usr/lib/libnvidia-ml.so.352.39
┌[skw☮sby-main]-(~)
└> ls -l /usr/lib/libnvidia-ml.so*
lrwxrwxrwx 1 root root 22 Nov 14 05:43 /usr/lib/libnvidia-ml.so -> libnvidia-ml.so.370.28
lrwxrwxrwx 1 root root 22 Nov 14 05:43 /usr/lib/libnvidia-ml.so.1 -> libnvidia-ml.so.370.28
-rwxr-xr-x 1 root root 1156288 Nov 2 18:20 /usr/lib/libnvidia-ml.so.370.28
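Both listings show mode -rwxr-xr-x on the real library file, which suggests the file itself is readable by every user and is probably not the source of the error. A minimal Python sketch of the same check, in case it is easier to script than to read ls output. The function name readable_by_others is hypothetical, and the path is simply the one from the listings, so it may differ on other systems.

```python
import os
import stat


def readable_by_others(path):
    """True if the file (after following symlinks) grants read permission to 'others'."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IROTH)


if __name__ == "__main__":
    # Path taken from the listings above; an assumption for any other system.
    lib = "/usr/lib/libnvidia-ml.so.1"
    if os.path.exists(lib):
        print(stat.filemode(os.stat(lib).st_mode), readable_by_others(lib))
    else:
        print(lib, "not found")
```

Because os.stat follows symlinks, the result describes libnvidia-ml.so.370.28 (or whatever the link points to), not the lrwxrwxrwx link itself.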
I figured it out! I added netdata to the video group, and the plugin is working now. Thanks for helping narrow down the issue!
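For anyone hitting the same error on another distribution, here is a rough Python sketch of how one might find out which group to add. It assumes the NVIDIA device nodes live under /dev/nvidia* (an assumption, not something confirmed in this thread) and that the service user is called netdata. The helper names are hypothetical. It only compares group ownership and does not look at the permission bits, so a reported group matters only if the nodes are not accessible to everyone.

```python
import glob
import grp
import os
import pwd


def gids_of(username):
    """Return every group ID the named user belongs to (primary and supplementary)."""
    primary = pwd.getpwnam(username).pw_gid
    return set(os.getgrouplist(username, primary))


def owning_groups_missing(paths, member_gids):
    """Return the group IDs that own `paths` but are not in `member_gids`."""
    return {os.stat(p).st_gid for p in paths} - set(member_gids)


def group_name(gid):
    """Translate a group ID to its name, falling back to the number."""
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        return str(gid)


if __name__ == "__main__":
    devices = glob.glob("/dev/nvidia*")  # assumed location of the device nodes
    try:
        member = gids_of("netdata")      # assumed service user name
    except KeyError:
        print("no user named netdata on this machine")
    else:
        missing = owning_groups_missing(devices, member)
        print("groups netdata is not in:", sorted(group_name(g) for g in missing))
```

On the system in this thread the missing group turned out to be video. Adding a user to a group is typically done with something like usermod -a -G video netdata, and the netdata service has to be restarted before the new membership takes effect.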
Awesome. Well done. Have fun with it :)
After setting up the plugin according to the readme, I am left with this error: