MaxMatti closed this issue 3 years ago.
I should've mentioned that running nvidia-smi works:
Sat Feb 27 19:28:45 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.56 Driver Version: 460.56 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce RTX 207... Off | 00000000:0A:00.0 On | N/A |
| 57% 30C P8 3W / 215W | 377MiB / 7979MiB | 1% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 971 G /usr/lib/Xorg 231MiB |
| 0 N/A N/A 1704 G /usr/bin/kwin_x11 2MiB |
| 0 N/A N/A 1773 G /usr/bin/plasmashell 90MiB |
| 0 N/A N/A 1953 G /usr/bin/nextcloud 8MiB |
| 0 N/A N/A 2088 G ...akonadi_archivemail_agent 2MiB |
| 0 N/A N/A 2106 G .../akonadi_mailfilter_agent 2MiB |
| 0 N/A N/A 2111 G ...n/akonadi_sendlater_agent 2MiB |
| 0 N/A N/A 2112 G ...nadi_unifiedmailbox_agent 2MiB |
| 0 N/A N/A 16959 G ...AAAAAAAA== --shared-files 25MiB |
+-----------------------------------------------------------------------------+
Hi, sadly I currently don't have an NVIDIA card in a Linux system. But I just noticed something: I checked the Netdata installation on one of my Debian machines, and the paths now differ from the README here. For example, python.d is now under '/usr/lib/netdata/conf.d/python.d' (note the added conf.d).
Also, it seems that the Python modules now live somewhere else: /usr/libexec/netdata/python.d/python_modules/
If that makes it work, I'll have to update the README. It would be a big help if you could try copying the files to the appropriate places and try again :)
I think I'm confusing myself with the paths now. These are the differences between your installation and the usual paths:
README: /usr/libexec/netdata/python.d/ and /usr/libexec/netdata/python.d/python_modules/
Yours: /usr/lib/netdata/python.d and /usr/lib/netdata/python.d/python_modules
Can you recheck? Maybe the scripts just ended up in the wrong folders :)
Yes, I did modify the paths, because there's no /usr/libexec/netdata on my filesystem. Should I create that folder instead?
I checked by running this:
$ sudo find / -name "python.d" 2>/dev/null
/etc/netdata/python.d
/usr/lib/netdata/conf.d/python.d
/usr/lib/netdata/python.d
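Since the README layout and the installed layout differ, a quick way to see which of the candidate directories from this thread actually exist is a check like the one below. This is a minimal sketch, not part of the plugin: the candidate list simply repeats the paths mentioned above, and existing_dirs and its root parameter are hypothetical names (root only exists so the check can be pointed at a test tree instead of the real filesystem).

```python
from pathlib import Path

# Candidate locations mentioned in this thread: the README layout (/usr/libexec/...)
# and the layouts found on the reporter's system (/usr/lib/..., /etc/netdata/...).
CANDIDATES = [
    "/usr/libexec/netdata/python.d",
    "/usr/libexec/netdata/python.d/python_modules",
    "/usr/lib/netdata/python.d",
    "/usr/lib/netdata/python.d/python_modules",
    "/usr/lib/netdata/conf.d/python.d",
    "/etc/netdata/python.d",
]


def existing_dirs(candidates, root="/"):
    """Return the candidate directories that exist as directories under `root`."""
    base = Path(root)
    found = []
    for candidate in candidates:
        # Drop the leading "/" so the candidate is joined onto `root`.
        if (base / candidate.lstrip("/")).is_dir():
            found.append(candidate)
    return found


if __name__ == "__main__":
    for directory in existing_dirs(CANDIDATES):
        print(directory)
```

Unlike find, this only looks at the handful of locations in question, so an empty /usr/libexec result makes the layout mismatch obvious at a glance.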
Maybe a debug run can tell us more.
Try:
/usr/lib/netdata/plugins.d/python.d.plugin nv debug trace
It seems to me like the 2070 SUPER is marked as not supported somewhere?
$ /usr/lib/netdata/plugins.d/python.d.plugin nv debug trace
2021-03-02 00:01:14: python.d INFO: plugin[main] : using python v3
2021-03-02 00:01:14: python.d DEBUG: plugin[main] : looking for 'python.d.conf' in ['/etc/netdata', '/usr/lib/netdata/conf.d']
2021-03-02 00:01:14: python.d DEBUG: plugin[main] : loading '/usr/lib/netdata/conf.d/python.d.conf'
2021-03-02 00:01:14: python.d DEBUG: plugin[main] : '/usr/lib/netdata/conf.d/python.d.conf' is loaded
2021-03-02 00:01:14: python.d DEBUG: plugin[main] : looking for 'pythond-jobs-statuses.json' in /var/lib/netdata
2021-03-02 00:01:14: python.d DEBUG: plugin[main] : loading '/var/lib/netdata/pythond-jobs-statuses.json'
2021-03-02 00:01:14: python.d WARNING: plugin[main] : error on loading '/var/lib/netdata/pythond-jobs-statuses.json' : PermissionError(13, 'Permission denied')
2021-03-02 00:01:14: python.d DEBUG: plugin[main] : [nv] looking for 'nv.conf' in ['/etc/netdata/python.d', '/usr/lib/netdata/conf.d/python.d']
2021-03-02 00:01:14: python.d DEBUG: plugin[main] : [nv] loading '/etc/netdata/python.d/nv.conf'
2021-03-02 00:01:14: python.d DEBUG: plugin[main] : [nv] '/etc/netdata/python.d/nv.conf' is loaded
2021-03-02 00:01:14: python.d INFO: plugin[main] : [nv] built 1 job(s) configs
2021-03-02 00:01:14: python.d INFO: nv[nv] : 'nvMemFactor' set to: 1
2021-03-02 00:01:14: python.d INFO: nv[nv] : Nvidia Driver Version: b'460.56'
2021-03-02 00:01:14: python.d DEBUG: nv[nv] : Unit count: 0
2021-03-02 00:01:14: python.d DEBUG: nv[nv] : Device count 1
2021-03-02 00:01:14: python.d DEBUG: nv[nv] : Not Supported
2021-03-02 00:01:14: python.d DEBUG: nv[nv] : Device 0 : b'GeForce RTX 2070 SUPER'
2021-03-02 00:01:14: python.d WARNING: plugin[main] : nv[nv] : unhandled exception on check : IndexError('list index out of range'), skipping the job
2021-03-02 00:01:14: python.d INFO: plugin[main] : no jobs to serve
2021-03-02 00:01:14: python.d INFO: plugin[main] : exiting from main...
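Most of a trace like this is DEBUG noise; the line that matters here is the WARNING about IndexError('list index out of range'). A minimal sketch for pulling only WARNING and ERROR lines out of a saved copy of such a trace, assuming every line follows the "timestamp: python.d LEVEL: message" shape seen above (LINE_RE and problems are hypothetical names, not part of Netdata):

```python
import re

# Matches lines such as:
# 2021-03-02 00:01:14: python.d WARNING: plugin[main] : nv[nv] : unhandled exception ...
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): python\.d (?P<level>[A-Z]+): (?P<msg>.*)$"
)


def problems(lines, levels=("WARNING", "ERROR")):
    """Yield (level, message) for trace lines whose level is in `levels`."""
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match and match.group("level") in levels:
            yield match.group("level"), match.group("msg")
```

For example, after saving the trace to a file, iterating over problems(open(path)) would print just the two warnings from the first run: the statuses-file PermissionError and the IndexError on check.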
$ ls -hal /var/lib/netdata/pythond-jobs-statuses.json
-rw-rw---- 1 netdata netdata 46 2. Mär 00:00 /var/lib/netdata/pythond-jobs-statuses.json
$ sudo cat /var/lib/netdata/pythond-jobs-statuses.json
{
"sensors": {
"sensors": "active"
}
}
Can you please comment out or delete line 342, self.debug("Brand:", str(brands[brand])), in nv.chart.py and try again?
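The IndexError in the first trace is consistent with brands[brand] being indexed with a brand ID that the list doesn't cover, presumably because the driver reports a brand value the plugin doesn't know about yet. Rather than deleting the debug line, the lookup could fall back to a placeholder. A minimal sketch, assuming brands is a plain list indexed by an integer brand ID; brand_name is a hypothetical helper, and EXAMPLE_BRANDS is illustrative, not the plugin's actual list:

```python
# Illustrative only: a short label list, NOT the plugin's actual `brands` list.
EXAMPLE_BRANDS = ["Unknown", "Quadro", "Tesla", "NVS", "Grid", "GeForce"]


def brand_name(brands, brand_id, default="Unknown"):
    """Return the label for brand_id, or a placeholder if the ID is out of range.

    Negative IDs are rejected explicitly, because Python would otherwise
    index from the end of the list instead of raising IndexError.
    """
    if isinstance(brand_id, int) and 0 <= brand_id < len(brands):
        return str(brands[brand_id])
    return f"{default} ({brand_id})"


# Hypothetical replacement for the debug call on line 342:
#     self.debug("Brand:", brand_name(brands, brand))
```

Keeping the raw ID in the placeholder means the debug output still shows which brand value the driver reported.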
Seems like I should've waited a few minutes for a response before going to bed...
$ /usr/lib/netdata/plugins.d/python.d.plugin nv debug trace
2021-03-02 15:58:02: python.d INFO: plugin[main] : using python v3
2021-03-02 15:58:02: python.d DEBUG: plugin[main] : looking for 'python.d.conf' in ['/etc/netdata', '/usr/lib/netdata/conf.d']
2021-03-02 15:58:02: python.d DEBUG: plugin[main] : loading '/usr/lib/netdata/conf.d/python.d.conf'
2021-03-02 15:58:02: python.d DEBUG: plugin[main] : '/usr/lib/netdata/conf.d/python.d.conf' is loaded
2021-03-02 15:58:02: python.d DEBUG: plugin[main] : looking for 'pythond-jobs-statuses.json' in /var/lib/netdata
2021-03-02 15:58:02: python.d DEBUG: plugin[main] : loading '/var/lib/netdata/pythond-jobs-statuses.json'
2021-03-02 15:58:02: python.d WARNING: plugin[main] : error on loading '/var/lib/netdata/pythond-jobs-statuses.json' : PermissionError(13, 'Permission denied')
2021-03-02 15:58:02: python.d DEBUG: plugin[main] : [nv] looking for 'nv.conf' in ['/etc/netdata/python.d', '/usr/lib/netdata/conf.d/python.d']
2021-03-02 15:58:02: python.d DEBUG: plugin[main] : [nv] loading '/etc/netdata/python.d/nv.conf'
2021-03-02 15:58:02: python.d DEBUG: plugin[main] : [nv] '/etc/netdata/python.d/nv.conf' is loaded
2021-03-02 15:58:02: python.d INFO: plugin[main] : [nv] built 1 job(s) configs
2021-03-02 15:58:02: python.d INFO: nv[nv] : 'nvMemFactor' set to: 1
2021-03-02 15:58:02: python.d INFO: nv[nv] : Nvidia Driver Version: b'460.56'
2021-03-02 15:58:02: python.d DEBUG: nv[nv] : Unit count: 0
2021-03-02 15:58:02: python.d DEBUG: nv[nv] : Device count 1
2021-03-02 15:58:02: python.d DEBUG: nv[nv] : Not Supported
2021-03-02 15:58:02: python.d DEBUG: nv[nv] : Device 0 : b'GeForce RTX 2070 SUPER'
2021-03-02 15:58:02: python.d DEBUG: nv[nv] : b'GeForce RTX 2070 SUPER' Temp : 37
2021-03-02 15:58:02: python.d DEBUG: nv[nv] : b'GeForce RTX 2070 SUPER' Mem total : 8366784512 bytes
2021-03-02 15:58:02: python.d DEBUG: nv[nv] : b'GeForce RTX 2070 SUPER' Mem used : 669581312 bytes
2021-03-02 15:58:02: python.d DEBUG: nv[nv] : b'GeForce RTX 2070 SUPER' Mem free : 7697203200 bytes
2021-03-02 15:58:02: python.d DEBUG: nv[nv] : b'GeForce RTX 2070 SUPER' Load GPU : 1 %
2021-03-02 15:58:02: python.d DEBUG: nv[nv] : b'GeForce RTX 2070 SUPER' Load MEM : 10 %
2021-03-02 15:58:02: python.d DEBUG: nv[nv] : b'GeForce RTX 2070 SUPER' Load ENC : 0 %
2021-03-02 15:58:02: python.d DEBUG: nv[nv] : b'GeForce RTX 2070 SUPER' Load DEC : 0 %
2021-03-02 15:58:02: python.d DEBUG: nv[nv] : b'GeForce RTX 2070 SUPER' Core clock: 300 MHz
2021-03-02 15:58:02: python.d DEBUG: nv[nv] : b'GeForce RTX 2070 SUPER' SM clock : 300 MHz
2021-03-02 15:58:02: python.d DEBUG: nv[nv] : b'GeForce RTX 2070 SUPER' Mem clock : 405 MHz
2021-03-02 15:58:02: python.d DEBUG: nv[nv] : b'GeForce RTX 2070 SUPER' Fan speed : 58 %
2021-03-02 15:58:02: python.d DEBUG: nv[nv] : b'GeForce RTX 2070 SUPER' ECC errors: None
2021-03-02 15:58:02: python.d INFO: nv[nv] : Graphics Card(s) found: b'GeForce RTX 2070 SUPER' [0]
2021-03-02 15:58:02: python.d INFO: plugin[main] : nv[nv] : check success
2021-03-02 15:58:02: python.d WARNING: plugin[main] : nv[nv] : registration failed: [Errno 13] Permission denied: '/var/lib/netdata/lock/nv.collector.lock', skipping the job
2021-03-02 15:58:02: python.d INFO: plugin[main] : no jobs to serve
2021-03-02 15:58:02: python.d INFO: plugin[main] : exiting from main...
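The memory figures in this second trace are raw byte counts, while nvidia-smi reports MiB (2^20 bytes). A small worked check using only numbers from the trace shows they are consistent: used plus free equals the total, and 8366784512 / 2^20 = 7979.1875 MiB, which nvidia-smi displayed as 7979MiB. (The used value differs from nvidia-smi's 377MiB simply because the two readings were taken days apart.)

```python
MIB = 1024 * 1024  # one MiB is 2**20 bytes


def to_mib(n_bytes):
    """Convert a byte count to MiB."""
    return n_bytes / MIB


# Values copied from the nv debug trace above.
total, used, free = 8366784512, 669581312, 7697203200

assert used + free == total
print(f"total {to_mib(total):.2f} MiB, used {to_mib(used):.2f} MiB")
# prints: total 7979.19 MiB, used 638.56 MiB
```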
Okay, looks good :D I guess the permission errors come from running the debug command as a different user than the one Netdata runs as. Have you tried restarting Netdata to see whether it shows up now?
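Both permission errors involve files owned by netdata: the ls output above shows pythond-jobs-statuses.json as -rw-rw---- netdata netdata, so a user outside the netdata group can't read it, and the lock file lives under /var/lib/netdata/lock. Running the debug command as that user (for example with sudo -u netdata in front of it) should presumably avoid both errors. To see what the current user can actually do with those paths, here is a minimal sketch; access_report is a hypothetical helper, and the path list just repeats the paths from the traces:

```python
import os
import stat

# Paths from the permission errors in the traces above.
PATHS = [
    "/var/lib/netdata/pythond-jobs-statuses.json",
    "/var/lib/netdata/lock",
]


def access_report(paths):
    """Return {path: (mode_string, readable, writable)} for the current user.

    Paths that can't be stat'ed (missing, or a parent not searchable)
    are reported as (None, False, False).
    """
    report = {}
    for path in paths:
        try:
            mode = stat.filemode(os.stat(path).st_mode)
        except OSError:
            report[path] = (None, False, False)
            continue
        report[path] = (mode, os.access(path, os.R_OK), os.access(path, os.W_OK))
    return report


if __name__ == "__main__":
    for path, (mode, readable, writable) in access_report(PATHS).items():
        print(f"{path}: mode={mode} read={readable} write={writable}")
```

Run once as your own user and once as netdata; a read=False or write=False in the first run matches the PermissionError lines in the trace.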
It does show up, sorry for the delayed reply. A cups section also showed up that wasn't there before.
Thank you very much for your help!
Wonderful news. I'll update the repo with the fix soon. I have no idea about the cups thing yet. What metrics does it show? ^^
It just shows some printer-related metrics, so it's not really relevant to me. Knowing nothing about Netdata, I suspect it runs after the nvidia collector and therefore wasn't executed before, because that thread never got that far.
I'm not sure whether I'm doing something wrong, whether this is an issue with Netdata or with this plugin, or whether my card (2070S) is simply not supported, but after installing this plugin no new chart section shows up in Netdata. I expected a new section containing the GPU temperature, fan speed, etc.
My system:
Ryzen 3700X on B450 and an RTX 2070 SUPER
CPU: AMD Ryzen 3700X
RAM: 32 GB DDR4-3600
Motherboard: ASRock Fatal1ty B450 Gaming-ITX/AC AMD B450
GPU: 8GB Zotac Gaming GeForce RTX 2070 SUPER AMP, GDDR6, HDMI, 3x DP (ZT-T20710D-10P) (in case that's relevant)
OS: Arch Linux (last update and reboot about 2h before opening this issue)
Netdata v1.29.3
What I did:
Chart list after running the above commands and refreshing Netdata:
I then thought I had to enable plugins manually:
I appended this snippet to the config:
Then I reloaded again, but still didn't find any new chart section.
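The snippet appended here isn't preserved. For reference, the traces in this thread show the plugin reading python.d.conf from /etc/netdata or /usr/lib/netdata/conf.d. A rough way to check from that file whether a module such as nv would be enabled is sketched below. It assumes the stock file's layout: flat top-level keys, a global enabled switch, a default_run switch, and optional per-module "name: yes|no" lines, with a per-module value overriding default_run. It is a line-based approximation, not a YAML parser and not Netdata's own logic; parse_flags and module_enabled are hypothetical names, and treating absent keys as yes is an assumption.

```python
TRUE_WORDS = {"yes", "true", "on"}
FALSE_WORDS = {"no", "false", "off"}


def parse_flags(text):
    """Parse flat top-level 'key: value' lines into {key: bool}.

    Comments, indented lines and non-boolean values are ignored.
    """
    flags = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop comments and trailing space
        if not line or line[0].isspace() or ":" not in line:
            continue  # skip blank, indented (nested) and non key/value lines
        key, value = (part.strip() for part in line.split(":", 1))
        value = value.lower()
        if value in TRUE_WORDS:
            flags[key] = True
        elif value in FALSE_WORDS:
            flags[key] = False
    return flags


def module_enabled(text, module):
    """Assumed precedence: global 'enabled', then the module's own key, then 'default_run'."""
    flags = parse_flags(text)
    if not flags.get("enabled", True):
        return False
    return flags.get(module, flags.get("default_run", True))
```

For example, module_enabled(open("/etc/netdata/python.d.conf").read(), "nv") would return False only if the file disables the whole plugin, sets nv: no, or sets default_run: no without re-enabling nv.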
Searching for "nvidia" or "temperature" also did not lead me to any sections that weren't previously there.