Ricks-Lab / gpu-utils

A set of utilities for monitoring and customizing GPU performance
GNU General Public License v3.0
139 stars 23 forks

unsupported operand type(s) for *: 'float' and 'NoneType' in GPUmodule.py #127

Closed Marjorie-R closed 10 months ago

Marjorie-R commented 2 years ago

I have a new Ryzen 5 5600G with integrated graphics and a NVIDIA GT710B graphics card. I'm running Devuan 4 (based on Debian 10) with Linux Kernel 5.10.0-13-amd64 and Python 3.9.2

Having previously fixed the 0xfffd7fff conversion error in the Debian/Devuan repository version, I then hit the following error in GPUmodule.py when running gpu-ls (also gpu-mon). I get the same error after updating to the current https://debian.rickslab.com/gpu-utils repository.

#gpu-mon
OS command [nvidia-smi] executable found: [/usr/bin/nvidia-smi]
Detected GPUs: NVIDIA: 1, AMD: 1
AMD: Wattman features enabled: 0xfffd7fff
Traceback (most recent call last):
  File "/usr/bin/gpu-mon", line 409, in <module>
    main()
  File "/usr/bin/gpu-mon", line 329, in main
    gpu_list.read_gpu_sensor_set(data_type=Gpu.GpuItem.SensorSet.All)
  File "/usr/lib/python3/dist-packages/GPUmodules/GPUmodule.py", line 2174, in read_gpu_sensor_set
    gpu.read_gpu_sensor_set(data_type)
  File "/usr/lib/python3/dist-packages/GPUmodules/GPUmodule.py", line 1286, in read_gpu_sensor_set
    return self.read_gpu_sensor_set_nv(data_type)
  File "/usr/lib/python3/dist-packages/GPUmodules/GPUmodule.py", line 1352, in read_gpu_sensor_set_nv
    self.set_params_value('power', power)
  File "/usr/lib/python3/dist-packages/GPUmodules/GPUmodule.py", line 580, in set_params_value
    self.energy['cumulative'] += delta_hrs * value / 1000
TypeError: unsupported operand type(s) for *: 'float' and 'NoneType'
Ricks-Lab commented 2 years ago

Can you post the log file contents when running with the --debug option?

Ricks-Lab commented 2 years ago

I pushed a version that should handle invalid power values read from the gpu, but still need logger details to understand why it is not getting a valid value.
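As a minimal sketch of the kind of guard described here (this is not the actual gpu-utils change; `accumulate_energy` is a hypothetical helper that mirrors the `delta_hrs * value / 1000` expression from the traceback), an unreadable power sample can simply be skipped when accumulating energy:

```python
from typing import Optional


def accumulate_energy(cumulative_kwh: float, delta_hrs: float,
                      power_w: Optional[float]) -> float:
    """Return the updated energy total in kWh, ignoring unreadable samples."""
    if power_w is None:
        # No valid power reading for this interval: leave the total unchanged
        # instead of evaluating float * None, which raises TypeError.
        return cumulative_kwh
    # W * h / 1000 = kWh, the same arithmetic as in the traceback.
    return cumulative_kwh + delta_hrs * power_w / 1000


if __name__ == '__main__':
    print(accumulate_energy(0.0, 0.5, None))
    print(accumulate_energy(0.0, 0.5, 150.0))
```

Skipping the sample keeps the cumulative figure monotonic; whether an unreadable sample should instead be logged or counted is a design choice this sketch leaves open.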

Marjorie-R commented 2 years ago

Hi Rick,

On Fri, 2022-04-29 at 02:53 -0700, Rick wrote:

> I pushed a version that should handle invalid power values read from the gpu, but still need logger details to understand why it is not getting a valid value.

I get the following output:

$ gpu-ls --debug
Devuan: Unverified
OS command [nvidia-smi] executable found: [/usr/bin/nvidia-smi]
Traceback (most recent call last):
  File "/usr/bin/gpu-ls", line 154, in <module>
    main()
  File "/usr/bin/gpu-ls", line 102, in main
    gpu_list.set_gpu_list(clinfo_flag=True)
  File "/usr/lib/python3/dist-packages/GPUmodules/GPUmodule.py", line 1906, in set_gpu_list
    pp_od_file_details = file_ptr.read()
OSError: [Errno 88] Socket operation on non-socket

I can't find a log file as such.

May be relevant:

1) Devuan doesn't use systemd (I use sysvinit). For some purposes systemd uses sockets and sysvinit doesn't.

2) I'm having issues with my two GPUs: if I set either one as the primary monitor (in BIOS), that GPU works but the other doesn't. I can't seem to get them both working together in a dual monitor mode. Possibly the one that isn't working is triggering the error you see. If so, removing the NVIDIA card should avoid the error. I can try this if it helps, but I assume that you'd want to trap the error in your code anyway.

--  Marjorie 

Ricks-Lab commented 2 years ago

I pushed an update that checks for OSError when reading the pp_od_clk_voltage file. It was already being checked for all other sensors. I cannot verify this on my system, as an OSError is not raised there, even if I read a regular file. To test it out, you will need to clone the repository and run a repository install. Thanks!
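For illustration only, a check of this kind might look like the sketch below. It is not the code in GPUmodule.py; `read_driver_file` is a hypothetical helper, and the path used in the demo is deliberately one that does not exist:

```python
from typing import Optional


def read_driver_file(path: str) -> Optional[str]:
    """Return the file contents, or None if the read fails with OSError."""
    try:
        with open(path, 'r') as file_ptr:
            return file_ptr.read()
    except OSError as err:
        # OSError covers missing files and permission problems as well as
        # driver-level failures such as [Errno 88] reported in this thread.
        print(f'Warning: cannot read {path}: {err}')
        return None


if __name__ == '__main__':
    # Hypothetical path, used only to exercise the error branch.
    details = read_driver_file('/nonexistent/pp_od_clk_voltage')
    print(details)
```

Returning None from the helper means the caller always has a defined value to test, whether or not the read succeeded.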

Marjorie-R commented 2 years ago

Hi Rick,

On Fri, 2022-04-29 at 16:21 -0700, Rick wrote:

> I pushed an update that checks for OSError when reading the pp_od_clk_voltage file. It was already being checked for all other sensors. I can not verify on my system as an OSError is not raised, even if I read a regular file. To test it out, you will need to clone the repository and run a repository install. Thanks!

I have done a repository installation and re-run the gpu-ls command.

This is with the AMD GPU as primary and the NVIDIA card seemingly disabled.

More information but still get the OSError.

(rickslab-gpu-utils-env) @.***:~/gpu-utils$ ./gpu-ls
OS command [nvidia-smi] executable found: [/usr/bin/nvidia-smi]
Detected GPUs: NVIDIA: 1, AMD: 1
AMD: Wattman features enabled: 0xfffd7fff
Error: Invalid power value read [None]
Warning: Can not read parameter: mem_loading, disabling for this GPU: 0
Warning: Can not read parameter: power_cap_range, disabling for this GPU: 0
Warning: Can not read parameter: power_cap, disabling for this GPU: 0
Warning: Can not read parameter: fan_speed_range, disabling for this GPU: 0
Warning: Can not read parameter: fan_pwm_range, disabling for this GPU: 0
Warning: Can not read parameter: fan_enable, disabling for this GPU: 0
Warning: Can not read parameter: fan_target, disabling for this GPU: 0
Warning: Can not read parameter: fan_speed, disabling for this GPU: 0
Warning: Can not read parameter: pwm_mode, disabling for this GPU: 0
Warning: Can not read parameter: fan_pwm, disabling for this GPU: 0
2 total GPUs, 1 rw, 1 r-only, 0 w-only

Traceback (most recent call last):
  File "/home/marjorie/gpu-utils/./gpu-ls", line 154, in <module>
    main()
  File "/home/marjorie/gpu-utils/./gpu-ls", line 138, in main
    gpu_list.read_gpu_pstates()
  File "/home/marjorie/gpu-utils/GPUmodules/GPUmodule.py", line 2171, in read_gpu_pstates
    gpu.read_gpu_pstates()
  File "/home/marjorie/gpu-utils/GPUmodules/GPUmodule.py", line 1077, in read_gpu_pstates
    for line in card_file:
OSError: [Errno 88] Socket operation on non-socket

I then rebooted and changed the Primary GPU to the NVIDIA card in the BIOS before the OS loaded. This time the program ran without error, though the AMD card was (correctly) shown as not working.

@.***:~/gpu-utils$ ./gpu-ls
OS command [nvidia-smi] executable found: [/usr/bin/nvidia-smi]
Detected GPUs: NVIDIA: 1, AMD: 1
AMD: Wattman features enabled: 0xfffd7fff
Error: Invalid power value read [None]
2 total GPUs, 0 rw, 1 r-only, 0 w-only

Card Number: 0
Vendor: NVIDIA
Readable: True
Writable: False
Compute: False
GPU UID: GPU-abfa1778-17bb-0861-2cd3-b6c1aa704dfa
GPU S/N: [N/A]
Device ID: {'device': '0x128b', 'subsystem_device': '0x128b', 'subsystem_vendor': '0x10de', 'vendor': '0x10de'}
Decoded Device ID: GK208B [GeForce GT 710]
Card Model: GeForce GT 710
Display Card Model: GeForce GT 710
Card Index: 0
PCIe ID: 10:00.0
Link Speed: GEN[N/A]
Link Width: [N/A]
##################################################
Driver: 460.91.03
vBIOS Version: 80.28.A6.00.15
Compute Platform: None
Compute Mode: Default
GPU Type: Supported
HWmon: None
Card Path: /sys/class/drm/card0/device
System Card Path: /sys/devices/pci0000:00/0000:00:01.1/0000:10:00.0
##################################################
Current Power (W): None
Power Cap (W): [N/A]
Power Cap Range (W): ['[N/A]', '[N/A]']
Fan Target Speed (rpm): None
Current Fan PWM (%): 40.000
##################################################
Current GPU Loading (%): [N/A]
Current Memory Loading (%): [N/A]
Current VRAM Usage (%): 50.975
Current VRAM Used (GB): 0.996
Total VRAM (GB): 1.954
Current Temps (C): {'temperature.gpu': 39.0, 'temperature.memory': None}
Current Clk Frequencies (MHz): {'clocks.gr': None, 'clocks.mem': None, 'clocks.sm': None, 'clocks.video': None}
Maximum Clk Frequencies (MHz): {'clocks.max.gr': None, 'clocks.max.mem': None, 'clocks.max.sm': None}
Current SCLK P-State: [0, '']
Power Profile Mode: [N/A]

Card Number: None
Vendor: AMD
Readable: False
Writable: False
Compute: False
Device ID: {'device': '0x1638', 'subsystem_device': '0x1636', 'subsystem_vendor': '0x1002', 'vendor': '0x1002'}
Decoded Device ID: Cezanne
Card Model: Advanced Micro Devices, Inc. [AMD/ATI] Cezanne (rev c9)
PCIe ID: 30:00.0
Driver: amdgpu
GPU Type: Unsupported
HWmon: None
Card Path: None
System Card Path: /sys/devices/pci0000:00/0000:00:08.1/0000:30:00.0

If I run ./gpu-mon I get:

┌─────────────┬────────────────────┐
│Card #       │card0               │
├─────────────┼────────────────────┤
│Model        │ GeForce GT 710     │
│GPU Load %   │[N/A]               │
│Mem Load %   │[N/A]               │
│VRAM Usage % │23.538              │
│GTT Usage %  │None                │
│Power (W)    │None                │
│Power Cap (W)│[N/A]               │
│Energy (kWh) │0.0                 │
│T (C)        │43.0                │
│VddGFX (mV)  │nan                 │
│Fan Spd (%)  │40.0                │
│Sclk (MHz)
Traceback (most recent call last):
  File "/home/marjorie/gpu-utils/./gpu-mon", line 409, in <module>
    main()
  File "/home/marjorie/gpu-utils/./gpu-mon", line 398, in main
    com_gpu_list.print_table()
  File "/home/marjorie/gpu-utils/GPUmodules/GPUmodule.py", line 2232, in print_table
    data_value_raw = gpu.get_params_value(table_item)
  File "/home/marjorie/gpu-utils/GPUmodules/GPUmodule.py", line 700, in get_params_value
    return int(self.prm['frequencies'][clock_name])
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'

Note 1: the nvidia card is fanless. Note 2: this is a new motherboard as well as CPU/GPU. The motherboard is a MSI MPG B550 GAMING PLUS. 

NB. I'm going to be away tomorrow through Friday so unable to do any testing meanwhile. --  Marjorie

Ricks-Lab commented 2 years ago

I found that I was not catching OSError for the reading of p-states or the PPM table. I have pushed an update.

Marjorie-R commented 2 years ago

Hi Rick,

On Sat, 2022-04-30 at 17:49 -0700, Rick wrote:

> I found that I was not catching OSError for the reading of p-states or the PPM table. I have pushed an update.

Sorry for the delay in testing this - I've been away at a Scottish Country Dance week.

Please find attached two output files for ./gpu-ls --debug. I re-cloned from GitHub before I did these runs:

1) With NVIDIA GPU as primary output device and AMD GPU (on CPU die) disabled in BIOS. This produces a full output without an error message with the --debug option.

2) With AMD GPU (on CPU die) as primary output device and NVIDIA GPU disabled in BIOS. This produces an error traceback with the --debug option, and a fuller output with no error message without it.

--  Marjorie

@.***:~/gpu-utils$ ./gpu-ls --debug
Devuan: Unverified
OS command [nvidia-smi] executable found: [/usr/bin/nvidia-smi]
Detected GPUs: NVIDIA: 1, AMD: 1
AMD: Wattman features enabled: 0xfffd7fff
Error: Invalid power value read [None]
2 total GPUs, 0 rw, 1 r-only, 0 w-only

Card Number: 0
Vendor: NVIDIA
Readable: True
Writable: False
Compute: False
GPU UID: GPU-abfa1778-17bb-0861-2cd3-b6c1aa704dfa
GPU S/N: [N/A]
Device ID: {'device': '0x128b', 'subsystem_device': '0x128b', 'subsystem_vendor': '0x10de', 'vendor': '0x10de'}
Decoded Device ID: GK208B [GeForce GT 710]
Card Model: GeForce GT 710
Display Card Model: GeForce GT 710
Card Index: 0
PCIe ID: 10:00.0
Link Speed: GEN[N/A]
Link Width: [N/A]
##################################################
Driver: 460.91.03
vBIOS Version: 80.28.A6.00.15
Compute Platform: None
Compute Mode: Default
GPU Type: Supported
HWmon: None
Card Path: /sys/class/drm/card0/device
System Card Path: /sys/devices/pci0000:00/0000:00:01.1/0000:10:00.0
##################################################
Current Power (W): None
Power Cap (W): [N/A]
Power Cap Range (W): ['[N/A]', '[N/A]']
Fan Target Speed (rpm): None
Current Fan PWM (%): 40.000
##################################################
Current GPU Loading (%): [N/A]
Current Memory Loading (%): [N/A]
Current VRAM Usage (%): 54.123
Current VRAM Used (GB): 1.058
Total VRAM (GB): 1.954
Current Temps (C): {'temperature.gpu': 41.0, 'temperature.memory': None}
Current Clk Frequencies (MHz): {'clocks.gr': None, 'clocks.mem': None, 'clocks.sm': None, 'clocks.video': None}
Maximum Clk Frequencies (MHz): {'clocks.max.gr': None, 'clocks.max.mem': None, 'clocks.max.sm': None}
Current SCLK P-State: [8, '']
Power Profile Mode: [N/A]

Card Number: None
Vendor: AMD
Readable: False
Writable: False
Compute: False
Device ID: {'device': '0x1638', 'subsystem_device': '0x1636', 'subsystem_vendor': '0x1002', 'vendor': '0x1002'}
Decoded Device ID: Cezanne
Card Model: Advanced Micro Devices, Inc. [AMD/ATI] Cezanne (rev c9)
PCIe ID: 30:00.0
Driver: amdgpu
GPU Type: Unsupported
HWmon: None
Card Path: None
System Card Path: /sys/devices/pci0000:00/0000:00:08.1/0000:30:00.0

(rickslab-gpu-utils-env) @.***:~/gpu-utils$ ./gpu-ls --debug
Devuan: Unverified
OS command [nvidia-smi] executable found: [/usr/bin/nvidia-smi]
Error: System support issue for GPU [30:00.0]
Traceback (most recent call last):
  File "/home/marjorie/gpu-utils/./gpu-ls", line 154, in <module>
    main()
  File "/home/marjorie/gpu-utils/./gpu-ls", line 102, in main
    gpu_list.set_gpu_list(clinfo_flag=True)
  File "/home/marjorie/gpu-utils/GPUmodules/GPUmodule.py", line 1937, in set_gpu_list
    LOGGER.debug('%s contents:\n%s', pp_od_clk_voltage_file, pp_od_file_details)
UnboundLocalError: local variable 'pp_od_file_details' referenced before assignment


(rickslab-gpu-utils-env) @.***:~/gpu-utils$ ./gpu-ls
OS command [nvidia-smi] executable found: [/usr/bin/nvidia-smi]
Detected GPUs: NVIDIA: 1, AMD: 1
AMD: Wattman features enabled: 0xfffd7fff
Error: Invalid power value read [None]
Warning: Can not read parameter: mem_loading, disabling for this GPU: 0
Warning: Can not read parameter: power_cap_range, disabling for this GPU: 0
Warning: Can not read parameter: power_cap, disabling for this GPU: 0
Warning: Can not read parameter: fan_speed_range, disabling for this GPU: 0
Warning: Can not read parameter: fan_pwm_range, disabling for this GPU: 0
Warning: Can not read parameter: fan_enable, disabling for this GPU: 0
Warning: Can not read parameter: fan_target, disabling for this GPU: 0
Warning: Can not read parameter: fan_speed, disabling for this GPU: 0
Warning: Can not read parameter: pwm_mode, disabling for this GPU: 0
Warning: Can not read parameter: fan_pwm, disabling for this GPU: 0
2 total GPUs, 1 rw, 1 r-only, 0 w-only

Error: System support issue for GPU [30:00.0]
Card Number: 1
Vendor: NVIDIA
Readable: True
Writable: False
Compute: False
GPU UID: GPU-abfa1778-17bb-0861-2cd3-b6c1aa704dfa
GPU S/N: [N/A]
Device ID: {'device': '0x128b', 'subsystem_device': '0x128b', 'subsystem_vendor': '0x10de', 'vendor': '0x10de'}
Decoded Device ID: GK208B [GeForce GT 710]
Card Model: GeForce GT 710
Display Card Model: GeForce GT 710
Card Index: 0
PCIe ID: 10:00.0
Link Speed: GEN[N/A]
Link Width: [N/A]
##################################################
Driver: 460.91.03
vBIOS Version: 80.28.A6.00.15
Compute Platform: None
Compute Mode: Default
GPU Type: Supported
HWmon: None
Card Path: /sys/class/drm/card1/device
System Card Path: /sys/devices/pci0000:00/0000:00:01.1/0000:10:00.0
##################################################
Current Power (W): None
Power Cap (W): [N/A]
Power Cap Range (W): ['[N/A]', '[N/A]']
Fan Target Speed (rpm): None
Current Fan PWM (%): 40.000
##################################################
Current GPU Loading (%): [N/A]
Current Memory Loading (%): [N/A]
Current VRAM Usage (%): 0.200
Current VRAM Used (GB): 0.004
Total VRAM (GB): 1.955
Current Temps (C): {'temperature.gpu': 38.0, 'temperature.memory': None}
Current Clk Frequencies (MHz): {'clocks.gr': None, 'clocks.mem': None, 'clocks.sm': None, 'clocks.video': None}
Maximum Clk Frequencies (MHz): {'clocks.max.gr': None, 'clocks.max.mem': None, 'clocks.max.sm': None}
Current SCLK P-State: [8, '']
Power Profile Mode: [N/A]

Card Number: 0
Vendor: AMD
Readable: True
Writable: True
Compute: False
GPU UID: None
Device ID: {'device': '0x1638', 'subsystem_device': '0x1636', 'subsystem_vendor': '0x1002', 'vendor': '0x1002'}
Decoded Device ID: Cezanne
Card Model: Advanced Micro Devices, Inc. [AMD/ATI] Cezanne (rev c9)
Display Card Model: Cezanne
PCIe ID: 30:00.0
Link Speed: 8.0 GT/s PCIe
Link Width: 16
##################################################
Driver: amdgpu
vBIOS Version: 113-CEZANNE-017
Compute Platform: None
GPU Type: Supported
HWmon: /sys/class/drm/card0/device/hwmon/hwmon1
Card Path: /sys/class/drm/card0/device
System Card Path: /sys/devices/pci0000:00/0000:00:08.1/0000:30:00.0
##################################################
Current Power (W): 15.000
Power Cap (W): None
Power Cap Range (W): [None, None]
Fan Enable: None
Fan PWM Mode: [None, 'UNK']
Fan Target Speed (rpm): None
Current Fan Speed (rpm): None
Current Fan PWM (%): None
Fan Speed Range (rpm): [None, None]
Fan PWM Range (%): [None, None]
##################################################
Current GPU Loading (%): 4
Current Memory Loading (%): None
Current GTT Memory Usage (%): 8.170
Current GTT Memory Used (GB): 0.245
Total GTT Memory (GB): 3.000
Current VRAM Usage (%): 4.985
Current VRAM Used (GB): 0.100
Total VRAM (GB): 2.000
Current Temps (C): {'edge': 34.0}
Critical Temps (C): {}
Current Voltages (V): {'vddgfx': 1375, 'vddnb': 1099}
Vddc Range: ['', '']
Current Clk Frequencies (MHz): {'sclk': 400.0}
Current SCLK P-State: [1, '400Mhz']
SCLK Range: ['', '']
Current MCLK P-State: [0, '1600Mhz']
MCLK Range: ['', '']
Power Profile Mode: None
Power DPM Force Performance Level: auto

Ricks-Lab commented 2 years ago

Great that you enjoyed some time off during the holiday. I was stuck at home under quarantine since returning from the US.

I made some changes to logger during card discovery. It should be more robust now. When you run with --debug option it should leave a log file in your current directory. Is that not happening in your case?

Also, I added a check of the system type. It should be the first line displayed from gpu-ls. Later, I will only display it in the log file and gpu-chk.

Marjorie-R commented 2 years ago

Hi Rick,

On Sat, 2022-05-07 at 19:52 -0700, Rick wrote:

> Great that you enjoyed some time off during the holiday. I was stuck at home under quarantine since returning from the US.

That's a bit of a pita. Are you out of quarantine now? Anyway, no doubt you're still pleased to be home.

> I made some changes to logger during card discovery. It should be more robust now. When you run with --debug option it should leave a log file in your current directory. Is that not happening in your case?
>
> Also, I added a check of the system type. It should be the first line displayed from gpu-ls. Later, I will only display it in the log file and gpu-chk.

I'm now sending the debug log files for gpu-ls 3.6.3 (AMD GPU and then NVIDIA GPU as the primary display (non-primary inactive)); however, these now cut off almost immediately. Not sure why.

I've also recovered the debug log files from gpu-ls 3.6.2 (AMD GPU and then NVIDIA GPU as the primary display (non-primary inactive)). These are from the runs I did before I went away. These are more informative about the hardware/software I'm using.

--  Marjorie


Ricks-Lab commented 2 years ago

Seems like the log files did not get attached. I am most interested in the log file and output for the latest version that I pushed.

Marjorie-R commented 2 years ago

Hi Rick,

On Sun, 2022-05-08 at 06:54 -0700, Rick wrote:

Seems like the log files did not get attached. I am most interested in the log file and output for the latest version that I pushed.

Puzzled by this: the email (see below) I sent at 12:37 BST, Sunday today had 4 type application log attachments, hence resending.

If this still doesn't work I'll inline them in the email as text.

--  Marjorie

On Sat, 2022-05-07 at 19:52 -0700, Rick wrote:

> Great that you enjoyed some time off during the holiday. I was stuck at home under quarantine since returning from the US.

That's a bit of a pita. Are you out of quarantine now? Anyway, no doubt you're still pleased to be home.

> I made some changes to logger during card discovery. It should be more robust now. When you run with --debug option it should leave a log file in your current directory. Is that not happening in your case?
>
> Also, I added a check of the system type. It should be the first line displayed from gpu-ls. Later, I will only display it in the log file and gpu-chk.

I'm now sending the debug log files for gpu-ls 3.6.3 (AMD GPU and then NVIDIA GPU as the primary display (non-primary inactive)); however, these now cut off almost immediately. Not sure why.

I've also recovered the debug log files from gpu-ls 3.6.2 (AMD GPU and then NVIDIA GPU as the primary display (non-primary inactive)). These are from the runs I did before I went away. These are more informative about the hardware/software I'm using.

--  Marjorie

Ricks-Lab commented 2 years ago

Still not there. I suspect that the GitHub converter from email to issue update has trouble with complex updates. It may be best to update directly on the issue page in GitHub.

Ricks-Lab commented 2 years ago

The more that I dig into this, the more I realize a rewrite of this section is necessary. I will drop a message here when I complete the work.

Ricks-Lab commented 2 years ago

I made good progress today. I have added a new command line option to gpu-ls. The --force_all option will attempt to read all relevant sensors regardless of the classification of the card or the card-level readability setting. If a sensor is found to be unreadable, it is added to a disable list. I may tune it based on the results in your extreme case. It is ready for checkout whenever you have some time.

@csecht Would you like to check it out on your system just to make sure I did not break anything. Also, adding the power dpm state could be useful.

Marjorie-R commented 2 years ago

Hi Rick,

On Mon, 2022-05-09 at 02:23 -0700, Rick wrote:

> I made good progress today. I have added a new command line option to gpu-ls. The --force_all option will attempt to read all relevant sensors regardless of the classification of the card or card level readability setting. If a sensor is found to be unreadable, it is added to a disable list. Maybe will tune based on results in your extreme case. It is ready for checkout whenever you have some time.
>
> @csecht Would you like to check it out on your system just to make sure I did not break anything. Also, adding the power dpm state could be useful.

The following is what I get if I'm using my on-die AMD GPU as primary/display. I haven't tested with the NVIDIA GPU as primary.

Note: there is an issue with the fan sensors on my motherboard: these are not known to lm-sensors-detect, but clearly exist as I can see their outputs in BIOS.

If I run gpu-ls with the --force_all option it fails early:

(rickslab-gpu-utils-env) @.***:~/gpu-utils$ ./gpu-ls --force_all
Traceback (most recent call last):
  File "/home/marjorie/gpu-utils/./gpu-ls", line 157, in <module>
    main()
  File "/home/marjorie/gpu-utils/./gpu-ls", line 96, in main
    if env.GUT_CONST.check_env() < 0:
  File "/home/marjorie/gpu-utils/GPUmodules/env.py", line 275, in check_env
    if os.path.islink(cmd_init):
  File "/usr/lib/python3.9/posixpath.py", line 167, in islink
    st = os.lstat(path)
TypeError: lstat: path should be string, bytes or os.PathLike, not NoneType

Note there is no log file created with this option.

This is the same output as I get when I run gpu-ls with the --debug option. When I run with the --debug option I also get this in the log file:

DEBUG:gpu-utils:env.set_args:Install type: repository
DEBUG:gpu-utils:env.set_args:Command line arguments: Namespace(about=False, short=False, table=False, pstates=False, ppm=False, clinfo=False, force_all=False, no_fan=False, debug=True)
DEBUG:gpu-utils:env.set_args:Local TZ: BST
DEBUG:gpu-utils:env.set_args:pciid path set to: /usr/share/misc/pci.ids
DEBUG:gpu-utils:env.set_args:Icon path set to: /home/marjorie/gpu-utils/GPUmodules/../icons
DEBUG:gpu-utils:gpu-ls.main:########## gpu-ls 3.6.3
DEBUG:gpu-utils:env.check_env:Using python: 3.9.2
DEBUG:gpu-utils:env.check_env:Using Linux Kernel: 5.10.0-13-amd64

As the issue seems to be path related, here's my path:
(rickslab-gpu-utils-env) @.***:~/gpu-utils$ echo $PATH
/home/marjorie/gpu-utils/rickslab-gpu-utils-env/bin:/home/marjorie/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games

--  Marjorie

Ricks-Lab commented 2 years ago

Looks like the code I added to determine the system type (systemd vs. System V) cannot run on your system. It seems that os.path.islink throws an error there. I will catch it and set the system type to unknown. It also seems that init is not a valid command on your system. In Debian, it is a link to systemd, so checking it seemed like a valid way to determine the system type. I will research a better way.
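A rough sketch of a more defensive check is shown below. It assumes, as the traceback suggests, that the command lookup returned None (for example when the directory holding init is not on PATH), and it follows the approach described above of looking at what init links to. `detect_init_system` is a hypothetical function, not the one in env.py.

```python
import os
import shutil
from typing import Optional


def detect_init_system(cmd_init: Optional[str]) -> str:
    """Classify the init system from the resolved path of the init command."""
    if cmd_init is None:
        # Lookup failed: passing None on to os.path.islink() would raise
        # TypeError, so report the system type as unknown instead.
        return 'Unknown'
    # realpath() follows symlinks; a path that is not a link is returned
    # (normalised) as-is.
    target = os.path.realpath(cmd_init)
    if os.path.basename(target) == 'systemd':
        return 'systemd'
    # Anything else is not positively identified by this sketch.
    return 'Unknown'


if __name__ == '__main__':
    # shutil.which() returns None when the command is not found on PATH.
    print(detect_init_system(shutil.which('init')))
```

This only identifies systemd positively; recognising sysvinit or other init systems would need an additional check, which is left open here.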

Ricks-Lab commented 2 years ago

I completed some major work. Hopefully it addresses all of the issues that we have seen on your distro. I have tested on 4 of my systems with a wide range of GPU models. The --force_all option should now work and will include a list of sensors that it had to disable. I also added a new option for gpu-ls. The --raw will attempt to read every GPU driver file that it can find and output in a lightly formatted display. This should be useful in exploring read capability of a GPU. The init exception should now be ok, but still may not be identifying systemV type system. You can test this with gpu-chk.

Marjorie-R commented 2 years ago

Hi Rick,

On Tue, 2022-05-10 at 06:02 -0700, Rick wrote:

> I completed some major work. Hopefully it addresses all of the issues that we have seen on your distro. I have tested on 4 of my systems with a wide range of GPU models. The --force_all option should now work and will include a list of sensors that it had to disable. I also added a new option for gpu-ls. The --raw will attempt to read every GPU driver file that it can find and output in a lightly formatted display. This should be useful in exploring read capability of a GPU. The init exception should now be ok, but still may not be identifying systemV type system. You can test this with gpu-chk.

I've now tested gpu-ls with the AMD card used as primary graphics and display.

gpu-ls now runs with all options mentioned without failing, though it does still flag (and trap) two errors:
DEBUG:gpu-utils:GPUmodule.set_gpu_list:Error: system support issue for 30:00.0: [[Errno 88] Socket operation on non-socket]
DEBUG:gpu-utils:env.process_message:Error: Invalid power value read [None]

I include all the outputs below + a listing of the log file when using --debug.

Note that the same or similar issues occur in the other utilities: 

I ran gpu-mon and got this output:
@.***:~/gpu-utils$ ./gpu-mon
┌─────────────┬────────────────────┬────────────────────┐
│Card #       │card1               │card0               │
├─────────────┼────────────────────┼────────────────────┤
│Model        │ GeForce GT 710     │ Cezanne            │
│GPU Load %   │[N/A]               │2                   │
│Mem Load %   │[N/A]               │None                │
│VRAM Usage % │0.2                 │3.783               │
│GTT Usage %  │None                │13.953              │
│Power (W)    │None                │5.0                 │
│Power Cap (W)│[N/A]               │None                │
│Energy (kWh) │0.0                 │0.0                 │
│T (C)        │37.0                │31.0                │
│VddGFX (mV)  │nan                 │749                 │
│Fan Spd (%)  │40.0                │None                │
│Sclk (MHz)
Traceback (most recent call last):
  File "/home/marjorie/gpu-utils/./gpu-mon", line 409, in <module>
    main()
  File "/home/marjorie/gpu-utils/./gpu-mon", line 398, in main
    com_gpu_list.print_table()
  File "/home/marjorie/gpu-utils/GPUmodules/GPUmodule.py", line 2357, in print_table
    data_value_raw = gpu.get_params_value(table_item)
  File "/home/marjorie/gpu-utils/GPUmodules/GPUmodule.py", line 721, in get_params_value
    return int(self.prm['frequencies'][clock_name])
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'

also gpu-chk:
(rickslab-gpu-utils-env) @.***:~/gpu-utils$ ./gpu-chk
Using python 3.9.2
Python version OK.
Using Linux Kernel: 5.10.0-13-amd64
OS kernel OK.
Using system type: Unknown
System type has not been verified.
Using Linux distribution: Devuan GNU/Linux 4 (chimaera)
Distro has not been verified.
amdgpu/rocm version: UNKNOWN
rickslab-gpu-utils can still be used.
{'python': True, 'kernel': True, 'system': False, 'distribution': False, 'driver': True}
Error in environment. Exiting...

and gpu-pac:
(rickslab-gpu-utils-env) @.***:~/gpu-utils$ ./gpu-pac
OS command [nvidia-smi] executable found: [/usr/bin/nvidia-smi]
Error: system support issue for 30:00.0: [[Errno 88] Socket operation on non-socket]
Detected GPUs: NVIDIA: 1, AMD: 1
AMD: Wattman features enabled: 0xfffd7fff
2 total GPUs, 0 rw, 2 r-only, 0 w-only

and gpu-plot:
(rickslab-gpu-utils-env) @.***:~/gpu-utils$ ./gpu-plot
OS command [nvidia-smi] executable found: [/usr/bin/nvidia-smi]
Error: system support issue for 30:00.0: [[Errno 88] Socket operation on non-socket]
Detected GPUs: NVIDIA: 1, AMD: 1
AMD: Wattman features enabled: 0xfffd7fff
2 total GPUs, 0 rw, 2 r-only, 0 w-only

gpu-plot waiting for initial data.Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.9/threading.py", line 954, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.9/threading.py", line 892, in run
    self._target(*self._args, **self._kwargs)
  File "/home/marjorie/gpu-utils/./gpu-plot", line 731, in read_from_gpus
    gpu_plot_data = gpu.get_plot_data()
  File "/home/marjorie/gpu-utils/GPUmodules/GPUmodule.py", line 1685, in get_plot_data
    gpu_state_str = str(re.sub(PATTERNS['MHz'], '', str(self.get_params_value(table_item)))).strip()
  File "/home/marjorie/gpu-utils/GPUmodules/GPUmodule.py", line 721, in get_params_value
    return int(self.prm['frequencies'][clock_name])
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'
..................................................^CTraceback (most recent call last):
  File "/home/marjorie/gpu-utils/./gpu-plot", line 894, in <module>
    main()
  File "/home/marjorie/gpu-utils/./gpu-plot", line 878, in main
    sleep(args.sleep/4.0)
KeyboardInterrupt

If you think it helpful I can rerun with the NVIDIA graphics enabled instead, though it seems to me that we can close the issue as solved, albeit some of the fixes you have already identified may need to be applied to the other utilities. If it would help I'd be happy to validate these once they are fixed.
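The gpu-mon and gpu-plot tracebacks in this comment both end in `int()` being called on a clock value of None. One possible shape for a shared guard is sketched below; `to_int_or_none` is a hypothetical helper, and this is not a claim about how GPUmodule.py handles the case.

```python
from typing import Optional, Union


def to_int_or_none(value: Union[str, float, int, None]) -> Optional[int]:
    """Convert a sensor reading to int, or return None if it is unusable."""
    if value is None:
        # Sensor was not readable, as with the NVIDIA clock values above.
        return None
    try:
        # Go through float() so strings such as '400.0' are accepted.
        return int(float(value))
    except (TypeError, ValueError, OverflowError):
        # Non-numeric text, NaN and infinity all end up here.
        return None


if __name__ == '__main__':
    print(to_int_or_none(None))
    print(to_int_or_none('400.0'))
    print(to_int_or_none(float('nan')))
```

A caller that prints a table can then show a placeholder for None rather than aborting part-way through a row.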

--  Marjorie

(rickslab-gpu-utils-env) @.***:~/gpu-utils$ ./gpu-ls OS command [nvidia-smi] executable found: [/usr/bin/nvidia-smi] Error: system support issue for 30:00.0: [[Errno 88] Socket operation on non-socket] Detected GPUs: NVIDIA: 1, AMD: 1 AMD: Wattman features enabled: 0xfffd7fff 2 total GPUs, 0 rw, 2 r-only, 0 w-only

Card Number: 1 Vendor: NVIDIA Readable: True Writable: False Compute: False GPU UID: GPU-abfa1778-17bb-0861-2cd3-b6c1aa704dfa GPU S/N: [N/A] Device ID: {'device': '0x128b', 'subsystem_device': '0x128b', 'subsystem_vendor': '0x10de', 'vendor': '0x10de'} Decoded Device ID: GK208B [GeForce GT 710] Card Model: GeForce GT 710 Display Card Model: GeForce GT 710 Card Index: 0 PCIe ID: 10:00.0 Link Speed: GEN[N/A] Link Width: [N/A] ################################################## Driver: 460.91.03 vBIOS Version: 80.28.A6.00.15 Compute Platform: None Compute Mode: Default GPU Type: Supported HWmon: None Card Path: /sys/class/drm/card1/device System Card Path: /sys/devices/pci0000:00/0000:00:01.1/0000:10:00.0 ################################################## Current Power (W): None Power Cap (W): [N/A] Power Cap Range (W): ['[N/A]', '[N/A]'] Fan Target Speed (rpm): None Current Fan PWM (%): 40.000 ################################################## Current GPU Loading (%): [N/A] Current Memory Loading (%): [N/A] Current VRAM Usage (%): 0.200 Current VRAM Used (GB): 0.004 Total VRAM (GB): 1.955 Current Temps (C): {'temperature.gpu': 35.0, 'temperature.memory': None} Current Clk Frequencies (MHz): {'clocks.gr': None, 'clocks.mem': None, 'clocks.sm': None, 'clocks.video': None} Maximum Clk Frequencies (MHz): {'clocks.max.gr': None, 'clocks.max.mem': None, 'clocks.max.sm': None} Current SCLK P-State: [8, ''] Power Profile Mode: [N/A] Power DPM State: None

Card Number: 0
Vendor: AMD
Readable: True
Writable: False
Compute: False
GPU UID: None
Device ID: {'device': '0x1638', 'subsystem_device': '0x1636', 'subsystem_vendor': '0x1002', 'vendor': '0x1002'}
Decoded Device ID: Cezanne
Card Model: Advanced Micro Devices, Inc. [AMD/ATI] Cezanne (rev c9)
Display Card Model: Cezanne
PCIe ID: 30:00.0
Link Speed: 8.0 GT/s PCIe
Link Width: 16
##################################################
Driver: amdgpu
vBIOS Version: 113-CEZANNE-017
Compute Platform: None
GPU Type: APU
HWmon: /sys/class/drm/card0/device/hwmon/hwmon1
Card Path: /sys/class/drm/card0/device
System Card Path: /sys/devices/pci0000:00/0000:00:08.1/0000:30:00.0
##################################################
Current Power (W): 9.000
Power Cap (W): None
##################################################
Current GPU Loading (%): 2
Current Memory Loading (%): None
Current GTT Memory Usage (%): 13.931
Current GTT Memory Used (GB): 0.418
Total GTT Memory (GB): 3.000
Current VRAM Usage (%): 3.399
Current VRAM Used (GB): 0.068
Total VRAM (GB): 2.000
Current Temps (C): {'edge': 31.0}
Critical Temps (C): {}
Current Voltages (V): {'vddgfx': 887, 'vddnb': 1099}
Current Clk Frequencies (MHz): {'sclk': 400.0}
Current SCLK P-State: [1, '400Mhz']
Current MCLK P-State: [0, '1600Mhz']
Power Profile Mode: None
Power DPM State: performance
Power DPM Force Performance Level: auto

(rickslab-gpu-utils-env) @.***:~/gpu-utils$ ./gpu-ls --debug
Devuan: Unverified
OS command [nvidia-smi] executable found: [/usr/bin/nvidia-smi]
Error: system support issue for 30:00.0: [[Errno 88] Socket operation on non-socket]
Detected GPUs: NVIDIA: 1, AMD: 1
AMD: Wattman features enabled: 0xfffd7fff
2 total GPUs, 0 rw, 2 r-only, 0 w-only

Card Number: 1
Vendor: NVIDIA
Readable: True
Writable: False
Compute: False
GPU UID: GPU-abfa1778-17bb-0861-2cd3-b6c1aa704dfa
GPU S/N: [N/A]
Device ID: {'device': '0x128b', 'subsystem_device': '0x128b', 'subsystem_vendor': '0x10de', 'vendor': '0x10de'}
Decoded Device ID: GK208B [GeForce GT 710]
Card Model: GeForce GT 710
Display Card Model: GeForce GT 710
Card Index: 0
PCIe ID: 10:00.0
Link Speed: GEN[N/A]
Link Width: [N/A]
##################################################
Driver: 460.91.03
vBIOS Version: 80.28.A6.00.15
Compute Platform: None
Compute Mode: Default
GPU Type: Supported
HWmon: None
Card Path: /sys/class/drm/card1/device
System Card Path: /sys/devices/pci0000:00/0000:00:01.1/0000:10:00.0
##################################################
Current Power (W): None
Power Cap (W): [N/A]
Power Cap Range (W): ['[N/A]', '[N/A]']
Fan Target Speed (rpm): None
Current Fan PWM (%): 40.000
##################################################
Current GPU Loading (%): [N/A]
Current Memory Loading (%): [N/A]
Current VRAM Usage (%): 0.200
Current VRAM Used (GB): 0.004
Total VRAM (GB): 1.955
Current Temps (C): {'temperature.gpu': 35.0, 'temperature.memory': None}
Current Clk Frequencies (MHz): {'clocks.gr': None, 'clocks.mem': None, 'clocks.sm': None, 'clocks.video': None}
Maximum Clk Frequencies (MHz): {'clocks.max.gr': None, 'clocks.max.mem': None, 'clocks.max.sm': None}
Current SCLK P-State: [8, '']
Power Profile Mode: [N/A]
Power DPM State: None

Card Number: 0
Vendor: AMD
Readable: True
Writable: False
Compute: False
GPU UID: None
Device ID: {'device': '0x1638', 'subsystem_device': '0x1636', 'subsystem_vendor': '0x1002', 'vendor': '0x1002'}
Decoded Device ID: Cezanne
Card Model: Advanced Micro Devices, Inc. [AMD/ATI] Cezanne (rev c9)
Display Card Model: Cezanne
PCIe ID: 30:00.0
Link Speed: 8.0 GT/s PCIe
Link Width: 16
##################################################
Driver: amdgpu
vBIOS Version: 113-CEZANNE-017
Compute Platform: None
GPU Type: APU
HWmon: /sys/class/drm/card0/device/hwmon/hwmon1
Card Path: /sys/class/drm/card0/device
System Card Path: /sys/devices/pci0000:00/0000:00:08.1/0000:30:00.0
##################################################
Current Power (W): 5.000
Power Cap (W): None
##################################################
Current GPU Loading (%): 2
Current Memory Loading (%): None
Current GTT Memory Usage (%): 13.866
Current GTT Memory Used (GB): 0.416
Total GTT Memory (GB): 3.000
Current VRAM Usage (%): 3.533
Current VRAM Used (GB): 0.071
Total VRAM (GB): 2.000
Current Temps (C): {'edge': 30.0}
Critical Temps (C): {}
Current Voltages (V): {'vddgfx': 749, 'vddnb': 1099}
Current Clk Frequencies (MHz): {'sclk': 400.0}
Current SCLK P-State: [1, '400Mhz']
Current MCLK P-State: [0, '1600Mhz']
Power Profile Mode: None
Power DPM State: performance
Power DPM Force Performance Level: auto

(rickslab-gpu-utils-env) @.***:~/gpu-utils$ cat ./debug_gpu-utils_20220510-195116.log
DEBUG:gpu-utils:env.set_args:Install type: repository
DEBUG:gpu-utils:env.set_args:Command line arguments: Namespace(about=False, short=False, table=False, pstates=False, ppm=False, clinfo=False, verbose=False, force_all=False, raw=False, no_fan=False, debug=True)
DEBUG:gpu-utils:env.set_args:Local TZ: BST
DEBUG:gpu-utils:env.set_args:pciid path set to: /usr/share/misc/pci.ids
DEBUG:gpu-utils:env.set_args:Icon path set to: /home/marjorie/gpu-utils/GPUmodules/../icons
DEBUG:gpu-utils:gpu-ls.main:########## gpu-ls 3.6.3
DEBUG:gpu-utils:env.check_env:Using python: 3.9.2
DEBUG:gpu-utils:env.check_env:Using Linux Kernel: 5.10.0-13-amd64
DEBUG:gpu-utils:env.process_message:System Type: Unknown
DEBUG:gpu-utils:env.check_env:Using Linux Distro: Devuan
DEBUG:gpu-utils:env.check_env:Linux Distro Description: Devuan GNU/Linux 4 (chimaera)
DEBUG:gpu-utils:env.check_env:Distro: Devuan, Devuan GNU/Linux 4 (chimaera)
DEBUG:gpu-utils:env.check_env:lspci path: /usr/bin/lspci
DEBUG:gpu-utils:env.check_env:clinfo path: /usr/bin/clinfo
DEBUG:gpu-utils:env.check_env:Devuan package query tool: /usr/bin/dpkg
DEBUG:gpu-utils:GPUmodule.set_gpu_list:OpenCL map: {None: {'prf_wg_multiple': None, 'max_wg_size': None, 'prf_wg_size': None, 'max_wi_sizes': None, 'max_wi_dim': None, 'max_mem_allocation': None, 'simd_ins_width': None, 'simd_width': None, 'simd_per_cu': None, 'max_cu': None, 'device_name': None, 'opencl_version': None, 'driver_version': None, 'device_version': None}}
DEBUG:gpu-utils:env.read_amdfeaturemask:Raw Featuremask string: [0xfffd7fff]
DEBUG:gpu-utils:env.read_amdfeaturemask:AMD featuremask: 0xfffd7fff
DEBUG:gpu-utils:GPUmodule.get_gpu_pci_list:Found GPU pci: 10:00.0 VGA compatible controller: NVIDIA Corporation GK208B [GeForce GT 710] (rev a1)
DEBUG:gpu-utils:GPUmodule.get_gpu_pci_list:Found GPU pci: 30:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cezanne (rev c9)
DEBUG:gpu-utils:GPUmodule.set_gpu_list:Found 2 GPUs
DEBUG:gpu-utils:GPUmodule.add:Added GPU Item 240e5aba7d3b4252b284b2dfac975581 to GPU List
DEBUG:gpu-utils:GPUmodule.set_gpu_list:GPU: 10:00.0
DEBUG:gpu-utils:GPUmodule.set_gpu_list:lspci output items: ['10:00.0 VGA compatible controller: NVIDIA Corporation GK208B [GeForce GT 710] (rev a1)', '\tSubsystem: NVIDIA Corporation GK208B [GeForce GT 710]', '\tKernel driver in use: nvidia', '\tKernel modules: nvidia', '']
DEBUG:gpu-utils:GPUmodule.set_gpu_list:gpu_name: [NVIDIA Corporation GK208B [GeForce GT 710] (rev a1)]
DEBUG:gpu-utils:GPUmodule.set_gpu_list:sysfpath: /sys/devices/pci0000:00/0000:00:01.1/0000:10:00.0 device_dir: /sys/class/drm/card1/device
DEBUG:gpu-utils:GPUmodule.set_gpu_list:card_path set to: /sys/class/drm/card1/device
DEBUG:gpu-utils:GPUmodule.set_gpu_list:sysfpath: /sys/devices/pci0000:00/0000:00:08.1/0000:30:00.0 device_dir: /sys/class/drm/card0/device
DEBUG:gpu-utils:GPUmodule.set_gpu_list:Card dir [/sys/class/drm/card1/device] contents: ['uevent', 'resource3_wc', 'resource5', 'i2c-10', 'resource3', 'broken_parity_status', 'subsystem_device', 'rom', 'dma_mask_bits', 'vendor', 'resource1', 'iommu_group', 'consumer:pci:0000:10:00.1', 'local_cpus', 'i2c-8', 'power', 'class', 'reset', 'numa_node', 'resource', 'rescan', 'max_link_width', 'msi_bus', 'device', 'boot_vga', 'current_link_width', 'driver', 'max_link_speed', 'local_cpulist', 'driver_override', 'subsystem', 'd3cold_allowed', 'irq', 'revision', 'current_link_speed', 'resource1_wc', 'i2c-9', 'consistent_dma_mask_bits', 'resource0', 'config', 'ari_enabled', 'msi_irqs', 'remove', 'iommu', 'enable', 'link', 'modalias', 'subsystem_vendor', 'drm']
DEBUG:gpu-utils:GPUmodule.set_gpu_list:HW file search: []
DEBUG:gpu-utils:GPUmodule.populate_prm_from_dict:prm dict: {'pcie_id': '10:00.0', 'model': 'NVIDIA Corporation GK208B [GeForce GT 710] (rev a1)', 'vendor': <vendor.NVIDIA: 4>, 'driver': 'nvidia', 'card_path': '/sys/class/drm/card1/device', 'sys_card_path': '/sys/devices/pci0000:00/0000:00:01.1/0000:10:00.0', 'gpu_type': <type.Supported: 3>, 'hwmon_path': '', 'readable': True, 'writable': False, 'compute': False, 'compute_platform': None}
DEBUG:gpu-utils:GPUmodule.set_gpu_list:Card flags: readable: True, writable: False, type: Supported
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_generic:sensor path set to [/sys/class/drm/card1/device]
DEBUG:gpu-utils:GPUmodule.set_params_value:Set param value: [['0x10de', '0x128b', '0x10de', '0x128b']], type: [<class 'list'>]
DEBUG:gpu-utils:GPUmodule.add:Added GPU Item c555d38d2cd44c819e6f1e3bd02a715c to GPU List
DEBUG:gpu-utils:GPUmodule.set_gpu_list:GPU: 30:00.0
DEBUG:gpu-utils:GPUmodule.set_gpu_list:lspci output items: ['30:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cezanne (rev c9)', '\tSubsystem: Advanced Micro Devices, Inc. [AMD/ATI] Device 1636', '\tKernel driver in use: amdgpu', '\tKernel modules: amdgpu', '']
DEBUG:gpu-utils:GPUmodule.set_gpu_list:gpu_name: [Advanced Micro Devices, Inc. [AMD/ATI] Cezanne (rev c9)]
DEBUG:gpu-utils:GPUmodule.set_gpu_list:sysfpath: /sys/devices/pci0000:00/0000:00:01.1/0000:10:00.0 device_dir: /sys/class/drm/card1/device
DEBUG:gpu-utils:GPUmodule.set_gpu_list:sysfpath: /sys/devices/pci0000:00/0000:00:08.1/0000:30:00.0 device_dir: /sys/class/drm/card0/device
DEBUG:gpu-utils:GPUmodule.set_gpu_list:card_path set to: /sys/class/drm/card0/device
DEBUG:gpu-utils:GPUmodule.set_gpu_list:Card dir [/sys/class/drm/card0/device] contents: ['hdcp_srm', 'uevent', 'resource5', 'resource0_wc', 'gpu_metrics', 'power_dpm_force_performance_level', 'serial_number', 'i2c-1', 'product_name', 'graphics', 'broken_parity_status', 'power_dpm_state', 'mem_info_vram_vendor', 'gpu_busy_percent', 'subsystem_device', 'rom', 'dma_mask_bits', 'vendor', 'pp_table', 'iommu_group', 'local_cpus', 'firmware_node', 'backlight', 'power', 'pp_dpm_mclk', 'class', 'reset', 'numa_node', 'resource', 'rescan', 'max_link_width', 'resource2_wc', 'pcie_replay_count', 'pp_dpm_pcie', 'msi_bus', 'device', 'mem_info_vram_used', 'boot_vga', 'pp_num_states', 'current_link_width', 'pp_cur_state', 'driver', 'max_link_speed', 'pp_sclk_od', 'i2c-2', 'resource4', 'pp_dpm_dcefclk', 'mem_info_vis_vram_used', 'local_cpulist', 'pp_dpm_socclk', 'mem_info_vis_vram_total', 'i2c-0', 'driver_override', 'subsystem', 'd3cold_allowed', 'irq', 'revision', 'resource2', 'pp_dpm_fclk', 'current_link_speed', 'pp_power_profile_mode', 'mem_info_gtt_used', 'consistent_dma_mask_bits', 'mem_info_vram_total', 'vbios_version', 'resource0', 'pp_od_clk_voltage', 'config', 'ari_enabled', 'msi_irqs', 'pp_force_state', 'fw_version', 'remove', 'consumer:pci:0000:30:00.1', 'thermal_throttling_logging', 'iommu', 'enable', 'link', 'product_number', 'pp_mclk_od', 'hwmon', 'mem_info_gtt_total', 'modalias', 'subsystem_vendor', 'drm', 'pp_dpm_sclk']
DEBUG:gpu-utils:GPUmodule.set_gpu_list:HW file search: ['/sys/class/drm/card0/device/hwmon/hwmon1']
DEBUG:gpu-utils:GPUmodule.set_gpu_list:HW dir [/sys/class/drm/card0/device/hwmon/hwmon1] contents: ['uevent', 'in1_label', 'freq1_label', 'in0_input', 'power', 'temp1_label', 'device', 'in1_input', 'power1_average', 'freq1_input', 'subsystem', 'temp1_input', 'in0_label', 'name']
DEBUG:gpu-utils:GPUmodule.set_gpu_list:Error: system support issue for 30:00.0: [[Errno 88] Socket operation on non-socket]
DEBUG:gpu-utils:GPUmodule.set_gpu_list:/sys/class/drm/card0/device/pp_od_clk_voltage contents: /sys/class/drm/card0/device/pp_od_clk_voltage not readable
DEBUG:gpu-utils:GPUmodule.populate_prm_from_dict:prm dict: {'pcie_id': '30:00.0', 'model': 'Advanced Micro Devices, Inc. [AMD/ATI] Cezanne (rev c9)', 'vendor': <vendor.AMD: 3>, 'driver': 'amdgpu', 'card_path': '/sys/class/drm/card0/device', 'sys_card_path': '/sys/devices/pci0000:00/0000:00:08.1/0000:30:00.0', 'gpu_type': <type.APU: 5>, 'hwmon_path': '/sys/class/drm/card0/device/hwmon/hwmon1', 'readable': True, 'writable': False, 'compute': False, 'compute_platform': None}
DEBUG:gpu-utils:GPUmodule.set_gpu_list:Card flags: readable: True, writable: False, type: APU
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_generic:sensor path set to [/sys/class/drm/card0/device]
DEBUG:gpu-utils:GPUmodule.set_params_value:Set param value: [['0x1002', '0x1638', '0x1002', '0x1636']], type: [<class 'list'>]
DEBUG:gpu-utils:GPUmodule.wattman_status:AMD featuremask: 0xfffd7fff
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV command: /usr/bin/nvidia-smi -i 10:00.0 --query-gpu=power.limit --format=csv,noheader,nounits
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV raw query response: [['[N/A]', '']]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV query result: [[N/A]]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_nv:NV query (each-call) query item [power.limit], result: [[N/A]]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV command: /usr/bin/nvidia-smi -i 10:00.0 --query-gpu=power.min_limit --format=csv,noheader,nounits
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV raw query response: [['[N/A]', '']]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV query result: [[N/A]]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_nv:NV query (each-call) query item [power.min_limit], result: [[N/A]]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV command: /usr/bin/nvidia-smi -i 10:00.0 --query-gpu=power.max_limit --format=csv,noheader,nounits
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV raw query response: [['[N/A]', '']]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV query result: [[N/A]]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_nv:NV query (each-call) query item [power.max_limit], result: [[N/A]]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV command: /usr/bin/nvidia-smi -i 10:00.0 --query-gpu=memory.total --format=csv,noheader,nounits
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV raw query response: [['2002', '']]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV query result: [2002]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_nv:NV query (each-call) query item [memory.total], result: [2002]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV command: /usr/bin/nvidia-smi -i 10:00.0 --query-gpu=vbios_version --format=csv,noheader,nounits
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV raw query response: [['80.28.A6.00.15', '']]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV query result: [80.28.A6.00.15]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_nv:NV query (each-call) query item [vbios_version], result: [80.28.A6.00.15]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV command: /usr/bin/nvidia-smi -i 10:00.0 --query-gpu=driver_version --format=csv,noheader,nounits
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV raw query response: [['460.91.03', '']]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV query result: [460.91.03]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_nv:NV query (each-call) query item [driver_version], result: [460.91.03]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV command: /usr/bin/nvidia-smi -i 10:00.0 --query-gpu=compute_mode --format=csv,noheader,nounits
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV raw query response: [['Default', '']]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV query result: [Default]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_nv:NV query (each-call) query item [compute_mode], result: [Default]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV command: /usr/bin/nvidia-smi -i 10:00.0 --query-gpu=name --format=csv,noheader,nounits
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV raw query response: [['GeForce GT 710', '']]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV query result: [GeForce GT 710]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_nv:NV query (each-call) query item [name], result: [GeForce GT 710]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV command: /usr/bin/nvidia-smi -i 10:00.0 --query-gpu=serial --format=csv,noheader,nounits
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV raw query response: [['[N/A]', '']]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV query result: [[N/A]]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_nv:NV query (each-call) query item [serial], result: [[N/A]]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV command: /usr/bin/nvidia-smi -i 10:00.0 --query-gpu=index --format=csv,noheader,nounits
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV raw query response: [['0', '']]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV query result: [0]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_nv:NV query (each-call) query item [index], result: [0]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV command: /usr/bin/nvidia-smi -i 10:00.0 --query-gpu=gpu_uuid --format=csv,noheader,nounits
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV raw query response: [['GPU-abfa1778-17bb-0861-2cd3-b6c1aa704dfa', '']]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV query result: [GPU-abfa1778-17bb-0861-2cd3-b6c1aa704dfa]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_nv:NV query (each-call) query item [gpu_uuid], result: [GPU-abfa1778-17bb-0861-2cd3-b6c1aa704dfa]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV command: /usr/bin/nvidia-smi -i 10:00.0 --query-gpu=power.draw --format=csv,noheader,nounits
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV raw query response: [['[N/A]', '']]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV query result: [[N/A]]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_nv:NV query (each-call) query item [power.draw], result: [[N/A]]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV command: /usr/bin/nvidia-smi -i 10:00.0 --query-gpu=temperature.gpu --format=csv,noheader,nounits
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV raw query response: [['35', '']]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV query result: [35]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_nv:NV query (each-call) query item [temperature.gpu], result: [35]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV command: /usr/bin/nvidia-smi -i 10:00.0 --query-gpu=temperature.memory --format=csv,noheader,nounits
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV raw query response: [['N/A', '']]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV query result: [N/A]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_nv:NV query (each-call) query item [temperature.memory], result: [N/A]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV command: /usr/bin/nvidia-smi -i 10:00.0 --query-gpu=clocks.gr --format=csv,noheader,nounits
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV raw query response: [['[N/A]', '']]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV query result: [[N/A]]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_nv:NV query (each-call) query item [clocks.gr], result: [[N/A]]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV command: /usr/bin/nvidia-smi -i 10:00.0 --query-gpu=clocks.sm --format=csv,noheader,nounits
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV raw query response: [['[N/A]', '']]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV query result: [[N/A]]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_nv:NV query (each-call) query item [clocks.sm], result: [[N/A]]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV command: /usr/bin/nvidia-smi -i 10:00.0 --query-gpu=clocks.mem --format=csv,noheader,nounits
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV raw query response: [['[N/A]', '']]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV query result: [[N/A]]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_nv:NV query (each-call) query item [clocks.mem], result: [[N/A]]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV command: /usr/bin/nvidia-smi -i 10:00.0 --query-gpu=clocks.video --format=csv,noheader,nounits
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV raw query response: [['[N/A]', '']]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV query result: [[N/A]]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_nv:NV query (each-call) query item [clocks.video], result: [[N/A]]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV command: /usr/bin/nvidia-smi -i 10:00.0 --query-gpu=clocks.max.gr --format=csv,noheader,nounits
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV raw query response: [['[N/A]', '']]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV query result: [[N/A]]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_nv:NV query (each-call) query item [clocks.max.gr], result: [[N/A]]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV command: /usr/bin/nvidia-smi -i 10:00.0 --query-gpu=clocks.max.sm --format=csv,noheader,nounits
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV raw query response: [['[N/A]', '']]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV query result: [[N/A]]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_nv:NV query (each-call) query item [clocks.max.sm], result: [[N/A]]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV command: /usr/bin/nvidia-smi -i 10:00.0 --query-gpu=clocks.max.mem --format=csv,noheader,nounits
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV raw query response: [['[N/A]', '']]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV query result: [[N/A]]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_nv:NV query (each-call) query item [clocks.max.mem], result: [[N/A]]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV command: /usr/bin/nvidia-smi -i 10:00.0 --query-gpu=utilization.gpu --format=csv,noheader,nounits
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV raw query response: [['[N/A]', '']]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV query result: [[N/A]]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_nv:NV query (each-call) query item [utilization.gpu], result: [[N/A]]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV command: /usr/bin/nvidia-smi -i 10:00.0 --query-gpu=utilization.memory --format=csv,noheader,nounits
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV raw query response: [['[N/A]', '']]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV query result: [[N/A]]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_nv:NV query (each-call) query item [utilization.memory], result: [[N/A]]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV command: /usr/bin/nvidia-smi -i 10:00.0 --query-gpu=memory.used --format=csv,noheader,nounits
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV raw query response: [['4', '']]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV query result: [4]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_nv:NV query (each-call) query item [memory.used], result: [4]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV command: /usr/bin/nvidia-smi -i 10:00.0 --query-gpu=fan.speed --format=csv,noheader,nounits
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV raw query response: [['40', '']]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV query result: [40]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_nv:NV query (each-call) query item [fan.speed], result: [40]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV command: /usr/bin/nvidia-smi -i 10:00.0 --query-gpu=gom.current --format=csv,noheader,nounits
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV raw query response: [['[N/A]', '']]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV query result: [[N/A]]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_nv:NV query (each-call) query item [gom.current], result: [[N/A]]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV command: /usr/bin/nvidia-smi -i 10:00.0 --query-gpu=pcie.link.width.current --format=csv,noheader,nounits
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV raw query response: [['[N/A]', '']]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV query result: [[N/A]]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_nv:NV query (each-call) query item [pcie.link.width.current], result: [[N/A]]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV command: /usr/bin/nvidia-smi -i 10:00.0 --query-gpu=pcie.link.gen.current --format=csv,noheader,nounits
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV raw query response: [['[N/A]', '']]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV query result: [[N/A]]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_nv:NV query (each-call) query item [pcie.link.gen.current], result: [[N/A]]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV command: /usr/bin/nvidia-smi -i 10:00.0 --query-gpu=pstate --format=csv,noheader,nounits
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV raw query response: [['P8', '']]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_nv:NV query result: [P8]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_nv:NV query (each-call) query item [pstate], result: [P8]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_nv:NV query result: {'power.limit': '[N/A]', 'power.min_limit': '[N/A]', 'power.max_limit': '[N/A]', 'memory.total': '2002', 'vbios_version': '80.28.A6.00.15', 'driver_version': '460.91.03', 'compute_mode': 'Default', 'name': 'GeForce GT 710', 'serial': '[N/A]', 'index': '0', 'gpu_uuid': 'GPU-abfa1778-17bb-0861-2cd3-b6c1aa704dfa', 'power.draw': '[N/A]', 'temperature.gpu': '35', 'temperature.memory': 'N/A', 'clocks.gr': '[N/A]', 'clocks.sm': '[N/A]', 'clocks.mem': '[N/A]', 'clocks.video': '[N/A]', 'clocks.max.gr': '[N/A]', 'clocks.max.sm': '[N/A]', 'clocks.max.mem': '[N/A]', 'utilization.gpu': '[N/A]', 'utilization.memory': '[N/A]', 'memory.used': '4', 'fan.speed': '40', 'gom.current': '[N/A]', 'pcie.link.width.current': '[N/A]', 'pcie.link.gen.current': '[N/A]', 'pstate': 'P8'}
DEBUG:gpu-utils:GPUmodule.set_params_value:Set param value: [None], type: [<class 'NoneType'>]
DEBUG:gpu-utils:env.process_message:Error: Invalid power value read [None]
DEBUG:gpu-utils:env.process_message:Warning: Can not read parameter: power, disabling for this GPU: 1
DEBUG:gpu-utils:env.process_message:Warning: Can not read parameter: energy, disabling for this GPU: 1
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Processing parameter: unique_id
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_generic:sensor path set to [/sys/class/drm/card0/device]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_generic:HW file does not exist: /sys/class/drm/card0/device/unique_id
DEBUG:gpu-utils:env.process_message:Warning: Can not read parameter: unique_id, disabling for this GPU: 0
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Processing parameter: vbios
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_generic:sensor path set to [/sys/class/drm/card0/device]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Valid data [113-CEZANNE-017] for parameter: vbios
DEBUG:gpu-utils:GPUmodule.set_params_value:Set param value: [113-CEZANNE-017], type: [<class 'str'>]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Processing parameter: loading
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_generic:sensor path set to [/sys/class/drm/card0/device]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Valid data [2] for parameter: loading
DEBUG:gpu-utils:GPUmodule.set_params_value:Set param value: [2], type: [<class 'int'>]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Processing parameter: mem_loading
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_generic:sensor path set to [/sys/class/drm/card0/device]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_generic:HW file does not exist: /sys/class/drm/card0/device/mem_busy_percent
DEBUG:gpu-utils:env.process_message:Warning: Can not read parameter: mem_loading, disabling for this GPU: 0
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Processing parameter: link_spd
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_generic:sensor path set to [/sys/class/drm/card0/device]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Valid data [8.0 GT/s PCIe] for parameter: link_spd
DEBUG:gpu-utils:GPUmodule.set_params_value:Set param value: [8.0 GT/s PCIe], type: [<class 'str'>]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Processing parameter: link_wth
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_generic:sensor path set to [/sys/class/drm/card0/device]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Valid data [16] for parameter: link_wth
DEBUG:gpu-utils:GPUmodule.set_params_value:Set param value: [16], type: [<class 'str'>]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Processing parameter: sclk_ps
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_generic:sensor path set to [/sys/class/drm/card0/device]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Valid data [['0: 200Mhz', '1: 400Mhz ', '2: 1900Mhz']] for parameter: sclk_ps
DEBUG:gpu-utils:GPUmodule.set_params_value:Set param value: [['0: 200Mhz', '1: 400Mhz ', '2: 1900Mhz']], type: [<class 'list'>]
DEBUG:gpu-utils:GPUmodule.set_params_value:Mask: [0,1,2], ps: [1, 400Mhz]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Processing parameter: mclk_ps
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_generic:sensor path set to [/sys/class/drm/card0/device]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Valid data [['0: 1600Mhz ']] for parameter: mclk_ps
DEBUG:gpu-utils:GPUmodule.set_params_value:Set param value: [['0: 1600Mhz ']], type: [<class 'list'>]
DEBUG:gpu-utils:GPUmodule.set_params_value:Mask: [0], ps: [0, 1600Mhz]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Processing parameter: ppm
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_generic:sensor path set to [/sys/class/drm/card0/device]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Read data [None], Invalid or disabled parameter: ppm
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Processing parameter: power_dpm_force
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_generic:sensor path set to [/sys/class/drm/card0/device]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Valid data [auto] for parameter: power_dpm_force
DEBUG:gpu-utils:GPUmodule.set_params_value:Set param value: [auto], type: [<class 'str'>]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Processing parameter: power_dpm_state
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_generic:sensor path set to [/sys/class/drm/card0/device]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Valid data [performance] for parameter: power_dpm_state
DEBUG:gpu-utils:GPUmodule.set_params_value:Set param value: [performance], type: [<class 'str'>]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Processing parameter: mem_vram_total
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_generic:sensor path set to [/sys/class/drm/card0/device]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Valid data [2.0] for parameter: mem_vram_total
DEBUG:gpu-utils:GPUmodule.set_params_value:Set param value: [2.0], type: [<class 'float'>]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Processing parameter: mem_gtt_total
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_generic:sensor path set to [/sys/class/drm/card0/device]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Valid data [3.0] for parameter: mem_gtt_total
DEBUG:gpu-utils:GPUmodule.set_params_value:Set param value: [3.0], type: [<class 'float'>]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Processing parameter: mem_vram_used
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_generic:sensor path set to [/sys/class/drm/card0/device]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Valid data [0.0706634521484375] for parameter: mem_vram_used
DEBUG:gpu-utils:GPUmodule.set_params_value:Set param value: [0.0706634521484375], type: [<class 'float'>]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Processing parameter: mem_gtt_used
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_generic:sensor path set to [/sys/class/drm/card0/device]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Valid data [0.4159736633300781] for parameter: mem_gtt_used
DEBUG:gpu-utils:GPUmodule.set_params_value:Set param value: [0.4159736633300781], type: [<class 'float'>]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Processing parameter: power_cap_range
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_generic:sensor path set to [/sys/class/drm/card0/device]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_generic:HW file does not exist: /sys/class/drm/card0/device/hwmon/hwmon1/power1_cap_min
DEBUG:gpu-utils:env.process_message:Warning: Can not read parameter: power_cap_range, disabling for this GPU: 0
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Processing parameter: temp_crits
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_generic:sensor path set to [/sys/class/drm/card0/device]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Valid data [{}] for parameter: temp_crits
DEBUG:gpu-utils:GPUmodule.set_params_value:Set param value: [{}], type: [<class 'dict'>]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Processing parameter: power
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_generic:sensor path set to [/sys/class/drm/card0/device]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Valid data [5.0] for parameter: power
DEBUG:gpu-utils:GPUmodule.set_params_value:Set param value: [5.0], type: [<class 'float'>]
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Processing parameter: power_cap
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_generic:sensor path set to [/sys/class/drm/card0/device] DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_generic:HW file does not exist: /sys/class/drm/card0/device/hwmon/hwmon1/power1_cap DEBUG:gpu-utils:env.process_message:Warning: Can not read parameter: power_cap, disabling for this GPU: 0 DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Processing parameter: temperatures DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_generic:sensor path set to [/sys/class/drm/card0/device] DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Valid data [{'edge': 30.0}] for parameter: temperatures DEBUG:gpu-utils:GPUmodule.set_params_value:Set param value: [{'edge': 30.0}], type: [<class 'dict'>] DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Processing parameter: voltages DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_generic:sensor path set to [/sys/class/drm/card0/device] DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Valid data [{'vddgfx': 749, 'vddnb': 1099}] for parameter: voltages DEBUG:gpu-utils:GPUmodule.set_params_value:Set param value: [{'vddgfx': 749, 'vddnb': 1099}], type: [<class 'dict'>] DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Processing parameter: frequencies DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_generic:sensor path set to [/sys/class/drm/card0/device] DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Valid data [{'sclk': 400.0}] for parameter: frequencies DEBUG:gpu-utils:GPUmodule.set_params_value:Set param value: [{'sclk': 400.0}], type: [<class 'dict'>] DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Processing parameter: fan_speed_range DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Read data [None], Invalid or disabled parameter: fan_speed_range DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Processing parameter: fan_pwm_range DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Read data [None], Invalid or disabled parameter: fan_pwm_range DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Processing 
parameter: fan_enable DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Read data [None], Invalid or disabled parameter: fan_enable DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Processing parameter: fan_target DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Read data [None], Invalid or disabled parameter: fan_target DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Processing parameter: fan_speed DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Read data [None], Invalid or disabled parameter: fan_speed DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Processing parameter: pwm_mode DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Read data [None], Invalid or disabled parameter: pwm_mode DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Processing parameter: fan_pwm DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_set_amd:Read data [None], Invalid or disabled parameter: fan_pwm
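The `Read data [None], Invalid or disabled parameter` entries in this log show that a sensor read can come back as `None`, which is the kind of value that triggers the `TypeError` in the traceback at the top of this issue. The following is only a minimal sketch of a guard around a cumulative energy update, assuming power in watts and elapsed time in hours as in the failing line; `EnergyTracker` and `add_sample` are hypothetical names and this is not the fix that was actually pushed to gpu-utils.

```python
from typing import Optional


class EnergyTracker:
    """Hypothetical helper for illustration; not part of gpu-utils."""

    def __init__(self) -> None:
        self.cumulative_kwh = 0.0

    def add_sample(self, power_w: Optional[float], delta_hrs: float) -> bool:
        """Add one power sample; return False if the reading was unusable."""
        if power_w is None:
            # Sensor was invalid or disabled: skip instead of multiplying by None.
            return False
        # W * h / 1000 = kWh, mirroring the expression in the traceback.
        self.cumulative_kwh += delta_hrs * power_w / 1000
        return True


tracker = EnergyTracker()
first_ok = tracker.add_sample(5.0, 0.5)    # 5 W for half an hour
second_ok = tracker.add_sample(None, 0.5)  # invalid reading is ignored
print(tracker.cumulative_kwh, first_ok, second_ok)
```

Returning a flag rather than raising lets a caller decide whether to mark the parameter as disabled for that GPU, which is what the log above appears to do for other unreadable parameters.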

(rickslab-gpu-utils-env) @.***:~/gpu-utils$ ./gpu-ls -- force_all
OS command [nvidia-smi] executable found: [/usr/bin/nvidia-smi]
Error: system support issue for 30:00.0: [[Errno 88] Socket operation on non-socket]
Detected GPUs: NVIDIA: 1, AMD: 1
AMD: Wattman features enabled: 0xfffd7fff
2 total GPUs, 0 rw, 2 r-only, 0 w-only

Error: System support issue for GPU [30:00.0]
Card Number: 1
Vendor: NVIDIA
Readable: True
Writable: False
Compute: False
GPU UID: GPU-abfa1778-17bb-0861-2cd3-b6c1aa704dfa
GPU S/N: [N/A]
Device ID: {'device': '0x128b', 'subsystem_device': '0x128b', 'subsystem_vendor': '0x10de', 'vendor': '0x10de'}
Decoded Device ID: GK208B [GeForce GT 710]
Card Model: GeForce GT 710
Display Card Model: GeForce GT 710
Card Index: 0
PCIe ID: 10:00.0
Link Speed: GEN[N/A]
Link Width: [N/A]
##################################################
Driver: 460.91.03
vBIOS Version: 80.28.A6.00.15
Compute Platform: None
Compute Mode: Default
GPU Type: Supported
HWmon: None
Card Path: /sys/class/drm/card1/device
System Card Path: /sys/devices/pci0000:00/0000:00:01.1/0000:10:00.0
##################################################
Power Cap (W): [N/A]
Power Cap Range (W): ['[N/A]', '[N/A]']
Fan Target Speed (rpm): None
Current Fan PWM (%): 40.000
##################################################
Current GPU Loading (%): [N/A]
Current Memory Loading (%): [N/A]
Current VRAM Usage (%): 0.200
Current VRAM Used (GB): 0.004
Total VRAM (GB): 1.955
Current Temps (C): {'temperature.gpu': 35.0, 'temperature.memory': None}
Current Clk Frequencies (MHz): {'clocks.gr': None, 'clocks.mem': None, 'clocks.sm': None, 'clocks.video': None}
Maximum Clk Frequencies (MHz): {'clocks.max.gr': None, 'clocks.max.mem': None, 'clocks.max.sm': None}
Current SCLK P-State: [8, '']
Power Profile Mode: [N/A]
Power DPM State: None
##################################################
Disabled Parameters: power, energy

Card Number: 0
Vendor: AMD
Readable: True
Writable: False
Compute: False
Device ID: {'device': '0x1638', 'subsystem_device': '0x1636', 'subsystem_vendor': '0x1002', 'vendor': '0x1002'}
Decoded Device ID: Cezanne
Card Model: Advanced Micro Devices, Inc. [AMD/ATI] Cezanne (rev c9)
Display Card Model: Cezanne
PCIe ID: 30:00.0
Link Speed: 8.0 GT/s PCIe
Link Width: 16
##################################################
Driver: amdgpu
vBIOS Version: 113-CEZANNE-017
Compute Platform: None
GPU Type: APU
HWmon: /sys/class/drm/card0/device/hwmon/hwmon1
Card Path: /sys/class/drm/card0/device
System Card Path: /sys/devices/pci0000:00/0000:00:08.1/0000:30:00.0
##################################################
Current Power (W): 8.000
##################################################
Current GPU Loading (%): 1
Current GTT Memory Usage (%): 13.931
Current GTT Memory Used (GB): 0.418
Total GTT Memory (GB): 3.000
Current VRAM Usage (%): 3.399
Current VRAM Used (GB): 0.068
Total VRAM (GB): 2.000
Current Temps (C): {'edge': 30.0}
Critical Temps (C): {}
Current Voltages (V): {'vddgfx': 887, 'vddnb': 1099}
Vddc Range: ['', '']
Current Clk Frequencies (MHz): {'sclk': 400.0}
Current SCLK P-State: [1, '400Mhz']
SCLK Range: ['', '']
Current MCLK P-State: [0, '1600Mhz']
MCLK Range: ['', '']
Power Profile Mode: None
Power DPM State: performance
Power DPM Force Performance Level: auto
##################################################
Disabled Parameters: unique_id, mem_loading, power_cap_range, power_cap
fan_speed_range, fan_pwm_range, fan_enable, fan_target
fan_speed, pwm_mode, fan_pwm, pp_od_clk_voltage

(rickslab-gpu-utils-env) @.***:~/gpu-utils$ ./gpu-ls --raw
OS command [nvidia-smi] executable found: [/usr/bin/nvidia-smi]
Error: system support issue for 30:00.0: [[Errno 88] Socket operation on non-socket]
Detected GPUs: NVIDIA: 1, AMD: 1
AMD: Wattman features enabled: 0xfffd7fff
2 total GPUs, 0 rw, 2 r-only, 0 w-only

Card Number: 1
Vendor: NVIDIA
Readable: True
Writable: False
Compute: False
Device ID: {'device': '0x128b', 'subsystem_device': '0x128b', 'subsystem_vendor': '0x10de', 'vendor': '0x10de'}
Decoded Device ID: GK208B [GeForce GT 710]
PCIe ID: 10:00.0
GPU Type: Supported
HWmon: None
Card Path: /sys/class/drm/card1/device
System Card Path: /sys/devices/pci0000:00/0000:00:01.1/0000:10:00.0

DEVICE

uevent::DRIVER=nvidia PCI_CLASS=30000 PCI_ID=10DE:128B PCI_SUBSYS_ID=10DE:128B PCI_SLOT_NAME=0000:10:00.0 MODALIAS=pci:v000010DEd0000128Bsv000010DEsd0000128Bbc03sc00i00 resource3_wc::UNKNOWN resource5::UNKNOWN resource3::UNKNOWN broken_parity_status::0 subsystem_device::0x128b rom::UNKNOWN dma_mask_bits::40 vendor:Vendor:0x10de resource1::UNKNOWN local_cpus::00000fff class::0x030000 reset::UNKNOWN numa_node::-1 resource::0x00000000fb000000 0x00000000fbffffff 0x0000000000040200 0x00000000d8000000 0x00000000dfffffff 0x000000000014220c 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x00000000e0000000 0x00000000e1ffffff 0x000000000014220c 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x000000000000f000 0x000000000000f07f 0x0000000000040101 0x00000000fc000000 0x00000000fc07ffff 0x0000000000046200 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 rescan::UNKNOWN max_link_width::8 msi_bus::1 device::0x128b boot_vga::0 current_link_width::8 max_link_speed::5.0 GT/s PCIe local_cpulist::0-11 driver_override::(null) d3cold_allowed::1 irq::77 revision::0xa1 current_link_speed::2.5 GT/s PCIe resource1_wc::UNKNOWN consistent_dma_mask_bits::40 resource0::UNKNOWN config::BINARY ari_enabled::0 remove::UNKNOWN enable::1 modalias::pci:v000010DEd0000128Bsv000010DEsd0000128Bbc03sc00i00 subsystem_vendor::0x10de

HWMON

##################################################

Card Number: 0
Vendor: AMD
Readable: True
Writable: False
Compute: False
Device ID: {'device': '0x1638', 'subsystem_device': '0x1636', 'subsystem_vendor': '0x1002', 'vendor': '0x1002'}
Decoded Device ID: Cezanne
PCIe ID: 30:00.0
GPU Type: APU
HWmon: /sys/class/drm/card0/device/hwmon/hwmon1
Card Path: /sys/class/drm/card0/device
System Card Path: /sys/devices/pci0000:00/0000:00:08.1/0000:30:00.0

DEVICE

hdcp_srm::BINARY uevent::DRIVER=amdgpu PCI_CLASS=30000 PCI_ID=1002:1638 PCI_SUBSYS_ID=1002:1636 PCI_SLOT_NAME=0000:30:00.0 MODALIAS=pci:v00001002d00001638sv00001002sd00001636bc03sc00i00 resource5::UNKNOWN resource0_wc::UNKNOWN gpu_metrics::BINARY power_dpm_force_performance_level::auto serial_number:GPU S/N: product_name:: broken_parity_status::0 power_dpm_state:Power DPM State:performance mem_info_vram_vendor::unknown gpu_busy_percent::2 subsystem_device::0x1636 rom::UNKNOWN dma_mask_bits::44 vendor:Vendor:0x1002 pp_table::UNKNOWN local_cpus::00000fff pp_dpm_mclk::0: 1600Mhz class::0x030000 reset::UNKNOWN numa_node::-1 resource::0x00000000c0000000 0x00000000cfffffff 0x000000000014220c 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x00000000d0000000 0x00000000d01fffff 0x000000000014220c 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x000000000000d000 0x000000000000d0ff 0x0000000000040101 0x00000000fc500000 0x00000000fc57ffff 0x0000000000040200 0x00000000000c0000 0x00000000000dffff 0x0000000000000212 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 rescan::UNKNOWN max_link_width::16 resource2_wc::UNKNOWN pcie_replay_count::0 pp_dpm_pcie::UNKNOWN msi_bus::1 device::0x1638 mem_info_vram_used::73773056 boot_vga::1 pp_num_states::states: 1 0 default current_link_width::16 pp_cur_state::-22 max_link_speed::16.0 GT/s PCIe pp_sclk_od::0 resource4::UNKNOWN pp_dpm_dcefclk::0: 400Mhz 1: 464Mhz 2: 514Mhz 3: 576Mhz 4: 626Mhz 5: 685Mhz 6: 757Mhz 7: 847Mhz mem_info_vis_vram_used::73773056 local_cpulist::0-11 pp_dpm_socclk::0: 400Mhz 1: 445Mhz 2: 520Mhz 3: 600Mhz 4: 678Mhz 5: 780Mhz 6: 866Mhz 7: 975Mhz mem_info_vis_vram_total::2147483648 driver_override::(null) 
d3cold_allowed::1 irq::37 revision::0xc9 resource2::UNKNOWN pp_dpm_fclk::0: 1600Mhz current_link_speed::8.0 GT/s PCIe pp_power_profile_mode::1 3D_FULL_SCREEN 3 VIDEO 4 VR 5 COMPUTE 6 CUSTOM mem_info_gtt_used::446648320 consistent_dma_mask_bits::44 mem_info_vram_total::2147483648 vbios_version::113-CEZANNE-017 resource0::UNKNOWN pp_od_clk_voltage::UNKNOWN config::BINARY ari_enabled::0 pp_force_state:: remove::UNKNOWN thermal_throttling_logging::0000:30:00.0: thermal throttling logging enabled, with interval 60 seconds enable::1 product_number:: pp_mclk_od::0 mem_info_gtt_total::3221225472 modalias::pci:v00001002d00001638sv00001002sd00001636bc03sc00i00 subsystem_vendor::0x1002 pp_dpm_sclk::0: 200Mhz 1: 400Mhz * 2: 1900Mhz

HWMON

hdcp_srm::BINARY uevent::DRIVER=amdgpu PCI_CLASS=30000 PCI_ID=1002:1638 PCI_SUBSYS_ID=1002:1636 PCI_SLOT_NAME=0000:30:00.0 MODALIAS=pci:v00001002d00001638sv00001002sd00001636bc03sc00i00 resource5::UNKNOWN resource0_wc::UNKNOWN gpu_metrics::BINARY power_dpm_force_performance_level::auto serial_number:GPU S/N: product_name:: broken_parity_status::0 power_dpm_state:Power DPM State:performance mem_info_vram_vendor::unknown gpu_busy_percent::2 subsystem_device::0x1636 rom::UNKNOWN dma_mask_bits::44 vendor:Vendor:0x1002 pp_table::UNKNOWN local_cpus::00000fff pp_dpm_mclk::0: 1600Mhz class::0x030000 reset::UNKNOWN numa_node::-1 resource::0x00000000c0000000 0x00000000cfffffff 0x000000000014220c 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x00000000d0000000 0x00000000d01fffff 0x000000000014220c 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x000000000000d000 0x000000000000d0ff 0x0000000000040101 0x00000000fc500000 0x00000000fc57ffff 0x0000000000040200 0x00000000000c0000 0x00000000000dffff 0x0000000000000212 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 rescan::UNKNOWN max_link_width::16 resource2_wc::UNKNOWN pcie_replay_count::0 pp_dpm_pcie::UNKNOWN msi_bus::1 device::0x1638 mem_info_vram_used::73773056 boot_vga::1 pp_num_states::states: 1 0 default current_link_width::16 pp_cur_state::-22 max_link_speed::16.0 GT/s PCIe pp_sclk_od::0 resource4::UNKNOWN pp_dpm_dcefclk::0: 400Mhz 1: 464Mhz 2: 514Mhz 3: 576Mhz 4: 626Mhz 5: 685Mhz 6: 757Mhz 7: 847Mhz mem_info_vis_vram_used::73773056 local_cpulist::0-11 pp_dpm_socclk::0: 400Mhz 1: 445Mhz 2: 520Mhz 3: 600Mhz 4: 678Mhz 5: 780Mhz 6: 866Mhz 7: 975Mhz mem_info_vis_vram_total::2147483648 driver_override::(null) 
d3cold_allowed::1 irq::37 revision::0xc9 resource2::UNKNOWN pp_dpm_fclk::0: 1600Mhz current_link_speed::8.0 GT/s PCIe pp_power_profile_mode::1 3D_FULL_SCREEN 3 VIDEO 4 VR 5 COMPUTE 6 CUSTOM mem_info_gtt_used::446648320 consistent_dma_mask_bits::44 mem_info_vram_total::2147483648 vbios_version::113-CEZANNE-017 resource0::UNKNOWN pp_od_clk_voltage::UNKNOWN config::BINARY ari_enabled::0 pp_force_state:: remove::UNKNOWN thermal_throttling_logging::0000:30:00.0: thermal throttling logging enabled, with interval 60 seconds enable::1 product_number:: pp_mclk_od::0 mem_info_gtt_total::3221225472 modalias::pci:v00001002d00001638sv00001002sd00001636bc03sc00i00 subsystem_vendor::0x1002 pp_dpm_sclk::0: 200Mhz 1: 400Mhz * 2: 1900Mhz ##################################################
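The `pp_dpm_*` entries in the raw dump list the available clock states, with a trailing `*` marking the active one. gpu-ls reports this as, for example, `Current SCLK P-State: [1, '400Mhz']`. As a rough sketch of how such a string could be parsed (the function names are hypothetical and this is not the parser gpu-utils uses):

```python
import re
from typing import List, Optional, Tuple

# Matches entries such as "1: 400Mhz *"; the optional "*" flags the active state.
_STATE_RE = re.compile(r"(\d+):\s*(\d+)\s*Mhz(\s*\*)?", re.IGNORECASE)


def parse_pp_dpm(text: str) -> List[Tuple[int, int, bool]]:
    """Return (state index, clock in MHz, is_active) for each listed state."""
    return [
        (int(m.group(1)), int(m.group(2)), m.group(3) is not None)
        for m in _STATE_RE.finditer(text)
    ]


def active_state(text: str) -> Optional[Tuple[int, int]]:
    """Return (index, MHz) of the state marked with '*', or None if unmarked."""
    for index, mhz, is_active in parse_pp_dpm(text):
        if is_active:
            return index, mhz
    return None


sclk_states = parse_pp_dpm("0: 200Mhz 1: 400Mhz * 2: 1900Mhz")
print(sclk_states)
print(active_state("0: 200Mhz 1: 400Mhz * 2: 1900Mhz"))
```

Returning `None` when no state is marked leaves room for files such as `pp_dpm_pcie`, which cannot be read at all on this APU.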

Ricks-Lab commented 2 years ago

I have rewritten significant sections of code to give me more flexibility to manage GPU lists and better manage capability in the various applications. I have tested on my systems with no problems. It should exit more gracefully if you try to run one of the utilities without any GPUs that are compatible.
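The utilities in this thread print a summary such as `2 total GPUs, 0 rw, 2 r-only, 0 w-only`, and gpu-pac stops with `Compatible GPUs: None are writable, exiting...`. The sketch below shows one way such a classification and early exit might be structured. `GpuInfo`, `summarize`, and `check_writable` are hypothetical names used for illustration and do not reflect the rewritten gpu-utils code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GpuInfo:
    """Hypothetical stand-in for one detected GPU; not the gpu-utils class."""
    vendor: str
    readable: bool
    writable: bool


def summarize(gpus: List[GpuInfo]) -> str:
    """Build a one-line summary in the style printed by the utilities."""
    rw = sum(g.readable and g.writable for g in gpus)
    r_only = sum(g.readable and not g.writable for g in gpus)
    w_only = sum(g.writable and not g.readable for g in gpus)
    return f"{len(gpus)} total GPUs, {rw} rw, {r_only} r-only, {w_only} w-only"


def check_writable(gpus: List[GpuInfo]) -> int:
    """Return 0 if any GPU is writable, otherwise print a notice and return 1."""
    if not any(g.writable for g in gpus):
        print("Compatible GPUs: None are writable, exiting...")
        return 1
    return 0


gpus = [GpuInfo("NVIDIA", True, False), GpuInfo("AMD", True, False)]
print(summarize(gpus))
status = check_writable(gpus)
```

A caller could pass `status` to `sys.exit()` so that a tool needing write access ends with a message instead of a traceback.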

I only have one APU (first one developed by AMD) so I am curious if your newer generation chip supports more driver capability. Can you run gpu-ls --raw with the latest in the repository and post the output here? If it does have more capability, then I can update classifications to allow for different levels of APU capability. Thanks!

Marjorie-R commented 2 years ago

Hi Rick,

On Wed, 2022-05-11 at 05:50 -0700, Rick wrote:

> I have rewritten significant sections of code to give me more flexibility to manage GPU lists and better manage capability in the various applications. I have tested on my systems with no problems. It should exit more gracefully if you try to run one of the utilities without any GPUs that are compatible.

I'm still getting the same errors with the other utilities, gpu-mon and gpu-plot.

I think I failed to report on gpu-pac. Output is:

(rickslab-gpu-utils-env) @.***:~/gpu-utils$ ./gpu-pac
OS command [nvidia-smi] executable found: [/usr/bin/nvidia-smi]
Error: system support issue for 30:00.0: [[Errno 88] Socket operation on non-socket]
Detected GPUs: NVIDIA: 1, AMD: 1
AMD: Wattman features enabled: 0xfffd7fff
2 total GPUs, 0 rw, 2 r-only, 0 w-only

Compatible GPUs: None are writable, exiting...

> I only have one APU (first one developed by AMD) so I am curious if your newer generation chip supports more driver capability. Can you run gpu-ls --raw with the latest in the repository and post the output here? If it does have more capability, then I can update classifications to allow for different levels of APU capability. Thanks!

Results below for ./gpu-ls --raw. This is with me using the AMD APU.

--  Marjorie

(rickslab-gpu-utils-env) @.***:~/gpu-utils$ ./gpu-ls --raw
OS command [nvidia-smi] executable found: [/usr/bin/nvidia-smi]
Error: system support issue for 30:00.0: [[Errno 88] Socket operation on non-socket]
Detected GPUs: NVIDIA: 1, AMD: 1
AMD: Wattman features enabled: 0xfffd7fff
2 total GPUs, 0 rw, 2 r-only, 0 w-only

Card Number: 1
Vendor: NVIDIA
Readable: True
Writable: False
Compute: False
Device ID: {'device': '0x128b', 'subsystem_device': '0x128b', 'subsystem_vendor': '0x10de', 'vendor': '0x10de'}
Decoded Device ID: GK208B [GeForce GT 710]
PCIe ID: 10:00.0
GPU Type: Supported
HWmon: None
Card Path: /sys/class/drm/card1/device
System Card Path: /sys/devices/pci0000:00/0000:00:01.1/0000:10:00.0

DEVICE

uevent::

DRIVER=nvidia PCI_CLASS=30000 PCI_ID=10DE:128B PCI_SUBSYS_ID=10DE:128B PCI_SLOT_NAME=0000:10:00.0 MODALIAS=pci:v000010DEd0000128Bsv000010DEsd0000128Bbc03sc00i00

resource3_wc::

PermissionError

resource5::

PermissionError

resource3::

PermissionError

broken_parity_status::

0

subsystem_device::

0x128b

rom::

PermissionError

dma_mask_bits::

40

vendor:Vendor:

0x10de

resource1::

PermissionError

local_cpus::

00000fff

class::

0x030000

reset::

PermissionError

numa_node::

-1

resource::

0x00000000fb000000 0x00000000fbffffff 0x0000000000040200 0x00000000d8000000 0x00000000dfffffff 0x000000000014220c 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x00000000e0000000 0x00000000e1ffffff 0x000000000014220c 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x000000000000f000 0x000000000000f07f 0x0000000000040101 0x00000000fc000000 0x00000000fc07ffff 0x0000000000046200 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000

rescan::

PermissionError

max_link_width::

8

msi_bus::

1

device::

0x128b

boot_vga::

0

current_link_width::

8

max_link_speed::

5.0 GT/s PCIe

local_cpulist::

0-11

driver_override::

(null)

d3cold_allowed::

1

irq::

77

revision::

0xa1

current_link_speed::

2.5 GT/s PCIe

resource1_wc::

PermissionError

consistent_dma_mask_bits::

40

resource0::

PermissionError

config::

BINARY

ari_enabled::

0

remove::

PermissionError

enable::

1

modalias::

pci:v000010DEd0000128Bsv000010DEsd0000128Bbc03sc00i00

subsystem_vendor::

0x10de

HWMON

##################################################

Card Number: 0
Vendor: AMD
Readable: True
Writable: False
Compute: False
Device ID: {'device': '0x1638', 'subsystem_device': '0x1636', 'subsystem_vendor': '0x1002', 'vendor': '0x1002'}
Decoded Device ID: Cezanne
PCIe ID: 30:00.0
GPU Type: APU
HWmon: /sys/class/drm/card0/device/hwmon/hwmon1
Card Path: /sys/class/drm/card0/device
System Card Path: /sys/devices/pci0000:00/0000:00:08.1/0000:30:00.0

DEVICE

hdcp_srm::

BINARY

uevent::

DRIVER=amdgpu PCI_CLASS=30000 PCI_ID=1002:1638 PCI_SUBSYS_ID=1002:1636 PCI_SLOT_NAME=0000:30:00.0 MODALIAS=pci:v00001002d00001638sv00001002sd00001636bc03sc00i00

resource5::

PermissionError

resource0_wc::

PermissionError

gpu_metrics::

BINARY

power_dpm_force_performance_level::

auto

serial_number:GPU S/N:

product_name::

broken_parity_status::

0

power_dpm_state:Power DPM State:

performance

mem_info_vram_vendor::

unknown

gpu_busy_percent::

2

subsystem_device::

0x1636

rom::

PermissionError

dma_mask_bits::

44

vendor:Vendor:

0x1002

pp_table::

OSError

local_cpus::

00000fff

pp_dpm_mclk::

0: 1600Mhz *

class::

0x030000

reset::

PermissionError

numa_node::

-1

resource::

0x00000000c0000000 0x00000000cfffffff 0x000000000014220c 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x00000000d0000000 0x00000000d01fffff 0x000000000014220c 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x000000000000d000 0x000000000000d0ff 0x0000000000040101 0x00000000fc500000 0x00000000fc57ffff 0x0000000000040200 0x00000000000c0000 0x00000000000dffff 0x0000000000000212 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000

rescan::

PermissionError

max_link_width::

16

resource2_wc::

PermissionError

pcie_replay_count::

0

pp_dpm_pcie::

OSError

msi_bus::

1

device::

0x1638

mem_info_vram_used::

165462016

boot_vga::

1

pp_num_states::

states: 1 0 default

current_link_width::

16

pp_cur_state::

-22

max_link_speed::

16.0 GT/s PCIe

pp_sclk_od::

0

resource4::

PermissionError

pp_dpm_dcefclk::

0: 400Mhz 1: 464Mhz 2: 514Mhz 3: 576Mhz 4: 626Mhz 5: 685Mhz 6: 757Mhz 7: 847Mhz *

mem_info_vis_vram_used::

165462016

local_cpulist::

0-11

pp_dpm_socclk::

0: 400Mhz * 1: 445Mhz 2: 520Mhz 3: 600Mhz 4: 678Mhz 5: 780Mhz 6: 866Mhz 7: 975Mhz

mem_info_vis_vram_total::

2147483648

driver_override::

(null)

d3cold_allowed::

1

irq::

37

revision::

0xc9

resource2::

PermissionError

pp_dpm_fclk::

0: 1600Mhz *

current_link_speed::

8.0 GT/s PCIe

pp_power_profile_mode::

1 3D_FULL_SCREEN 3 VIDEO 4 VR 5 COMPUTE 6 CUSTOM

mem_info_gtt_used::

476033024

consistent_dma_mask_bits::

44

mem_info_vram_total::

2147483648

vbios_version::

113-CEZANNE-017

resource0::

PermissionError

pp_od_clk_voltage::

OSError

config::

BINARY

ari_enabled::

0

pp_force_state::

remove::

PermissionError

thermal_throttling_logging::

0000:30:00.0: thermal throttling logging enabled, with interval 60 seconds

enable::

1

product_number::

pp_mclk_od::

0

mem_info_gtt_total::

3221225472

modalias::

pci:v00001002d00001638sv00001002sd00001636bc03sc00i00

subsystem_vendor::

0x1002

pp_dpm_sclk::

0: 200Mhz 1: 400Mhz * 2: 1900Mhz

HWMON

uevent::

in1_label::

vddnb

freq1_label::

sclk

in0_input::

893

temp1_label::

edge

in1_input::

1093

power1_average::

6000000

freq1_input::

400000000

temp1_input::

32000

in0_label::

vddgfx

name::

amdgpu ##################################################
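The HWMON values in this dump are raw integers, while gpu-ls shows them in display units (for example `Current Clk Frequencies (MHz): {'sclk': 400.0}`). The sketch below converts the raw readings, assuming the usual Linux hwmon conventions: power in microwatts, temperature in millidegrees Celsius, frequency in hertz, and voltage in millivolts. The divisor table and `convert_hwmon` are illustrative names, not gpu-utils code.

```python
from typing import Dict

# Divisors from raw sysfs integers to display units, keyed by file-name prefix.
# Assumed from Linux hwmon conventions; not taken from gpu-utils itself.
_DIVISORS = {
    "power": 1_000_000,  # microwatts -> W
    "temp": 1_000,       # millidegrees Celsius -> degrees C
    "freq": 1_000_000,   # Hz -> MHz
    "in": 1_000,         # millivolts -> V
}


def convert_hwmon(raw: Dict[str, str]) -> Dict[str, float]:
    """Convert *_input / *_average readings; skip labels and other files."""
    converted: Dict[str, float] = {}
    for name, value in raw.items():
        if not name.endswith(("_input", "_average")):
            continue  # e.g. in0_label, name, uevent
        for prefix, divisor in _DIVISORS.items():
            if name.startswith(prefix):
                converted[name] = int(value) / divisor
                break
    return converted


# Raw values as posted in the HWMON section of this comment.
readings = convert_hwmon({
    "in0_label": "vddgfx",
    "in0_input": "893",
    "in1_input": "1093",
    "power1_average": "6000000",
    "freq1_input": "400000000",
    "temp1_input": "32000",
})
print(readings)
```

Files absent from this hwmon directory, such as `power1_cap` or any fan entries, simply never appear in the result, which fits gpu-ls listing those parameters as disabled on this APU.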

Ricks-Lab commented 2 years ago

I have done a major rewrite of the plotting related code. It should now work with all AMD APU and most AMD Legacy GPUs. Can you give it a try? It is the latest in the repository.

Marjorie-R commented 2 years ago

Hi Rick,

On Thu, 2022-05-12 at 22:55 -0700, Rick wrote:

> I have done a major rewrite of the plotting related code. It should now work with all AMD APU and most AMD Legacy GPUs. Can you give it a try? It is the latest in the repository.

I've now run this and it seems to be working OK.

I include the outputs from gpu-ls and gpu-plot and a screenshot from gpu-plot.

Hopefully everything will get through to you this time.

Note that I've just moved my system to a new case and did not bother to re-install the NVIDIA card.

(rickslab-gpu-utils-env) @.***:~/gpu-utils$ ./gpu-ls
OS command [nvidia-smi] executable found: [/usr/bin/nvidia-smi]
Error: system support issue for 30:00.0: [[Errno 88] Socket operation on non-socket]
Detected GPUs: AMD: 1
AMD: Wattman features enabled: 0xfffd7fff
1 total GPUs, 0 rw, 1 r-only, 0 w-only

Card Number: 0
Vendor: AMD
Readable: True
Writable: False
Compute: False
Device ID: {'device': '0x1638', 'subsystem_device': '0x1636', 'subsystem_vendor': '0x1002', 'vendor': '0x1002'}
Decoded Device ID: Cezanne
Card Model: Advanced Micro Devices, Inc. [AMD/ATI] Cezanne (rev c9)
Display Card Model: Cezanne
PCIe ID: 30:00.0
Link Speed: 8.0 GT/s PCIe
Link Width: 16
##################################################
Driver: amdgpu
vBIOS Version: 113-CEZANNE-017
Compute Platform: None
GPU Type: APU
HWmon: /sys/class/drm/card0/device/hwmon/hwmon1
Card Path: /sys/class/drm/card0/device
System Card Path: /sys/devices/pci0000:00/0000:00:08.1/0000:30:00.0
##################################################
Current Power (W): 17.000
##################################################
Current Memory Loading (%): None
Current GTT Memory Usage (%): 7.690
Current VRAM Usage (%): 4.014
Current Temps (C): {'edge': 28.0}
Critical Temps (C): {}
Current Voltages (V): {'vddgfx': 1368, 'vddnb': 1099}
Vddc Range: ['', '']
Current Clk Frequencies (MHz): {'sclk': 400.0}
Current SCLK P-State: [1, '400Mhz']
SCLK Range: ['', '']
Current MCLK P-State: [0, '1600Mhz']
MCLK Range: ['', '']
Power DPM State: performance
Power DPM Force Performance Level: auto

(rickslab-gpu-utils-env) @.***:~/gpu-utils$ ./gpu-plot
OS command [nvidia-smi] executable found: [/usr/bin/nvidia-smi]
Error: system support issue for 30:00.0: [[Errno 88] Socket operation on non-socket]
Detected GPUs: AMD: 1
AMD: Wattman features enabled: 0xfffd7fff
Compatible GPUs: 1 total GPUs, 0 rw, 1 r-only, 0 w-only

gpu-plot waiting for initial data.......

--  Marjorie

Ricks-Lab commented 2 years ago

Thanks! Seems like all is well. It seems that attachments sent by replying via email do not make it into the issue update. It would be cool to see, so please post the plot image directly to this issue thread if you get a chance.

Marjorie-R commented 2 years ago

Hi Rick,

On Fri, 2022-05-13 at 18:31 -0700, Rick wrote:

> Thanks! Seems like all is well. Seems like the attachments by responding by email do not make it into an issue update. It would be cool to see, so post the plot image directly to this issue thread if you get a chance.

I'm not familiar with using GitHub!!

I've posted a reply (with images) here: https://github.com/Ricks-Lab/gpu-utils/discussions/101. I trust that's OK.

Note: gpu-pac on my system is still broken (traceback included).

--  Marjorie

Ricks-Lab commented 2 years ago

I would like to add you to the credits:

@Marjorie-R - Testing, Debug, Verification of AMD-APU Capability

Let me know of any concerns.

Marjorie-R commented 2 years ago

Hi Rick,

On Wed, 2022-05-25 at 00:09 -0700, Rick wrote:

> I would like to add you to the credits:
> @Marjorie-R - Testing, Debug, Verification of AMD-APU Capability
> Let me know of any concerns.

No problems. Glad to help. My name in lights, at last :-).

--  Marjorie

Marjorie-R commented 1 year ago

On Mon, 2022-05-09 at 06:34 -0700, Rick wrote:

> Looks like the code I added to determine system type systemD vs systemV can not run on your system. Seems like os.path.islink throws an error on your system. I will catch it and set system type to unknown.

If it's any help runfiles for .deb often include code like this to check for systemd:

    if [ -d /run/systemd/system ]; then
        systemctl --system daemon-reload >/dev/null || true
        if [ -n "$2" ]; then
            _dh_action=restart
        else
            _dh_action=start
        fi
        deb-systemd-invoke $_dh_action 'chrony.service' >/dev/null || true
    fi

i.e. on a systemd system the directory /run/systemd/system should exist. On my PC with Devuan, where I use sysvinit instead, it is absent.
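The same directory test can be done from Python. This is a minimal sketch of the idea only, not the check that ended up in gpu-utils; `detect_init_system` and its return strings are hypothetical.

```python
import os
import tempfile

SYSTEMD_RUN_DIR = "/run/systemd/system"


def detect_init_system(run_dir: str = SYSTEMD_RUN_DIR) -> str:
    """Return 'systemd' if the marker directory exists, else 'other'."""
    try:
        return "systemd" if os.path.isdir(run_dir) else "other"
    except OSError:
        # Defensive fallback, since a path check reportedly raised on one system.
        return "unknown"


# Demonstration against a temporary directory, so the result does not depend
# on the init system of the machine running the example.
with tempfile.TemporaryDirectory() as fake_run_dir:
    demo_present = detect_init_system(fake_run_dir)
    demo_absent = detect_init_system(os.path.join(fake_run_dir, "missing"))

print(demo_present, demo_absent)
print(detect_init_system())  # result for the real system
```

Taking the path as a parameter keeps the function testable without touching `/run`.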

--  Marjorie

Ricks-Lab commented 1 year ago


I have implemented a check that I derived from Stack Overflow. Can you try ./gpu-ls --verbose? The system type should be one of the first items displayed. I am not sure if I asked you to check this in the past.

Ricks-Lab commented 10 months ago

This should be resolved.