Ricks-Lab / gpu-utils

A set of utilities for monitoring and customizing GPU performance
GNU General Public License v3.0

Exploring Extended Support for Non-AMD GPUs #83

Closed Ricks-Lab closed 4 years ago

Ricks-Lab commented 4 years ago

The utilities already leverage PCIe information to identify all installed GPUs. I would like to extend this by examining additional sensor files in the card and hwmon directories. I'm looking for users with GPUs other than AMD to share details here. The easiest way is to run the latest version on the extended branch with the --debug option and share the debug log file.

KeithMyers commented 4 years ago

Here you go Rick. keith@Serenity:~/Downloads/amdgpu-utils-extended$ ./amdgpu-ls --debug

Ubuntu: Validated
Warning: could not read AMD Featuremask [[Errno 2] No such file or directory: '/sys/module/amdgpu/parameters/ppfeaturemask']
Detected GPUs: NVIDIA: 3
3 total GPUs, 0 rw, 0 r-only, 0 w-only

Card Number: 0
   Vendor: NVIDIA
   Readable: False
   Writable: False
   Compute: True
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
   PCIe ID: 08:00.0
   Driver: nvidiafb, nouveau, nvidia_drm, nvidia
   GPU Frequency/Voltage Control Type: Unsupported
   HWmon: None
   Card Path: /sys/class/drm/card0/device
   System Card Path: /sys/devices/pci0000:00/0000:00:01.3/0000:02:00.2/0000:03:04.0/0000:08:00.0

Card Number: 1
   Vendor: NVIDIA
   Readable: False
   Writable: False
   Compute: True
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
   PCIe ID: 0a:00.0
   Driver: nvidiafb, nouveau, nvidia_drm, nvidia
   GPU Frequency/Voltage Control Type: Unsupported
   HWmon: None
   Card Path: /sys/class/drm/card1/device
   System Card Path: /sys/devices/pci0000:00/0000:00:03.1/0000:0a:00.0

Card Number: 2
   Vendor: NVIDIA
   Readable: False
   Writable: False
   Compute: True
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
   PCIe ID: 0b:00.0
   Driver: nvidiafb, nouveau, nvidia_drm, nvidia
   GPU Frequency/Voltage Control Type: Unsupported
   HWmon: None
   Card Path: /sys/class/drm/card2/device
   System Card Path: /sys/devices/pci0000:00/0000:00:03.2/0000:0b:00.0

keith@Serenity:~/Downloads/amdgpu-utils-extended$

DEBUG:gpu-utils:env.set_args:Command line arguments:
  Namespace(about=False, clinfo=False, debug=True, no_fan=False, ppm=False, pstates=False, short=False, table=False)
DEBUG:gpu-utils:env.set_args:Local TZ: PDT
DEBUG:gpu-utils:amdgpu-ls.main:########## amdgpu-ls v3.3.0
DEBUG:gpu-utils:env.check_env:Using python: 3.8.2
DEBUG:gpu-utils:env.check_env:Using Linux Kernel: 5.4.0-37-generic
DEBUG:gpu-utils:env.check_env:Using Linux Distro: Ubuntu
DEBUG:gpu-utils:env.check_env:Linux Distro Description: Ubuntu 20.04 LTS
DEBUG:gpu-utils:env.check_env:Ubuntu package query tool: /usr/bin/dpkg
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_NAME: [GeForce RTX 2080]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_VERSION: [OpenCL 1.2 CUDA]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DRIVER_VERSION: [440.64]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_OPENCL_C_VERSION: [OpenCL C 1.2]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:NV ocl_pcie_id [08:00.0]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_MAX_COMPUTE_UNITS: [46]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: [3]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_MAX_WORK_ITEM_SIZES: [1024 1024 64]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_MAX_WORK_GROUP_SIZE: [1024]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE: [32]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_MAX_MEM_ALLOC_SIZE: [2092515328]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:cl_index: {'prf_wg_multiple': '32', 'max_wg_size': '1024', 'prf_wg_size': None, 'max_wi_sizes': '1024 1024 64', 'max_wi_dim': '3', 'max_mem_allocation': '2092515328', 'simd_ins_width': None, 'simd_width': None, 'simd_per_cu': None, 'max_cu': '46', 'device_name': 'GeForce RTX 2080', 'opencl_version': 'OpenCL C 1.2', 'driver_version': '440.64', 'device_version': 'OpenCL 1.2 CUDA'}
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_NAME: [GeForce RTX 2080]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_VERSION: [OpenCL 1.2 CUDA]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DRIVER_VERSION: [440.64]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_OPENCL_C_VERSION: [OpenCL C 1.2]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:NV ocl_pcie_id [0a:00.0]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_MAX_COMPUTE_UNITS: [46]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: [3]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_MAX_WORK_ITEM_SIZES: [1024 1024 64]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_MAX_WORK_GROUP_SIZE: [1024]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE: [32]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_MAX_MEM_ALLOC_SIZE: [2091696128]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:cl_index: {'prf_wg_multiple': '32', 'max_wg_size': '1024', 'prf_wg_size': None, 'max_wi_sizes': '1024 1024 64', 'max_wi_dim': '3', 'max_mem_allocation': '2091696128', 'simd_ins_width': None, 'simd_width': None, 'simd_per_cu': None, 'max_cu': '46', 'device_name': 'GeForce RTX 2080', 'opencl_version': 'OpenCL C 1.2', 'driver_version': '440.64', 'device_version': 'OpenCL 1.2 CUDA'}
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_NAME: [GeForce RTX 2080]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_VERSION: [OpenCL 1.2 CUDA]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DRIVER_VERSION: [440.64]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_OPENCL_C_VERSION: [OpenCL C 1.2]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:NV ocl_pcie_id [0b:00.0]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_MAX_COMPUTE_UNITS: [46]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: [3]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_MAX_WORK_ITEM_SIZES: [1024 1024 64]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_MAX_WORK_GROUP_SIZE: [1024]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE: [32]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_MAX_MEM_ALLOC_SIZE: [2092515328]
DEBUG:gpu-utils:GPUmodule.set_gpu_list:OpenCL map: {'08:00.0': {'prf_wg_multiple': '32', 'max_wg_size': '1024', 'prf_wg_size': None, 'max_wi_sizes': '1024 1024 64', 'max_wi_dim': '3', 'max_mem_allocation': '2092515328', 'simd_ins_width': None, 'simd_width': None, 'simd_per_cu': None, 'max_cu': '46', 'device_name': 'GeForce RTX 2080', 'opencl_version': 'OpenCL C 1.2', 'driver_version': '440.64', 'device_version': 'OpenCL 1.2 CUDA'}, '0a:00.0': {'prf_wg_multiple': '32', 'max_wg_size': '1024', 'prf_wg_size': None, 'max_wi_sizes': '1024 1024 64', 'max_wi_dim': '3', 'max_mem_allocation': '2091696128', 'simd_ins_width': None, 'simd_width': None, 'simd_per_cu': None, 'max_cu': '46', 'device_name': 'GeForce RTX 2080', 'opencl_version': 'OpenCL C 1.2', 'driver_version': '440.64', 'device_version': 'OpenCL 1.2 CUDA'}, '0b:00.0': {'prf_wg_multiple': '32', 'max_wg_size': '1024', 'prf_wg_size': None, 'max_wi_sizes': '1024 1024 64', 'max_wi_dim': '3', 'max_mem_allocation': '2092515328', 'simd_ins_width': None, 'simd_width': None, 'simd_per_cu': None, 'max_cu': '46', 'device_name': 'GeForce RTX 2080', 'opencl_version': 'OpenCL C 1.2', 'driver_version': '440.64', 'device_version': 'OpenCL 1.2 CUDA'}}
DEBUG:gpu-utils:GPUmodule.set_gpu_list:Found 3 GPUs
DEBUG:gpu-utils:GPUmodule.add:Added GPU Item 0a4eaef50da94c43aa9680928bfbe96f to GPU List
DEBUG:gpu-utils:GPUmodule.set_gpu_list:GPU: 08:00.0
DEBUG:gpu-utils:GPUmodule.set_gpu_list:lspci output items:
 ['08:00.0 VGA compatible controller: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)', '\tSubsystem: eVga.com. Corp. TU104 [GeForce RTX 2080 Rev. A]', '\tKernel driver in use: nvidia', '\tKernel modules: nvidiafb, nouveau, nvidia_drm, nvidia', '']
DEBUG:gpu-utils:GPUmodule.set_gpu_list:sysfpath: /sys/devices/pci0000:00/0000:00:03.1/0000:0a:00.0
device_dir: /sys/class/drm/card1/device
DEBUG:gpu-utils:GPUmodule.set_gpu_list:sysfpath: /sys/devices/pci0000:00/0000:00:03.2/0000:0b:00.0
device_dir: /sys/class/drm/card2/device
DEBUG:gpu-utils:GPUmodule.set_gpu_list:sysfpath: /sys/devices/pci0000:00/0000:00:01.3/0000:02:00.2/0000:03:04.0/0000:08:00.0
device_dir: /sys/class/drm/card0/device
DEBUG:gpu-utils:GPUmodule.set_gpu_list:card_path set to: /sys/class/drm/card0/device
DEBUG:gpu-utils:GPUmodule.set_gpu_list:Card dir [/sys/class/drm/card0/device] contents:
['uevent', 'resource3_wc', 'resource5', 'resource3', 'broken_parity_status', 'subsystem_device', 'rom', 'dma_mask_bits', 'vendor', 'resource1', 'i2c-17', 'iommu_group', 'local_cpus', 'firmware_node', 'power', 'class', 'reset', 'i2c-15', 'numa_node', 'resource', 'rescan', 'max_link_width', 'msi_bus', 'device', 'i2c-13', 'boot_vga', 'aer_dev_nonfatal', 'current_link_width', 'driver', 'max_link_speed', 'local_cpulist', 'driver_override', 'subsystem', 'd3cold_allowed', 'irq', 'revision', 'current_link_speed', 'i2c-18', 'resource1_wc', 'aer_dev_correctable', 'consistent_dma_mask_bits', 'resource0', 'i2c-16', 'config', 'ari_enabled', 'msi_irqs', 'remove', 'iommu', 'aer_dev_fatal', 'enable', 'link', 'i2c-14', 'modalias', 'i2c-12', 'subsystem_vendor', 'drm']
DEBUG:gpu-utils:GPUmodule.set_gpu_list:HW file search: []
DEBUG:gpu-utils:GPUmodule.populate_prm_from_dict:prm dict:
{'pcie_id': '08:00.0', 'model': 'NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)', 'model_short': 'UNKNOWN', 'vendor': <vendor.NVIDIA: 4>, 'driver': 'nvidiafb, nouveau, nvidia_drm, nvidia', 'card_path': '/sys/class/drm/card0/device', 'sys_card_path': '/sys/devices/pci0000:00/0000:00:01.3/0000:02:00.2/0000:03:04.0/0000:08:00.0', 'gpu_type': <type.Unsupported: 2>, 'hwmon_path': '', 'readable': False, 'writable': False, 'compute': True, 'compute_platform': 'OpenCL 1.2 CUDA'}
DEBUG:gpu-utils:GPUmodule.set_gpu_list:Card flags: readable: False, writable: False, type: Unsupported
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor:read_gpu_sensor set to [/sys/class/drm/card0/device]
DEBUG:gpu-utils:GPUmodule.read_pciid_model:Logger active in module
DEBUG:gpu-utils:GPUmodule.add:Added GPU Item 66652196e2d44a2aa079683b0605d8a3 to GPU List
DEBUG:gpu-utils:GPUmodule.set_gpu_list:GPU: 0a:00.0
DEBUG:gpu-utils:GPUmodule.set_gpu_list:lspci output items:
 ['0a:00.0 VGA compatible controller: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)', '\tSubsystem: eVga.com. Corp. TU104 [GeForce RTX 2080 Rev. A]', '\tKernel driver in use: nvidia', '\tKernel modules: nvidiafb, nouveau, nvidia_drm, nvidia', '']
DEBUG:gpu-utils:GPUmodule.set_gpu_list:sysfpath: /sys/devices/pci0000:00/0000:00:03.1/0000:0a:00.0
device_dir: /sys/class/drm/card1/device
DEBUG:gpu-utils:GPUmodule.set_gpu_list:card_path set to: /sys/class/drm/card1/device
DEBUG:gpu-utils:GPUmodule.set_gpu_list:sysfpath: /sys/devices/pci0000:00/0000:00:03.2/0000:0b:00.0
device_dir: /sys/class/drm/card2/device
DEBUG:gpu-utils:GPUmodule.set_gpu_list:sysfpath: /sys/devices/pci0000:00/0000:00:01.3/0000:02:00.2/0000:03:04.0/0000:08:00.0
device_dir: /sys/class/drm/card0/device
DEBUG:gpu-utils:GPUmodule.set_gpu_list:Card dir [/sys/class/drm/card1/device] contents:
['uevent', 'resource3_wc', 'resource5', 'i2c-10', 'resource3', 'broken_parity_status', 'subsystem_device', 'rom', 'dma_mask_bits', 'vendor', 'resource1', 'iommu_group', 'local_cpus', 'firmware_node', 'i2c-8', 'power', 'class', 'reset', 'numa_node', 'resource', 'rescan', 'i2c-6', 'max_link_width', 'msi_bus', 'device', 'boot_vga', 'aer_dev_nonfatal', 'current_link_width', 'i2c-11', 'driver', 'max_link_speed', 'local_cpulist', 'driver_override', 'subsystem', 'd3cold_allowed', 'irq', 'revision', 'current_link_speed', 'resource1_wc', 'i2c-9', 'aer_dev_correctable', 'consistent_dma_mask_bits', 'resource0', 'config', 'ari_enabled', 'msi_irqs', 'remove', 'i2c-7', 'iommu', 'aer_dev_fatal', 'enable', 'link', 'i2c-5', 'modalias', 'subsystem_vendor', 'drm']
DEBUG:gpu-utils:GPUmodule.set_gpu_list:HW file search: []
DEBUG:gpu-utils:GPUmodule.populate_prm_from_dict:prm dict:
{'pcie_id': '0a:00.0', 'model': 'NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)', 'model_short': 'UNKNOWN', 'vendor': <vendor.NVIDIA: 4>, 'driver': 'nvidiafb, nouveau, nvidia_drm, nvidia', 'card_path': '/sys/class/drm/card1/device', 'sys_card_path': '/sys/devices/pci0000:00/0000:00:03.1/0000:0a:00.0', 'gpu_type': <type.Unsupported: 2>, 'hwmon_path': '', 'readable': False, 'writable': False, 'compute': True, 'compute_platform': 'OpenCL 1.2 CUDA'}
DEBUG:gpu-utils:GPUmodule.set_gpu_list:Card flags: readable: False, writable: False, type: Unsupported
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor:read_gpu_sensor set to [/sys/class/drm/card1/device]
DEBUG:gpu-utils:GPUmodule.read_pciid_model:Logger active in module
DEBUG:gpu-utils:GPUmodule.add:Added GPU Item aec7577d8dd847c78a9b8755c9b22321 to GPU List
DEBUG:gpu-utils:GPUmodule.set_gpu_list:GPU: 0b:00.0
DEBUG:gpu-utils:GPUmodule.set_gpu_list:lspci output items:
 ['0b:00.0 VGA compatible controller: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)', '\tSubsystem: eVga.com. Corp. TU104 [GeForce RTX 2080 Rev. A]', '\tKernel driver in use: nvidia', '\tKernel modules: nvidiafb, nouveau, nvidia_drm, nvidia', '']
DEBUG:gpu-utils:GPUmodule.set_gpu_list:sysfpath: /sys/devices/pci0000:00/0000:00:03.1/0000:0a:00.0
device_dir: /sys/class/drm/card1/device
DEBUG:gpu-utils:GPUmodule.set_gpu_list:sysfpath: /sys/devices/pci0000:00/0000:00:03.2/0000:0b:00.0
device_dir: /sys/class/drm/card2/device
DEBUG:gpu-utils:GPUmodule.set_gpu_list:card_path set to: /sys/class/drm/card2/device
DEBUG:gpu-utils:GPUmodule.set_gpu_list:sysfpath: /sys/devices/pci0000:00/0000:00:01.3/0000:02:00.2/0000:03:04.0/0000:08:00.0
device_dir: /sys/class/drm/card0/device
DEBUG:gpu-utils:GPUmodule.set_gpu_list:Card dir [/sys/class/drm/card2/device] contents:
['uevent', 'resource3_wc', 'resource5', 'i2c-20', 'resource3', 'i2c-19', 'broken_parity_status', 'subsystem_device', 'rom', 'dma_mask_bits', 'vendor', 'resource1', 'iommu_group', 'local_cpus', 'firmware_node', 'power', 'i2c-25', 'class', 'reset', 'numa_node', 'resource', 'rescan', 'max_link_width', 'msi_bus', 'i2c-23', 'device', 'boot_vga', 'aer_dev_nonfatal', 'i2c-21', 'current_link_width', 'driver', 'max_link_speed', 'local_cpulist', 'driver_override', 'subsystem', 'd3cold_allowed', 'irq', 'revision', 'current_link_speed', 'resource1_wc', 'aer_dev_correctable', 'consistent_dma_mask_bits', 'resource0', 'config', 'ari_enabled', 'msi_irqs', 'remove', 'iommu', 'aer_dev_fatal', 'i2c-24', 'enable', 'link', 'i2c-22', 'modalias', 'subsystem_vendor', 'drm']
DEBUG:gpu-utils:GPUmodule.set_gpu_list:HW file search: []
DEBUG:gpu-utils:GPUmodule.populate_prm_from_dict:prm dict:
{'pcie_id': '0b:00.0', 'model': 'NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)', 'model_short': 'UNKNOWN', 'vendor': <vendor.NVIDIA: 4>, 'driver': 'nvidiafb, nouveau, nvidia_drm, nvidia', 'card_path': '/sys/class/drm/card2/device', 'sys_card_path': '/sys/devices/pci0000:00/0000:00:03.2/0000:0b:00.0', 'gpu_type': <type.Unsupported: 2>, 'hwmon_path': '', 'readable': False, 'writable': False, 'compute': True, 'compute_platform': 'OpenCL 1.2 CUDA'}
DEBUG:gpu-utils:GPUmodule.set_gpu_list:Card flags: readable: False, writable: False, type: Unsupported
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor:read_gpu_sensor set to [/sys/class/drm/card2/device]
DEBUG:gpu-utils:GPUmodule.read_pciid_model:Logger active in module
Ricks-Lab commented 4 years ago

Thanks for the details! This confirms that NV only exposes the generic PCIe sensors in the card path. It looks like there are no GPU-specific sensors. In that case, perhaps nvidia-smi is the only choice for reading card details. I was hoping to first read just the details necessary for the monitor and plot utilities, and use only those parameters in ls. Here is a list of those sensors for AMD:

SensorSet.Monitor: {'HWMON':  ['power', 'power_cap', 'temperatures', 'voltages',
                               'frequencies', 'fan_pwm'],
                    'DEVICE': ['loading', 'mem_loading', 'mem_gtt_used', 'mem_vram_used',
                               'sclk_ps', 'mclk_ps', 'ppm']},

Can you help provide the nvidia command line and sample output for this set?
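For reference, here is a rough sketch of how that AMD Monitor set might map onto nvidia-smi --query-gpu properties. The property names on the right are real nvidia-smi query fields (they also appear in the --help-query-gpu output pasted later in this thread); the grouping and the NV_MONITOR_MAP name are just illustrative:

```python
# Hypothetical mapping from the AMD Monitor sensor set to nvidia-smi
# --query-gpu properties. A sketch only: the right-hand names are real
# nvidia-smi fields, but the mapping itself is an assumption.
NV_MONITOR_MAP = {
    'power':         'power.draw',
    'power_cap':     'power.limit',
    'temperatures':  'temperature.gpu',
    'voltages':      None,                       # no nvidia-smi equivalent
    'frequencies':   'clocks.current.graphics',
    'fan_pwm':       'fan.speed',                # percent of max, not raw PWM
    'loading':       'utilization.gpu',
    'mem_loading':   'utilization.memory',
    'mem_vram_used': 'memory.used',
    'sclk_ps':       'pstate',
    'mclk_ps':       None,                       # memory p-state not exposed
}

# Build the comma-separated query string, skipping sensors with no equivalent.
query = ','.join(v for v in NV_MONITOR_MAP.values() if v)
```

The resulting query string could then be passed as a single --query-gpu= argument.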

Ricks-Lab commented 4 years ago

Here is the code we worked out for benchMT for power reading:

import subprocess

try:
    # Query instantaneous board power draw (watts) for one GPU by PCIe id.
    nsmi_items = subprocess.check_output(
        '{} -i {} --query-gpu=power.draw --format=csv,noheader,nounits'.format(
            MB_CONST.cmd_nvidia_smi, self.pcie_id), shell=True).decode().split('\n')
    power_reading = float(nsmi_items[0].strip())
except (subprocess.CalledProcessError, OSError) as except_err:
    power_reading = None
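
For anyone adapting the snippet above, here is a minimal standalone sketch of the same read. The read_power name is hypothetical, and MB_CONST.cmd_nvidia_smi becomes an explicit argument; passing an argument list instead of shell=True avoids shell interpretation entirely:

```python
import subprocess
from typing import Optional

def read_power(nvidia_smi: str, pcie_id: str) -> Optional[float]:
    """Read power.draw (watts) for one GPU; return None on any failure.

    A sketch of the benchMT snippet above, using an argument list
    instead of shell=True so nothing is shell-interpreted.
    """
    cmd = [nvidia_smi, '-i', pcie_id,
           '--query-gpu=power.draw', '--format=csv,noheader,nounits']
    try:
        out = subprocess.check_output(cmd).decode()
        return float(out.splitlines()[0].strip())
    except (subprocess.CalledProcessError, OSError, ValueError, IndexError):
        return None
```

Any error (missing binary, bad PCIe id, unparsable output) collapses to None, matching the original's behavior.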
KeithMyers commented 4 years ago

I can't seem to make any sensible output from that command stack. Nothing but syntax errors.

keith@Serenity:~$ nvidia-smi nsmi_items = subprocess.check_output(
bash: syntax error near unexpected token `('
keith@Serenity:~$         '{} -i {} --query-gpu=power.draw --format=csv,noheader,nounits'.format(
bash: syntax error near unexpected token `newline'
keith@Serenity:~$          MB_CONST.cmd_nvidia_smi, self.pcie_id), shell=True).decode().split('\n')
bash: syntax error near unexpected token `)'
keith@Serenity:~$     power_reading = float(nsmi_items[0].strip())
bash: syntax error near unexpected token `('
keith@Serenity:~$ except (subprocess.CalledProcessError, OSError) as except_err:
bash: syntax error near unexpected token `subprocess.CalledProcessError,'
keith@Serenity:~$     power_reading = None

If I break it down to just:

nvidia-smi --query-gpu=power.draw --format=csv,noheader,nounits

I get something:

133.91
189.84
104.34

KeithMyers commented 4 years ago

I can't seem to find any way to get voltages out of nvidia-smi. Also can't get fan_pwm, just fan.speed.

nvidia-smi --query-gpu=power.limit --format=csv,noheader,nounits
200.00
200.00
200.00

nvidia-smi --query-gpu=power.max_limit --format=csv,noheader,nounits
292.00
292.00
292.00

nvidia-smi --query-gpu=power.default_limit --format=csv,noheader,nounits
225.00
225.00
225.00

nvidia-smi --query-gpu=power.min_limit --format=csv,noheader,nounits
105.00
105.00
105.00

nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader,nounits
45
43
37

nvidia-smi --query-gpu=temperature.memory --format=csv,noheader,nounits
N/A
N/A
N/A

nvidia-smi --query-gpu=clocks.current.graphics --format=csv,noheader,nounits
1515
1965
1995

nvidia-smi --query-gpu=clocks.sm --format=csv,noheader,nounits
2010
1965
1995

nvidia-smi --query-gpu=clocks.mem --format=csv,noheader,nounits
7199
7199
7199

nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits
88
1
98

nvidia-smi --query-gpu=utilization.memory --format=csv,noheader,nounits
6
0
3

nvidia-smi --query-gpu=fan.speed --format=csv,noheader,nounits
100
100
100

nvidia-smi --query-gpu=pcie.link.width.max --format=csv,noheader,nounits
16
16
16

nvidia-smi --query-gpu=pcie.link.width.current --format=csv,noheader,nounits
4
8
8

Ricks-Lab commented 4 years ago

@KeithMyers Thanks for the details. Do you know if several query items can be provided in a single call? Something like:

nvidia-smi --query-gpu=pcie.link.width.current,pcie.link.width.max,power.draw  --format=csv,noheader,nounits
Ricks-Lab commented 4 years ago

The latest on the extended branch includes one read statement, which is printed in raw form. Let me know when you have a chance to test. Just execute amdgpu-ls.

KeithMyers commented 4 years ago

@KeithMyers Thanks for the details. Do you know if several query items can be provided in a single call? Something like:

nvidia-smi --query-gpu=pcie.link.width.current,pcie.link.width.max,power.draw  --format=csv,noheader,nounits

Should be able to. You're supposed to separate the arguments with just a comma, as in your example.

nvidia-smi --query-gpu=pcie.link.width.current,pcie.link.width.max,power.draw  --format=csv,noheader,nounits
4, 16, 99.70
8, 16, 192.69
8, 16, 195.36
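
Since the fields come back in query order, one CSV row per GPU, output like the above can be split apart with a few lines of Python. This is just a parsing sketch; parse_query_rows is a hypothetical helper, not part of gpu-utils:

```python
def parse_query_rows(fields, output):
    """Split nvidia-smi CSV output (one row per GPU) into a list of
    dicts keyed by the query field names. Parsing sketch only."""
    rows = []
    for line in output.strip().splitlines():
        values = [v.strip() for v in line.split(',')]
        rows.append(dict(zip(fields, values)))
    return rows

# Sample taken from the combined query shown above (3 GPUs).
fields = ['pcie.link.width.current', 'pcie.link.width.max', 'power.draw']
sample = "4, 16, 99.70\n8, 16, 192.69\n8, 16, 195.36\n"
gpus = parse_query_rows(fields, sample)
# gpus[0]['power.draw'] -> '99.70'
```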
Ricks-Lab commented 4 years ago

I have a single read implemented. Let me know the output. Should just be a single string before the ls output.

KeithMyers commented 4 years ago

I'm not sure what you want executed. I thought you had updated the repo. I see a commit that is two hours old. I just downloaded the repo again and don't see any change from the last test. Same output:

Warning: could not read AMD Featuremask [[Errno 2] No such file or directory: '/sys/module/amdgpu/parameters/ppfeaturemask']
/bin/sh: 1: None: not found
/bin/sh: 1: None: not found
/bin/sh: 1: None: not found
Detected GPUs: NVIDIA: 3
3 total GPUs, 0 rw, 0 r-only, 0 w-only
Ricks-Lab commented 4 years ago

I had a typo in the command name. Fixed that, and I'm now printing out the full command for examination.

KeithMyers commented 4 years ago

Ok, here is something different after the typo fix.

./amdgpu-ls
OS command [nvidia-smi] executable found: [/usr/bin/nvidia-smi]
Warning: could not read AMD Featuremask [[Errno 2] No such file or directory: '/sys/module/amdgpu/parameters/ppfeaturemask']
NV command:
/usr/bin/nvidia-smi -i 08:00.0 --query-gpu=power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current --format=csv,noheader,nounits
NV query result: [['98.49, 42, N/A, 2010, 2010, 7199, 100, 4, 100, 4', '']]
NV command:
/usr/bin/nvidia-smi -i 0a:00.0 --query-gpu=power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current --format=csv,noheader,nounits
NV query result: [['157.72, 42, N/A, 1980, 1980, 7199, 98, 49, 100, 8', '']]
NV command:
/usr/bin/nvidia-smi -i 0b:00.0 --query-gpu=power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current --format=csv,noheader,nounits
NV query result: [['196.12, 47, N/A, 1935, 1935, 7199, 96, 24, 100, 8', '']]
Detected GPUs: NVIDIA: 3
3 total GPUs, 0 rw, 3 r-only, 0 w-only
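
A small sketch of how such a raw result string could be converted to typed values, with 'N/A' mapped to None. The to_typed name is hypothetical, not gpu-utils code:

```python
def to_typed(value):
    """Convert one nvidia-smi CSV field: 'N/A' -> None, numeric -> float,
    anything else kept as a string. Sketch only."""
    value = value.strip()
    if value == 'N/A':
        return None
    try:
        return float(value)
    except ValueError:
        return value

# Raw result string for card 0 from the run above.
raw = '98.49, 42, N/A, 2010, 2010, 7199, 100, 4, 100, 4'
typed = [to_typed(v) for v in raw.split(',')]
# typed[0] -> 98.49, typed[2] -> None
```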
Ricks-Lab commented 4 years ago

Looks good! Do you know if the current p-state is available? It would be easiest if I can fit NV cards into some of the reports I have already developed.

KeithMyers commented 4 years ago

nvidia-smi --query-gpu=pstate --format=csv,noheader,nounits
P2
P2
P2

Ricks-Lab commented 4 years ago

Are memory pstates also available?

KeithMyers commented 4 years ago

Would the help output from nvidia-smi --help-query-gpu be helpful? Basically covers all parameters available from the query-gpu function.

nvidia-smi --help-query-gpu
List of valid properties to query for the switch "--query-gpu=":

"timestamp"
The timestamp of where the query was made in format "YYYY/MM/DD HH:MM:SS.msec".

"driver_version"
The version of the installed NVIDIA display driver. This is an alphanumeric string.

"count"
The number of NVIDIA GPUs in the system.

"name" or "gpu_name"
The official product name of the GPU. This is an alphanumeric string. For all products.

"serial" or "gpu_serial"
This number matches the serial number physically printed on each board. It is a globally unique immutable alphanumeric value.

"uuid" or "gpu_uuid"
This value is the globally unique immutable alphanumeric identifier of the GPU. It does not correspond to any physical label on the board.

"pci.bus_id" or "gpu_bus_id"
PCI bus id as "domain:bus:device.function", in hex.

"pci.domain"
PCI domain number, in hex.

"pci.bus"
PCI bus number, in hex.

"pci.device"
PCI device number, in hex.

"pci.device_id"
PCI vendor device id, in hex

"pci.sub_device_id"
PCI Sub System id, in hex

"pcie.link.gen.current"
The current PCI-E link generation. These may be reduced when the GPU is not in use.

"pcie.link.gen.max"
The maximum PCI-E link generation possible with this GPU and system configuration. For example, if the GPU supports a higher PCIe generation than the system supports then this reports the system PCIe generation.

"pcie.link.width.current"
The current PCI-E link width. These may be reduced when the GPU is not in use.

"pcie.link.width.max"
The maximum PCI-E link width possible with this GPU and system configuration. For example, if the GPU supports a higher PCIe generation than the system supports then this reports the system PCIe generation.

"index"
Zero based index of the GPU. Can change at each boot.

"display_mode"
A flag that indicates whether a physical display (e.g. monitor) is currently connected to any of the GPU's connectors. "Enabled" indicates an attached display. "Disabled" indicates otherwise.

"display_active"
A flag that indicates whether a display is initialized on the GPU's (e.g. memory is allocated on the device for display). Display can be active even when no monitor is physically attached. "Enabled" indicates an active display. "Disabled" indicates otherwise.

"persistence_mode"
A flag that indicates whether persistence mode is enabled for the GPU. Value is either "Enabled" or "Disabled". When persistence mode is enabled the NVIDIA driver remains loaded even when no active clients, such as X11 or nvidia-smi, exist. This minimizes the driver load latency associated with running dependent apps, such as CUDA programs. Linux only.

"accounting.mode"
A flag that indicates whether accounting mode is enabled for the GPU. Value is either "Enabled" or "Disabled". When accounting is enabled statistics are calculated for each compute process running on the GPU. Statistics can be queried during the lifetime or after termination of the process. The execution time of process is reported as 0 while the process is in running state and updated to actual execution time after the process has terminated. See --help-query-accounted-apps for more info.

"accounting.buffer_size"
The size of the circular buffer that holds list of processes that can be queried for accounting stats. This is the maximum number of processes that accounting information will be stored for before information about oldest processes will get overwritten by information about new processes.

Section about driver_model properties
On Windows, the TCC and WDDM driver models are supported. The driver model can be changed with the (-dm) or (-fdm) flags. The TCC driver model is optimized for compute applications. I.E. kernel launch times will be quicker with TCC. The WDDM driver model is designed for graphics applications and is not recommended for compute applications. Linux does not support multiple driver models, and will always have the value of "N/A". Only for selected products. Please see feature matrix in NVML documentation.

"driver_model.current"
The driver model currently in use. Always "N/A" on Linux.

"driver_model.pending"
The driver model that will be used on the next reboot. Always "N/A" on Linux.

"vbios_version"
The BIOS of the GPU board.

Section about inforom properties
Version numbers for each object in the GPU board's inforom storage. The inforom is a small, persistent store of configuration and state data for the GPU. All inforom version fields are numerical. It can be useful to know these version numbers because some GPU features are only available with inforoms of a certain version or higher.

"inforom.img" or "inforom.image"
Global version of the infoROM image. Image version just like VBIOS version uniquely describes the exact version of the infoROM flashed on the board in contrast to infoROM object version which is only an indicator of supported features.

"inforom.oem"
Version for the OEM configuration data.

"inforom.ecc"
Version for the ECC recording data.

"inforom.pwr" or "inforom.power"
Version for the power management data.

Section about gom properties
GOM allows reducing power usage and optimizing GPU throughput by disabling GPU features. Each GOM is designed to meet specific user needs.
In "All On" mode everything is enabled and running at full speed.
The "Compute" mode is designed for running only compute tasks. Graphics operations are not allowed.
The "Low Double Precision" mode is designed for running graphics applications that don't require high bandwidth double precision.
GOM can be changed with the (--gom) flag.

"gom.current" or "gpu_operation_mode.current"
The GOM currently in use.

"gom.pending" or "gpu_operation_mode.pending"
The GOM that will be used on the next reboot.

"fan.speed"
The fan speed value is the percent of maximum speed that the device's fan is currently intended to run at. It ranges from 0 to 100 %. Note: The reported speed is the intended fan speed. If the fan is physically blocked and unable to spin, this output will not match the actual fan speed. Many parts do not report fan speeds because they rely on cooling via fans in the surrounding enclosure.

"pstate"
The current performance state for the GPU. States range from P0 (maximum performance) to P12 (minimum performance).

Section about clocks_throttle_reasons properties
Retrieves information about factors that are reducing the frequency of clocks. If all throttle reasons are returned as "Not Active" it means that clocks are running as high as possible.

"clocks_throttle_reasons.supported"
Bitmask of supported clock throttle reasons. See nvml.h for more details.

"clocks_throttle_reasons.active"
Bitmask of active clock throttle reasons. See nvml.h for more details.

"clocks_throttle_reasons.gpu_idle"
Nothing is running on the GPU and the clocks are dropping to Idle state. This limiter may be removed in a later release.

"clocks_throttle_reasons.applications_clocks_setting"
GPU clocks are limited by applications clocks setting. E.g. can be changed by nvidia-smi --applications-clocks=

"clocks_throttle_reasons.sw_power_cap"
SW Power Scaling algorithm is reducing the clocks below requested clocks because the GPU is consuming too much power. E.g. SW power cap limit can be changed with nvidia-smi --power-limit=

"clocks_throttle_reasons.hw_slowdown"
HW Slowdown (reducing the core clocks by a factor of 2 or more) is engaged. This is an indicator of:
 * HW Thermal Slowdown: temperature being too high
 * HW Power Brake Slowdown: External Power Brake Assertion is triggered (e.g. by the system power supply)
 * Power draw is too high and Fast Trigger protection is reducing the clocks
 * May also be reported during PState or clock change
 * This behavior may be removed in a later release

"clocks_throttle_reasons.hw_thermal_slowdown"
HW Thermal Slowdown (reducing the core clocks by a factor of 2 or more) is engaged. This is an indicator of temperature being too high

"clocks_throttle_reasons.hw_power_brake_slowdown"
HW Power Brake Slowdown (reducing the core clocks by a factor of 2 or more) is engaged. This is an indicator of External Power Brake Assertion being triggered (e.g. by the system power supply)

"clocks_throttle_reasons.sw_thermal_slowdown"
SW Thermal capping algorithm is reducing clocks below requested clocks because GPU temperature is higher than Max Operating Temp.

"clocks_throttle_reasons.sync_boost"
This GPU has been added to a Sync Boost group with nvidia-smi or DCGM in order to maximize performance per watt. All GPUs in the sync boost group will boost to the minimum possible clocks across the entire group. Look at the throttle reasons for other GPUs in the system to see why those GPUs are holding this one at lower clocks.
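
The supported/active bitmasks above can be decoded in code. The sketch below is a minimal Python example; the bit values follow the nvmlClocksThrottleReason* constants as they appear in recent nvml.h headers, so verify them against the header shipped with your driver before relying on them.

```python
# Map nvml.h-style throttle-reason bits to the property names used above.
# Bit values taken from recent nvml.h headers (assumption: check your own header).
THROTTLE_REASONS = {
    0x0000000000000001: 'gpu_idle',
    0x0000000000000002: 'applications_clocks_setting',
    0x0000000000000004: 'sw_power_cap',
    0x0000000000000008: 'hw_slowdown',
    0x0000000000000010: 'sync_boost',
    0x0000000000000020: 'sw_thermal_slowdown',
    0x0000000000000040: 'hw_thermal_slowdown',
    0x0000000000000080: 'hw_power_brake_slowdown',
}


def decode_throttle_reasons(bitmask: int) -> list:
    """Return the names of all throttle reasons set in the bitmask, or ['none']."""
    active = [name for bit, name in THROTTLE_REASONS.items() if bitmask & bit]
    return active or ['none']
```

For example, a reported active mask of 0x5 decodes to gpu_idle plus sw_power_cap.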

Section about memory properties
On-board memory information. Reported total memory is affected by ECC state. If ECC is enabled the total available memory is decreased by several percent, due to the requisite parity bits. The driver may also reserve a small amount of memory for internal use, even without active work on the GPU.

"memory.total"
Total installed GPU memory.

"memory.used"
Total memory allocated by active contexts.

"memory.free"
Total free memory.

"compute_mode"
The compute mode flag indicates whether individual or multiple compute applications may run on the GPU.
"Default" means multiple contexts are allowed per device.
"Exclusive_Process" means only one context is allowed per device, usable from multiple threads at a time.
"Prohibited" means no contexts are allowed per device (no compute apps).

Section about utilization properties
Utilization rates report how busy each GPU is over time, and can be used to determine how much an application is using the GPUs in the system.

"utilization.gpu"
Percent of time over the past sample period during which one or more kernels was executing on the GPU.
The sample period may be between 1 second and 1/6 second depending on the product.

"utilization.memory"
Percent of time over the past sample period during which global (device) memory was being read or written.
The sample period may be between 1 second and 1/6 second depending on the product.

Section about encoder.stats properties
Encoder stats report the number of encoder sessions, average FPS, and average latency in microseconds for the given GPUs in the system.

"encoder.stats.sessionCount"
Number of encoder sessions running on the GPU.

"encoder.stats.averageFps"
Average FPS of all sessions running on the GPU.

"encoder.stats.averageLatency"
Average latency in microseconds of all sessions running on the GPU.

Section about ecc.mode properties
A flag that indicates whether ECC support is enabled. May be either "Enabled" or "Disabled". Changes to ECC mode require a reboot. Requires Inforom ECC object version 1.0 or higher.

"ecc.mode.current"
The ECC mode that the GPU is currently operating under.

"ecc.mode.pending"
The ECC mode that the GPU will operate under after the next reboot.

Section about ecc.errors properties
NVIDIA GPUs can provide error counts for various types of ECC errors. Some ECC errors are either single or double bit, where single bit errors are corrected and double bit errors are uncorrectable. Texture memory errors may be correctable via resend or uncorrectable if the resend fails. These errors are available across two timescales (volatile and aggregate). Single bit ECC errors are automatically corrected by the HW and do not result in data corruption. Double bit errors are detected but not corrected. Please see the ECC documents on the web for information on compute application behavior when double bit errors occur. Volatile error counters track the number of errors detected since the last driver load. Aggregate error counts persist indefinitely and thus act as a lifetime counter.

"ecc.errors.corrected.volatile.device_memory"
Errors detected in global device memory.

"ecc.errors.corrected.volatile.register_file"
Errors detected in register file memory.

"ecc.errors.corrected.volatile.l1_cache"
Errors detected in the L1 cache.

"ecc.errors.corrected.volatile.l2_cache"
Errors detected in the L2 cache.

"ecc.errors.corrected.volatile.texture_memory"
Parity errors detected in texture memory.

"ecc.errors.corrected.volatile.total"
Total errors detected across entire chip. Sum of device_memory, register_file, l1_cache, l2_cache and texture_memory.

"ecc.errors.corrected.aggregate.device_memory"
Errors detected in global device memory.

"ecc.errors.corrected.aggregate.register_file"
Errors detected in register file memory.

"ecc.errors.corrected.aggregate.l1_cache"
Errors detected in the L1 cache.

"ecc.errors.corrected.aggregate.l2_cache"
Errors detected in the L2 cache.

"ecc.errors.corrected.aggregate.texture_memory"
Parity errors detected in texture memory.

"ecc.errors.corrected.aggregate.total"
Total errors detected across entire chip. Sum of device_memory, register_file, l1_cache, l2_cache and texture_memory.

"ecc.errors.uncorrected.volatile.device_memory"
Errors detected in global device memory.

"ecc.errors.uncorrected.volatile.register_file"
Errors detected in register file memory.

"ecc.errors.uncorrected.volatile.l1_cache"
Errors detected in the L1 cache.

"ecc.errors.uncorrected.volatile.l2_cache"
Errors detected in the L2 cache.

"ecc.errors.uncorrected.volatile.texture_memory"
Parity errors detected in texture memory.

"ecc.errors.uncorrected.volatile.total"
Total errors detected across entire chip. Sum of device_memory, register_file, l1_cache, l2_cache and texture_memory.

"ecc.errors.uncorrected.aggregate.device_memory"
Errors detected in global device memory.

"ecc.errors.uncorrected.aggregate.register_file"
Errors detected in register file memory.

"ecc.errors.uncorrected.aggregate.l1_cache"
Errors detected in the L1 cache.

"ecc.errors.uncorrected.aggregate.l2_cache"
Errors detected in the L2 cache.

"ecc.errors.uncorrected.aggregate.texture_memory"
Parity errors detected in texture memory.

"ecc.errors.uncorrected.aggregate.total"
Total errors detected across entire chip. Sum of device_memory, register_file, l1_cache, l2_cache and texture_memory.
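
As each `.total` entry above states, the chip-wide count is simply the sum of the five per-location counters. A minimal sketch of that relationship (a hypothetical helper, not part of the utilities):

```python
# The five memory locations that feed each ecc.errors.*.total counter.
ECC_LOCATIONS = ('device_memory', 'register_file', 'l1_cache', 'l2_cache',
                 'texture_memory')


def ecc_total(counts: dict) -> int:
    """Chip-wide ECC error total: the sum of the five per-location counters."""
    return sum(counts[loc] for loc in ECC_LOCATIONS)
```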

Section about retired_pages properties
NVIDIA GPUs can retire pages of GPU device memory when they become unreliable. This can happen when multiple single bit ECC errors occur for the same page, or on a double bit ECC error. When a page is retired, the NVIDIA driver will hide it such that no driver or application memory allocations can access it.

"retired_pages.single_bit_ecc.count" or "retired_pages.sbe"
The number of GPU device memory pages that have been retired due to multiple single bit ECC errors.

"retired_pages.double_bit.count" or "retired_pages.dbe"
The number of GPU device memory pages that have been retired due to a double bit ECC error.

"retired_pages.pending"
Checks if any GPU device memory pages are pending retirement on the next reboot. Pages that are pending retirement can still be allocated, and may cause further reliability issues.

"temperature.gpu"
Core GPU temperature, in degrees C.

"temperature.memory"
HBM memory temperature, in degrees C.

"power.management"
A flag that indicates whether power management is enabled. Either "Supported" or "[Not Supported]". Requires Inforom PWR object version 3.0 or higher or Kepler device.

"power.draw"
The last measured power draw for the entire board, in watts. Only available if power management is supported. This reading is accurate to within +/- 5 watts.

"power.limit"
The software power limit in watts. Set by software like nvidia-smi. On Kepler devices Power Limit can be adjusted using [-pl | --power-limit=] switches.

"enforced.power.limit"
The power management algorithm's power ceiling, in watts. Total board power draw is manipulated by the power management algorithm such that it stays under this value. This value is the minimum of various power limiters.

"power.default_limit"
The default power management algorithm's power ceiling, in watts. Power Limit will be set back to Default Power Limit after driver unload.

"power.min_limit"
The minimum value in watts that power limit can be set to.

"power.max_limit"
The maximum value in watts that power limit can be set to.
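
Together, power.min_limit and power.max_limit bound the values power.limit may be set to (e.g. via nvidia-smi --power-limit=). A minimal range-check sketch, using a hypothetical helper name; the 105 W / 292 W figures in the example come from the RTX 2080 output later in this thread:

```python
def validate_power_limit(requested_w: float, min_limit_w: float,
                         max_limit_w: float) -> float:
    """Range-check a requested power cap against the reported min/max limits."""
    if not (min_limit_w <= requested_w <= max_limit_w):
        raise ValueError('power limit {}W outside supported range [{}W, {}W]'
                         .format(requested_w, min_limit_w, max_limit_w))
    return requested_w
```

For the RTX 2080s in this thread, a requested 200 W cap passes, while 300 W would be rejected.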

"clocks.current.graphics" or "clocks.gr"
Current frequency of graphics (shader) clock.

"clocks.current.sm" or "clocks.sm"
Current frequency of SM (Streaming Multiprocessor) clock.

"clocks.current.memory" or "clocks.mem"
Current frequency of memory clock.

"clocks.current.video" or "clocks.video"
Current frequency of video encoder/decoder clock.

Section about clocks.applications properties
User-specified frequency at which applications will run. Can be changed with [-ac | --applications-clocks] switches.

"clocks.applications.graphics" or "clocks.applications.gr"
User specified frequency of graphics (shader) clock.

"clocks.applications.memory" or "clocks.applications.mem"
User specified frequency of memory clock.

Section about clocks.default_applications properties
Default frequency at which applications will run. Application clocks can be changed with [-ac | --applications-clocks] switches. Application clocks can be set to default using [-rac | --reset-applications-clocks] switches.

"clocks.default_applications.graphics" or "clocks.default_applications.gr"
Default frequency of applications graphics (shader) clock.

"clocks.default_applications.memory" or "clocks.default_applications.mem"
Default frequency of applications memory clock.

Section about clocks.max properties
Maximum frequency at which parts of the GPU are designed to run.

"clocks.max.graphics" or "clocks.max.gr"
Maximum frequency of graphics (shader) clock.

"clocks.max.sm"
Maximum frequency of SM (Streaming Multiprocessor) clock.

"clocks.max.memory" or "clocks.max.mem"
Maximum frequency of memory clock.
KeithMyers commented 4 years ago

Are memory pstates also available?

Apparently not.

Ricks-Lab commented 4 years ago

I have implemented much of the core code to support NV, but it currently will just display a dictionary of raw data. Let me know if it works. It was a lot of code to write with no ability to test it out...

KeithMyers commented 4 years ago

Something regressed in the https://github.com/Ricks-Lab/amdgpu-utils/commit/e3cec7bd6387a0e89574cd77809cb31eddd62cc1 commit.

 ./amdgpu-ls
OS command [nvidia-smi] executable found: [/usr/bin/nvidia-smi]
Traceback (most recent call last):
  File "./amdgpu-ls", line 150, in <module>
    main()
  File "./amdgpu-ls", line 98, in main
    gpu_list.set_gpu_list(clinfo_flag=True)
  File "/home/keith/Downloads/amdgpu-utils-extended/GPUmodules/GPUmodule.py", line 1675, in set_gpu_list
    self[gpu_uuid].read_gpu_sensor_set_nv()
  File "/home/keith/Downloads/amdgpu-utils-extended/GPUmodules/GPUmodule.py", line 1126, in read_gpu_sensor_set_nv
    raise TypeError('Invalid SensorSet value: [{}]'.format(data_type))
TypeError: Invalid SensorSet value: [set.All]
Ricks-Lab commented 4 years ago

It was an error in my error checking! I just pushed a change.

KeithMyers commented 4 years ago

Still don't think this is what you expected. No values for parameters you were able to retrieve previously.

keith@Serenity:~/Downloads/amdgpu-utils-extended$ ./amdgpu-ls

OS command [nvidia-smi] executable found: [/usr/bin/nvidia-smi]
NV command:
/usr/bin/nvidia-smi -i 08:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV query result: [<zip object at 0x7fb72b78c440>]
NV command:
/usr/bin/nvidia-smi -i 0a:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV query result: [<zip object at 0x7fb72bd04e40>]
NV command:
/usr/bin/nvidia-smi -i 0b:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV query result: [<zip object at 0x7fb72bd04640>]
Detected GPUs: NVIDIA: 3
NV command:
/usr/bin/nvidia-smi -i 08:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV query result: [<zip object at 0x7fb72bd04640>]
NV command:
/usr/bin/nvidia-smi -i 0a:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV query result: [<zip object at 0x7fb72bd04700>]
NV command:
/usr/bin/nvidia-smi -i 0b:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV query result: [<zip object at 0x7fb72bd04600>]
3 total GPUs, 0 rw, 3 r-only, 0 w-only

Card Number: 0
   Vendor: NVIDIA
   Readable: True
   Writable: False
   Compute: True
   GPU UID: None
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
   Display Card Model: None
   PCIe ID: 08:00.0
      Link Speed: None
      Link Width: None
   ##################################################
   Driver: nvidiafb, nouveau, nvidia_drm, nvidia
   vBIOS Version: None
   Compute Platform: OpenCL 1.2 CUDA
   GPU Frequency/Voltage Control Type: Unsupported
   HWmon: None
   Card Path: /sys/class/drm/card0/device
   System Card Path: /sys/devices/pci0000:00/0000:00:01.3/0000:02:00.2/0000:03:04.0/0000:08:00.0
   ##################################################
   Current Power (W): None
   Power Cap (W): None
      Power Cap Range (W): [None, None]
   Fan Enable: None
   Fan PWM Mode: [None, 'UNK']
   Fan Target Speed (rpm): None
   Current Fan Speed (rpm): None
   Current Fan PWM (%): None
      Fan Speed Range (rpm): [None, None]
      Fan PWM Range (%): [None, None]
   ##################################################
   Current GPU Loading (%): None
   Current Memory Loading (%): None
   Current GTT Memory Usage (%): None
      Current GTT Memory Used (GB): None
      Total GTT Memory (GB): None
   Current VRAM Usage (%): None
      Current VRAM Used (GB): None
      Total VRAM (GB): None
   Current  Temps (C): None
   Critical Temps (C): None
   Current Voltages (V): None
      Vddc Range: ['', '']
   Current Clk Frequencies (MHz): None
   Current SCLK P-State: ['', '']
      SCLK Range: ['', '']
   Current MCLK P-State: ['', '']
      MCLK Range: ['', '']
   Power Profile Mode: None
   Power DPM Force Performance Level: None

Card Number: 1
   Vendor: NVIDIA
   Readable: True
   Writable: False
   Compute: True
   GPU UID: None
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
   Display Card Model: None
   PCIe ID: 0a:00.0
      Link Speed: None
      Link Width: None
   ##################################################
   Driver: nvidiafb, nouveau, nvidia_drm, nvidia
   vBIOS Version: None
   Compute Platform: OpenCL 1.2 CUDA
   GPU Frequency/Voltage Control Type: Unsupported
   HWmon: None
   Card Path: /sys/class/drm/card1/device
   System Card Path: /sys/devices/pci0000:00/0000:00:03.1/0000:0a:00.0
   ##################################################
   Current Power (W): None
   Power Cap (W): None
      Power Cap Range (W): [None, None]
   Fan Enable: None
   Fan PWM Mode: [None, 'UNK']
   Fan Target Speed (rpm): None
   Current Fan Speed (rpm): None
   Current Fan PWM (%): None
      Fan Speed Range (rpm): [None, None]
      Fan PWM Range (%): [None, None]
   ##################################################
   Current GPU Loading (%): None
   Current Memory Loading (%): None
   Current GTT Memory Usage (%): None
      Current GTT Memory Used (GB): None
      Total GTT Memory (GB): None
   Current VRAM Usage (%): None
      Current VRAM Used (GB): None
      Total VRAM (GB): None
   Current  Temps (C): None
   Critical Temps (C): None
   Current Voltages (V): None
      Vddc Range: ['', '']
   Current Clk Frequencies (MHz): None
   Current SCLK P-State: ['', '']
      SCLK Range: ['', '']
   Current MCLK P-State: ['', '']
      MCLK Range: ['', '']
   Power Profile Mode: None
   Power DPM Force Performance Level: None

Card Number: 2
   Vendor: NVIDIA
   Readable: True
   Writable: False
   Compute: True
   GPU UID: None
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
   Display Card Model: None
   PCIe ID: 0b:00.0
      Link Speed: None
      Link Width: None
   ##################################################
   Driver: nvidiafb, nouveau, nvidia_drm, nvidia
   vBIOS Version: None
   Compute Platform: OpenCL 1.2 CUDA
   GPU Frequency/Voltage Control Type: Unsupported
   HWmon: None
   Card Path: /sys/class/drm/card2/device
   System Card Path: /sys/devices/pci0000:00/0000:00:03.2/0000:0b:00.0
   ##################################################
   Current Power (W): None
   Power Cap (W): None
      Power Cap Range (W): [None, None]
   Fan Enable: None
   Fan PWM Mode: [None, 'UNK']
   Fan Target Speed (rpm): None
   Current Fan Speed (rpm): None
   Current Fan PWM (%): None
      Fan Speed Range (rpm): [None, None]
      Fan PWM Range (%): [None, None]
   ##################################################
   Current GPU Loading (%): None
   Current Memory Loading (%): None
   Current GTT Memory Usage (%): None
      Current GTT Memory Used (GB): None
      Total GTT Memory (GB): None
   Current VRAM Usage (%): None
      Current VRAM Used (GB): None
      Total VRAM (GB): None
   Current  Temps (C): None
   Critical Temps (C): None
   Current Voltages (V): None
      Vddc Range: ['', '']
   Current Clk Frequencies (MHz): None
   Current SCLK P-State: ['', '']
      SCLK Range: ['', '']
   Current MCLK P-State: ['', '']
      MCLK Range: ['', '']
   Power Profile Mode: None
   Power DPM Force Performance Level: None
KeithMyers commented 4 years ago

The errors in amdgpu-monitor might point out something.

keith@Serenity:~/Downloads/amdgpu-utils-extended$ ./amdgpu-monitor

OS command [nvidia-smi] executable found: [/usr/bin/nvidia-smi]
NV command:
/usr/bin/nvidia-smi -i 08:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV query result: [<zip object at 0x7f5ddba25680>]
NV command:
/usr/bin/nvidia-smi -i 0a:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV query result: [<zip object at 0x7f5ddba25700>]
NV command:
/usr/bin/nvidia-smi -i 0b:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV query result: [<zip object at 0x7f5ddba2c240>]
Detected GPUs: NVIDIA: 3
NV command:
/usr/bin/nvidia-smi -i 08:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV query result: [<zip object at 0x7f5ddba2c240>]
NV command:
/usr/bin/nvidia-smi -i 0a:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV query result: [<zip object at 0x7f5ddba2c6c0>]
NV command:
/usr/bin/nvidia-smi -i 0b:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV query result: [<zip object at 0x7f5ddba2c480>]
3 total GPUs, 0 rw, 3 r-only, 0 w-only

Traceback (most recent call last):
  File "./amdgpu-monitor", line 384, in <module>
    main()
  File "./amdgpu-monitor", line 366, in main
    com_gpu_list.read_gpu_sensor_set(data_type=Gpu.GpuItem.SensorSet.Monitor)
  File "/home/keith/Downloads/amdgpu-utils-extended/GPUmodules/GPUmodule.py", line 1918, in read_gpu_sensor_set
    gpu.read_gpu_sensor_set(data_type)
  File "/home/keith/Downloads/amdgpu-utils-extended/GPUmodules/GPUmodule.py", line 1115, in read_gpu_sensor_set
    return self.read_gpu_sensor_set_nv(data_type)
  File "/home/keith/Downloads/amdgpu-utils-extended/GPUmodules/GPUmodule.py", line 1126, in read_gpu_sensor_set_nv
    raise TypeError('Invalid SensorSet value: [{}]'.format(data_type))
TypeError: Invalid SensorSet value: [set.Monitor]
Ricks-Lab commented 4 years ago

I still have lots of work to do before any of the utilities are functional. I am currently just making sure I have a way to read sensors into a dictionary. I had an error in the zip statement meant to accomplish that. Please check out the latest. The only place that you will see the parameters is in the display of the dictionary for now.

KeithMyers commented 4 years ago

OK, here is the dictionary.

./amdgpu-ls

OS command [nvidia-smi] executable found: [/usr/bin/nvidia-smi]
NV command:
/usr/bin/nvidia-smi -i 08:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV query result: [{'power.limit': '200.00, 105.00, 292.00, 7982, 90.04.23.00.5F, 440.64, GeForce RTX 2080, GPU-089608fe-cba5-4711-bf68-085fd0711d8c, 159.92, 52, N/A, 1995, 1995, 7199, 87, 6, 100, 4, P2', 'power.min_limit': ''}]
NV command:
/usr/bin/nvidia-smi -i 0a:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV query result: [{'power.limit': '200.00, 105.00, 292.00, 7979, 90.04.23.00.5F, 440.64, GeForce RTX 2080, GPU-22b2c6ac-2d49-4863-197c-9c469071178a, 173.23, 46, N/A, 1965, 1965, 7199, 75, 51, 100, 8, P2', 'power.min_limit': ''}]
NV command:
/usr/bin/nvidia-smi -i 0b:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV query result: [{'power.limit': '200.00, 105.00, 292.00, 7982, 90.04.23.00.5F, 440.64, GeForce RTX 2080, GPU-6933bcb7-7072-2181-a92f-0e5c42c3fa0d, 103.02, 41, N/A, 1980, 1980, 7199, 98, 4, 100, 8, P2', 'power.min_limit': ''}]
Detected GPUs: NVIDIA: 3
NV command:
/usr/bin/nvidia-smi -i 08:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV query result: [{'power.limit': '200.00, 105.00, 292.00, 7982, 90.04.23.00.5F, 440.64, GeForce RTX 2080, GPU-089608fe-cba5-4711-bf68-085fd0711d8c, 157.27, 53, N/A, 1995, 1995, 7199, 87, 6, 100, 4, P2', 'power.min_limit': ''}]
NV command:
/usr/bin/nvidia-smi -i 0a:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV query result: [{'power.limit': '200.00, 105.00, 292.00, 7979, 90.04.23.00.5F, 440.64, GeForce RTX 2080, GPU-22b2c6ac-2d49-4863-197c-9c469071178a, 172.96, 47, N/A, 1965, 1965, 7199, 75, 51, 100, 8, P2', 'power.min_limit': ''}]
NV command:
/usr/bin/nvidia-smi -i 0b:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV query result: [{'power.limit': '200.00, 105.00, 292.00, 7982, 90.04.23.00.5F, 440.64, GeForce RTX 2080, GPU-6933bcb7-7072-2181-a92f-0e5c42c3fa0d, 103.02, 41, N/A, 1980, 1980, 7199, 98, 4, 100, 8, P2', 'power.min_limit': ''}]
3 total GPUs, 0 rw, 3 r-only, 0 w-only
Ricks-Lab commented 4 years ago

I have added some more debug output. The output is different than expected, so I need to work out how to get it into a dictionary.

KeithMyers commented 4 years ago

Do you want just the normal dictionary output from amdgpu-ls? Or do you want the full debug output?

Ricks-Lab commented 4 years ago

Just amdgpu-ls normal output is needed.

KeithMyers commented 4 years ago
 ./amdgpu-ls
OS command [nvidia-smi] executable found: [/usr/bin/nvidia-smi]
NV command:
/usr/bin/nvidia-smi -i 08:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
nsmi_items: [2]
['200.00, 105.00, 292.00, 7982, 90.04.23.00.5F, 440.64, GeForce RTX 2080, GPU-089608fe-cba5-4711-bf68-085fd0711d8c, 98.11, 43, N/A, 1995, 1995, 7199, 100, 4, 100, 4, P2', '']
nsmi_items: [21]
['200.00,', '105.00,', '292.00,', '7982,', '90.04.23.00.5F,', '440.64,', 'GeForce', 'RTX', '2080,', 'GPU-089608fe-cba5-4711-bf68-085fd0711d8c,', '98.11,', '43,', 'N/A,', '1995,', '1995,', '7199,', '100,', '4,', '100,', '4,', 'P2']
NV query result: [{'power.limit': '200.00,', 'power.min_limit': '105.00,', 'power.max_limit': '292.00,', 'memory.total': '7982,', 'vbios_version': '90.04.23.00.5F,', 'driver_version': '440.64,', 'name': 'GeForce', 'gpu_uuid': 'RTX', 'power.draw': '2080,', 'temperature.gpu': 'GPU-089608fe-cba5-4711-bf68-085fd0711d8c,', 'temperature.memory': '98.11,', 'clocks.current.graphics': '43,', 'clocks.sm': 'N/A,', 'clocks.mem': '1995,', 'utilization.gpu': '1995,', 'utilization.memory': '7199,', 'fan.speed': '100,', 'pcie.link.width.current': '4,', 'pstate': '100,'}]
NV command:
/usr/bin/nvidia-smi -i 0a:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
nsmi_items: [2]
['200.00, 105.00, 292.00, 7979, 90.04.23.00.5F, 440.64, GeForce RTX 2080, GPU-22b2c6ac-2d49-4863-197c-9c469071178a, 150.82, 45, N/A, 1965, 1965, 7199, 86, 55, 100, 8, P2', '']
nsmi_items: [21]
['200.00,', '105.00,', '292.00,', '7979,', '90.04.23.00.5F,', '440.64,', 'GeForce', 'RTX', '2080,', 'GPU-22b2c6ac-2d49-4863-197c-9c469071178a,', '150.82,', '45,', 'N/A,', '1965,', '1965,', '7199,', '86,', '55,', '100,', '8,', 'P2']
NV query result: [{'power.limit': '200.00,', 'power.min_limit': '105.00,', 'power.max_limit': '292.00,', 'memory.total': '7979,', 'vbios_version': '90.04.23.00.5F,', 'driver_version': '440.64,', 'name': 'GeForce', 'gpu_uuid': 'RTX', 'power.draw': '2080,', 'temperature.gpu': 'GPU-22b2c6ac-2d49-4863-197c-9c469071178a,', 'temperature.memory': '150.82,', 'clocks.current.graphics': '45,', 'clocks.sm': 'N/A,', 'clocks.mem': '1965,', 'utilization.gpu': '1965,', 'utilization.memory': '7199,', 'fan.speed': '86,', 'pcie.link.width.current': '55,', 'pstate': '100,'}]
NV command:
/usr/bin/nvidia-smi -i 0b:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
nsmi_items: [2]
['200.00, 105.00, 292.00, 7982, 90.04.23.00.5F, 440.64, GeForce RTX 2080, GPU-6933bcb7-7072-2181-a92f-0e5c42c3fa0d, 200.21, 49, N/A, 1950, 1950, 7199, 97, 19, 100, 8, P2', '']
nsmi_items: [21]
['200.00,', '105.00,', '292.00,', '7982,', '90.04.23.00.5F,', '440.64,', 'GeForce', 'RTX', '2080,', 'GPU-6933bcb7-7072-2181-a92f-0e5c42c3fa0d,', '200.21,', '49,', 'N/A,', '1950,', '1950,', '7199,', '97,', '19,', '100,', '8,', 'P2']
NV query result: [{'power.limit': '200.00,', 'power.min_limit': '105.00,', 'power.max_limit': '292.00,', 'memory.total': '7982,', 'vbios_version': '90.04.23.00.5F,', 'driver_version': '440.64,', 'name': 'GeForce', 'gpu_uuid': 'RTX', 'power.draw': '2080,', 'temperature.gpu': 'GPU-6933bcb7-7072-2181-a92f-0e5c42c3fa0d,', 'temperature.memory': '200.21,', 'clocks.current.graphics': '49,', 'clocks.sm': 'N/A,', 'clocks.mem': '1950,', 'utilization.gpu': '1950,', 'utilization.memory': '7199,', 'fan.speed': '97,', 'pcie.link.width.current': '19,', 'pstate': '100,'}]
Detected GPUs: NVIDIA: 3
NV command:
/usr/bin/nvidia-smi -i 08:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
nsmi_items: [2]
['200.00, 105.00, 292.00, 7982, 90.04.23.00.5F, 440.64, GeForce RTX 2080, GPU-089608fe-cba5-4711-bf68-085fd0711d8c, 98.44, 43, N/A, 1995, 1995, 7199, 100, 4, 100, 4, P2', '']
nsmi_items: [21]
['200.00,', '105.00,', '292.00,', '7982,', '90.04.23.00.5F,', '440.64,', 'GeForce', 'RTX', '2080,', 'GPU-089608fe-cba5-4711-bf68-085fd0711d8c,', '98.44,', '43,', 'N/A,', '1995,', '1995,', '7199,', '100,', '4,', '100,', '4,', 'P2']
NV query result: [{'power.limit': '200.00,', 'power.min_limit': '105.00,', 'power.max_limit': '292.00,', 'memory.total': '7982,', 'vbios_version': '90.04.23.00.5F,', 'driver_version': '440.64,', 'name': 'GeForce', 'gpu_uuid': 'RTX', 'power.draw': '2080,', 'temperature.gpu': 'GPU-089608fe-cba5-4711-bf68-085fd0711d8c,', 'temperature.memory': '98.44,', 'clocks.current.graphics': '43,', 'clocks.sm': 'N/A,', 'clocks.mem': '1995,', 'utilization.gpu': '1995,', 'utilization.memory': '7199,', 'fan.speed': '100,', 'pcie.link.width.current': '4,', 'pstate': '100,'}]
NV command:
/usr/bin/nvidia-smi -i 0a:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
nsmi_items: [2]
['200.00, 105.00, 292.00, 7979, 90.04.23.00.5F, 440.64, GeForce RTX 2080, GPU-22b2c6ac-2d49-4863-197c-9c469071178a, 176.98, 45, N/A, 1965, 1965, 7199, 86, 55, 100, 8, P2', '']
nsmi_items: [21]
['200.00,', '105.00,', '292.00,', '7979,', '90.04.23.00.5F,', '440.64,', 'GeForce', 'RTX', '2080,', 'GPU-22b2c6ac-2d49-4863-197c-9c469071178a,', '176.98,', '45,', 'N/A,', '1965,', '1965,', '7199,', '86,', '55,', '100,', '8,', 'P2']
NV query result: [{'power.limit': '200.00,', 'power.min_limit': '105.00,', 'power.max_limit': '292.00,', 'memory.total': '7979,', 'vbios_version': '90.04.23.00.5F,', 'driver_version': '440.64,', 'name': 'GeForce', 'gpu_uuid': 'RTX', 'power.draw': '2080,', 'temperature.gpu': 'GPU-22b2c6ac-2d49-4863-197c-9c469071178a,', 'temperature.memory': '176.98,', 'clocks.current.graphics': '45,', 'clocks.sm': 'N/A,', 'clocks.mem': '1965,', 'utilization.gpu': '1965,', 'utilization.memory': '7199,', 'fan.speed': '86,', 'pcie.link.width.current': '55,', 'pstate': '100,'}]
NV command:
/usr/bin/nvidia-smi -i 0b:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
nsmi_items: [2]
['200.00, 105.00, 292.00, 7982, 90.04.23.00.5F, 440.64, GeForce RTX 2080, GPU-6933bcb7-7072-2181-a92f-0e5c42c3fa0d, 200.97, 49, N/A, 1935, 1935, 7199, 97, 19, 100, 8, P2', '']
nsmi_items: [21]
['200.00,', '105.00,', '292.00,', '7982,', '90.04.23.00.5F,', '440.64,', 'GeForce', 'RTX', '2080,', 'GPU-6933bcb7-7072-2181-a92f-0e5c42c3fa0d,', '200.97,', '49,', 'N/A,', '1935,', '1935,', '7199,', '97,', '19,', '100,', '8,', 'P2']
NV query result: [{'power.limit': '200.00,', 'power.min_limit': '105.00,', 'power.max_limit': '292.00,', 'memory.total': '7982,', 'vbios_version': '90.04.23.00.5F,', 'driver_version': '440.64,', 'name': 'GeForce', 'gpu_uuid': 'RTX', 'power.draw': '2080,', 'temperature.gpu': 'GPU-6933bcb7-7072-2181-a92f-0e5c42c3fa0d,', 'temperature.memory': '200.97,', 'clocks.current.graphics': '49,', 'clocks.sm': 'N/A,', 'clocks.mem': '1935,', 'utilization.gpu': '1935,', 'utilization.memory': '7199,', 'fan.speed': '97,', 'pcie.link.width.current': '19,', 'pstate': '100,'}]
3 total GPUs, 0 rw, 3 r-only, 0 w-only
KeithMyers commented 4 years ago
Card Number: 0
   Vendor: NVIDIA
   Readable: True
   Writable: False
   Compute: True
   GPU UID: None
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
   Display Card Model: None
   PCIe ID: 08:00.0
      Link Speed: None
      Link Width: None
   ##################################################
   Driver: nvidiafb, nouveau, nvidia_drm, nvidia
   vBIOS Version: None
   Compute Platform: OpenCL 1.2 CUDA
   GPU Frequency/Voltage Control Type: Unsupported
   HWmon: None
   Card Path: /sys/class/drm/card0/device
   System Card Path: /sys/devices/pci0000:00/0000:00:01.3/0000:02:00.2/0000:03:04.0/0000:08:00.0
   ##################################################
   Current Power (W): None
   Power Cap (W): None
      Power Cap Range (W): [None, None]
   Fan Enable: None
   Fan PWM Mode: [None, 'UNK']
   Fan Target Speed (rpm): None
   Current Fan Speed (rpm): None
   Current Fan PWM (%): None
      Fan Speed Range (rpm): [None, None]
      Fan PWM Range (%): [None, None]
   ##################################################
   Current GPU Loading (%): None
   Current Memory Loading (%): None
   Current GTT Memory Usage (%): None
      Current GTT Memory Used (GB): None
      Total GTT Memory (GB): None
   Current VRAM Usage (%): None
      Current VRAM Used (GB): None
      Total VRAM (GB): None
   Current  Temps (C): None
   Critical Temps (C): None
   Current Voltages (V): None
      Vddc Range: ['', '']
   Current Clk Frequencies (MHz): None
   Current SCLK P-State: ['', '']
      SCLK Range: ['', '']
   Current MCLK P-State: ['', '']
      MCLK Range: ['', '']
   Power Profile Mode: None
   Power DPM Force Performance Level: None

Card Number: 1
   Vendor: NVIDIA
   Readable: True
   Writable: False
   Compute: True
   GPU UID: None
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
   Display Card Model: None
   PCIe ID: 0a:00.0
      Link Speed: None
      Link Width: None
   ##################################################
   Driver: nvidiafb, nouveau, nvidia_drm, nvidia
   vBIOS Version: None
   Compute Platform: OpenCL 1.2 CUDA
   GPU Frequency/Voltage Control Type: Unsupported
   HWmon: None
   Card Path: /sys/class/drm/card1/device
   System Card Path: /sys/devices/pci0000:00/0000:00:03.1/0000:0a:00.0
   ##################################################
   Current Power (W): None
   Power Cap (W): None
      Power Cap Range (W): [None, None]
   Fan Enable: None
   Fan PWM Mode: [None, 'UNK']
   Fan Target Speed (rpm): None
   Current Fan Speed (rpm): None
   Current Fan PWM (%): None
      Fan Speed Range (rpm): [None, None]
      Fan PWM Range (%): [None, None]
   ##################################################
   Current GPU Loading (%): None
   Current Memory Loading (%): None
   Current GTT Memory Usage (%): None
      Current GTT Memory Used (GB): None
      Total GTT Memory (GB): None
   Current VRAM Usage (%): None
      Current VRAM Used (GB): None
      Total VRAM (GB): None
   Current  Temps (C): None
   Critical Temps (C): None
   Current Voltages (V): None
      Vddc Range: ['', '']
   Current Clk Frequencies (MHz): None
   Current SCLK P-State: ['', '']
      SCLK Range: ['', '']
   Current MCLK P-State: ['', '']
      MCLK Range: ['', '']
   Power Profile Mode: None
   Power DPM Force Performance Level: None

Card Number: 2
   Vendor: NVIDIA
   Readable: True
   Writable: False
   Compute: True
   GPU UID: None
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
   Display Card Model: None
   PCIe ID: 0b:00.0
      Link Speed: None
      Link Width: None
   ##################################################
   Driver: nvidiafb, nouveau, nvidia_drm, nvidia
   vBIOS Version: None
   Compute Platform: OpenCL 1.2 CUDA
   GPU Frequency/Voltage Control Type: Unsupported
   HWmon: None
   Card Path: /sys/class/drm/card2/device
   System Card Path: /sys/devices/pci0000:00/0000:00:03.2/0000:0b:00.0
   ##################################################
   Current Power (W): None
   Power Cap (W): None
      Power Cap Range (W): [None, None]
   Fan Enable: None
   Fan PWM Mode: [None, 'UNK']
   Fan Target Speed (rpm): None
   Current Fan Speed (rpm): None
   Current Fan PWM (%): None
      Fan Speed Range (rpm): [None, None]
      Fan PWM Range (%): [None, None]
   ##################################################
   Current GPU Loading (%): None
   Current Memory Loading (%): None
   Current GTT Memory Usage (%): None
      Current GTT Memory Used (GB): None
      Total GTT Memory (GB): None
   Current VRAM Usage (%): None
      Current VRAM Used (GB): None
      Total VRAM (GB): None
   Current  Temps (C): None
   Critical Temps (C): None
   Current Voltages (V): None
      Vddc Range: ['', '']
   Current Clk Frequencies (MHz): None
   Current SCLK P-State: ['', '']
      SCLK Range: ['', '']
   Current MCLK P-State: ['', '']
      MCLK Range: ['', '']
   Power Profile Mode: None
   Power DPM Force Performance Level: None
Ricks-Lab commented 4 years ago

Hi Keith, did the dictionary info get displayed? Never mind, I see it now.

Ricks-Lab commented 4 years ago

Looks like I did not split the results correctly. I have pushed another version with more debug print statements. Could you run it again? Thanks!
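
For anyone following along, the mis-split in the first debug log can be reproduced in a few lines. This is a minimal sketch, not the project's actual code: the field names and sample values are taken from the log above, and the point is that `nvidia-smi --format=csv,noheader,nounits` output is comma-separated, so splitting on whitespace breaks multi-word fields like `GeForce RTX 2080` and misaligns everything after `name`:

```python
# Hypothetical illustration of the bug shown in the debug log above.
# Field names and sample values are taken from the nvidia-smi query.
query_list = ['power.limit', 'name', 'gpu_uuid', 'power.draw']
raw = '200.00, GeForce RTX 2080, GPU-089608fe, 96.73'

# Splitting on whitespace yields 6 items for 4 fields -> misaligned dict.
bad_items = raw.split()

# Splitting on ',' and stripping each item keeps multi-word fields intact.
good_items = [item.strip() for item in raw.split(',')]

result = dict(zip(query_list, good_items))
print(result['name'])  # GeForce RTX 2080
```

With the comma split, `zip(query_list, good_items)` lines up one value per query field, which matches the corrected `NV query result` dictionaries in the later log.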

KeithMyers commented 4 years ago
 ./amdgpu-ls
OS command [nvidia-smi] executable found: [/usr/bin/nvidia-smi]
NV command:
/usr/bin/nvidia-smi -i 08:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
nsmi_items: [2]
['200.00, 105.00, 292.00, 7982, 90.04.23.00.5F, 440.64, GeForce RTX 2080, GPU-089608fe-cba5-4711-bf68-085fd0711d8c, 96.73, 41, N/A, 1995, 1995, 7199, 99, 4, 100, 4, P2', '']
nsmi_items: [19]
['200.00', ' 105.00', ' 292.00', ' 7982', ' 90.04.23.00.5F', ' 440.64', ' GeForce RTX 2080', ' GPU-089608fe-cba5-4711-bf68-085fd0711d8c', ' 96.73', ' 41', ' N/A', ' 1995', ' 1995', ' 7199', ' 99', ' 4', ' 100', ' 4', ' P2']
new_nsmi_items: [19]
['200.00', '105.00', '292.00', '7982', '90.04.23.00.5F', '440.64', 'GeForce RTX 2080', 'GPU-089608fe-cba5-4711-bf68-085fd0711d8c', '96.73', '41', 'N/A', '1995', '1995', '7199', '99', '4', '100', '4', 'P2']
query_list: [19]
['power.limit', 'power.min_limit', 'power.max_limit', 'memory.total', 'vbios_version', 'driver_version', 'name', 'gpu_uuid', 'power.draw', 'temperature.gpu', 'temperature.memory', 'clocks.current.graphics', 'clocks.sm', 'clocks.mem', 'utilization.gpu', 'utilization.memory', 'fan.speed', 'pcie.link.width.current', 'pstate']
NV query result: [{'power.limit': '200.00', 'power.min_limit': '105.00', 'power.max_limit': '292.00', 'memory.total': '7982', 'vbios_version': '90.04.23.00.5F', 'driver_version': '440.64', 'name': 'GeForce RTX 2080', 'gpu_uuid': 'GPU-089608fe-cba5-4711-bf68-085fd0711d8c', 'power.draw': '96.73', 'temperature.gpu': '41', 'temperature.memory': 'N/A', 'clocks.current.graphics': '1995', 'clocks.sm': '1995', 'clocks.mem': '7199', 'utilization.gpu': '99', 'utilization.memory': '4', 'fan.speed': '100', 'pcie.link.width.current': '4', 'pstate': 'P2'}]
NV command:
/usr/bin/nvidia-smi -i 0a:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
nsmi_items: [2]
['200.00, 105.00, 292.00, 7979, 90.04.23.00.5F, 440.64, GeForce RTX 2080, GPU-22b2c6ac-2d49-4863-197c-9c469071178a, 163.16, 42, N/A, 1965, 1965, 7199, 88, 56, 100, 8, P2', '']
nsmi_items: [19]
['200.00', ' 105.00', ' 292.00', ' 7979', ' 90.04.23.00.5F', ' 440.64', ' GeForce RTX 2080', ' GPU-22b2c6ac-2d49-4863-197c-9c469071178a', ' 163.16', ' 42', ' N/A', ' 1965', ' 1965', ' 7199', ' 88', ' 56', ' 100', ' 8', ' P2']
new_nsmi_items: [19]
['200.00', '105.00', '292.00', '7979', '90.04.23.00.5F', '440.64', 'GeForce RTX 2080', 'GPU-22b2c6ac-2d49-4863-197c-9c469071178a', '163.16', '42', 'N/A', '1965', '1965', '7199', '88', '56', '100', '8', 'P2']
query_list: [19]
['power.limit', 'power.min_limit', 'power.max_limit', 'memory.total', 'vbios_version', 'driver_version', 'name', 'gpu_uuid', 'power.draw', 'temperature.gpu', 'temperature.memory', 'clocks.current.graphics', 'clocks.sm', 'clocks.mem', 'utilization.gpu', 'utilization.memory', 'fan.speed', 'pcie.link.width.current', 'pstate']
NV query result: [{'power.limit': '200.00', 'power.min_limit': '105.00', 'power.max_limit': '292.00', 'memory.total': '7979', 'vbios_version': '90.04.23.00.5F', 'driver_version': '440.64', 'name': 'GeForce RTX 2080', 'gpu_uuid': 'GPU-22b2c6ac-2d49-4863-197c-9c469071178a', 'power.draw': '163.16', 'temperature.gpu': '42', 'temperature.memory': 'N/A', 'clocks.current.graphics': '1965', 'clocks.sm': '1965', 'clocks.mem': '7199', 'utilization.gpu': '88', 'utilization.memory': '56', 'fan.speed': '100', 'pcie.link.width.current': '8', 'pstate': 'P2'}]
NV command:
/usr/bin/nvidia-smi -i 0b:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
nsmi_items: [2]
['200.00, 105.00, 292.00, 7982, 90.04.23.00.5F, 440.64, GeForce RTX 2080, GPU-6933bcb7-7072-2181-a92f-0e5c42c3fa0d, 201.40, 47, N/A, 1950, 1950, 7199, 97, 17, 100, 8, P2', '']
nsmi_items: [19]
['200.00', ' 105.00', ' 292.00', ' 7982', ' 90.04.23.00.5F', ' 440.64', ' GeForce RTX 2080', ' GPU-6933bcb7-7072-2181-a92f-0e5c42c3fa0d', ' 201.40', ' 47', ' N/A', ' 1950', ' 1950', ' 7199', ' 97', ' 17', ' 100', ' 8', ' P2']
new_nsmi_items: [19]
['200.00', '105.00', '292.00', '7982', '90.04.23.00.5F', '440.64', 'GeForce RTX 2080', 'GPU-6933bcb7-7072-2181-a92f-0e5c42c3fa0d', '201.40', '47', 'N/A', '1950', '1950', '7199', '97', '17', '100', '8', 'P2']
query_list: [19]
['power.limit', 'power.min_limit', 'power.max_limit', 'memory.total', 'vbios_version', 'driver_version', 'name', 'gpu_uuid', 'power.draw', 'temperature.gpu', 'temperature.memory', 'clocks.current.graphics', 'clocks.sm', 'clocks.mem', 'utilization.gpu', 'utilization.memory', 'fan.speed', 'pcie.link.width.current', 'pstate']
NV query result: [{'power.limit': '200.00', 'power.min_limit': '105.00', 'power.max_limit': '292.00', 'memory.total': '7982', 'vbios_version': '90.04.23.00.5F', 'driver_version': '440.64', 'name': 'GeForce RTX 2080', 'gpu_uuid': 'GPU-6933bcb7-7072-2181-a92f-0e5c42c3fa0d', 'power.draw': '201.40', 'temperature.gpu': '47', 'temperature.memory': 'N/A', 'clocks.current.graphics': '1950', 'clocks.sm': '1950', 'clocks.mem': '7199', 'utilization.gpu': '97', 'utilization.memory': '17', 'fan.speed': '100', 'pcie.link.width.current': '8', 'pstate': 'P2'}]
Detected GPUs: NVIDIA: 3
NV command:
/usr/bin/nvidia-smi -i 08:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
nsmi_items: [2]
['200.00, 105.00, 292.00, 7982, 90.04.23.00.5F, 440.64, GeForce RTX 2080, GPU-089608fe-cba5-4711-bf68-085fd0711d8c, 96.01, 41, N/A, 1995, 1995, 7199, 99, 4, 100, 4, P2', '']
nsmi_items: [19]
['200.00', ' 105.00', ' 292.00', ' 7982', ' 90.04.23.00.5F', ' 440.64', ' GeForce RTX 2080', ' GPU-089608fe-cba5-4711-bf68-085fd0711d8c', ' 96.01', ' 41', ' N/A', ' 1995', ' 1995', ' 7199', ' 99', ' 4', ' 100', ' 4', ' P2']
new_nsmi_items: [19]
['200.00', '105.00', '292.00', '7982', '90.04.23.00.5F', '440.64', 'GeForce RTX 2080', 'GPU-089608fe-cba5-4711-bf68-085fd0711d8c', '96.01', '41', 'N/A', '1995', '1995', '7199', '99', '4', '100', '4', 'P2']
query_list: [19]
['power.limit', 'power.min_limit', 'power.max_limit', 'memory.total', 'vbios_version', 'driver_version', 'name', 'gpu_uuid', 'power.draw', 'temperature.gpu', 'temperature.memory', 'clocks.current.graphics', 'clocks.sm', 'clocks.mem', 'utilization.gpu', 'utilization.memory', 'fan.speed', 'pcie.link.width.current', 'pstate']
NV query result: [{'power.limit': '200.00', 'power.min_limit': '105.00', 'power.max_limit': '292.00', 'memory.total': '7982', 'vbios_version': '90.04.23.00.5F', 'driver_version': '440.64', 'name': 'GeForce RTX 2080', 'gpu_uuid': 'GPU-089608fe-cba5-4711-bf68-085fd0711d8c', 'power.draw': '96.01', 'temperature.gpu': '41', 'temperature.memory': 'N/A', 'clocks.current.graphics': '1995', 'clocks.sm': '1995', 'clocks.mem': '7199', 'utilization.gpu': '99', 'utilization.memory': '4', 'fan.speed': '100', 'pcie.link.width.current': '4', 'pstate': 'P2'}]
NV command:
/usr/bin/nvidia-smi -i 0a:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
nsmi_items: [2]
['200.00, 105.00, 292.00, 7979, 90.04.23.00.5F, 440.64, GeForce RTX 2080, GPU-22b2c6ac-2d49-4863-197c-9c469071178a, 174.71, 42, N/A, 1965, 1965, 7199, 88, 56, 100, 8, P2', '']
nsmi_items: [19]
['200.00', ' 105.00', ' 292.00', ' 7979', ' 90.04.23.00.5F', ' 440.64', ' GeForce RTX 2080', ' GPU-22b2c6ac-2d49-4863-197c-9c469071178a', ' 174.71', ' 42', ' N/A', ' 1965', ' 1965', ' 7199', ' 88', ' 56', ' 100', ' 8', ' P2']
new_nsmi_items: [19]
['200.00', '105.00', '292.00', '7979', '90.04.23.00.5F', '440.64', 'GeForce RTX 2080', 'GPU-22b2c6ac-2d49-4863-197c-9c469071178a', '174.71', '42', 'N/A', '1965', '1965', '7199', '88', '56', '100', '8', 'P2']
query_list: [19]
['power.limit', 'power.min_limit', 'power.max_limit', 'memory.total', 'vbios_version', 'driver_version', 'name', 'gpu_uuid', 'power.draw', 'temperature.gpu', 'temperature.memory', 'clocks.current.graphics', 'clocks.sm', 'clocks.mem', 'utilization.gpu', 'utilization.memory', 'fan.speed', 'pcie.link.width.current', 'pstate']
NV query result: [{'power.limit': '200.00', 'power.min_limit': '105.00', 'power.max_limit': '292.00', 'memory.total': '7979', 'vbios_version': '90.04.23.00.5F', 'driver_version': '440.64', 'name': 'GeForce RTX 2080', 'gpu_uuid': 'GPU-22b2c6ac-2d49-4863-197c-9c469071178a', 'power.draw': '174.71', 'temperature.gpu': '42', 'temperature.memory': 'N/A', 'clocks.current.graphics': '1965', 'clocks.sm': '1965', 'clocks.mem': '7199', 'utilization.gpu': '88', 'utilization.memory': '56', 'fan.speed': '100', 'pcie.link.width.current': '8', 'pstate': 'P2'}]
NV command:
/usr/bin/nvidia-smi -i 0b:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
nsmi_items: [2]
['200.00, 105.00, 292.00, 7982, 90.04.23.00.5F, 440.64, GeForce RTX 2080, GPU-6933bcb7-7072-2181-a92f-0e5c42c3fa0d, 201.40, 47, N/A, 1965, 1965, 7199, 97, 17, 100, 8, P2', '']
nsmi_items: [19]
['200.00', ' 105.00', ' 292.00', ' 7982', ' 90.04.23.00.5F', ' 440.64', ' GeForce RTX 2080', ' GPU-6933bcb7-7072-2181-a92f-0e5c42c3fa0d', ' 201.40', ' 47', ' N/A', ' 1965', ' 1965', ' 7199', ' 97', ' 17', ' 100', ' 8', ' P2']
new_nsmi_items: [19]
['200.00', '105.00', '292.00', '7982', '90.04.23.00.5F', '440.64', 'GeForce RTX 2080', 'GPU-6933bcb7-7072-2181-a92f-0e5c42c3fa0d', '201.40', '47', 'N/A', '1965', '1965', '7199', '97', '17', '100', '8', 'P2']
query_list: [19]
['power.limit', 'power.min_limit', 'power.max_limit', 'memory.total', 'vbios_version', 'driver_version', 'name', 'gpu_uuid', 'power.draw', 'temperature.gpu', 'temperature.memory', 'clocks.current.graphics', 'clocks.sm', 'clocks.mem', 'utilization.gpu', 'utilization.memory', 'fan.speed', 'pcie.link.width.current', 'pstate']
NV query result: [{'power.limit': '200.00', 'power.min_limit': '105.00', 'power.max_limit': '292.00', 'memory.total': '7982', 'vbios_version': '90.04.23.00.5F', 'driver_version': '440.64', 'name': 'GeForce RTX 2080', 'gpu_uuid': 'GPU-6933bcb7-7072-2181-a92f-0e5c42c3fa0d', 'power.draw': '201.40', 'temperature.gpu': '47', 'temperature.memory': 'N/A', 'clocks.current.graphics': '1965', 'clocks.sm': '1965', 'clocks.mem': '7199', 'utilization.gpu': '97', 'utilization.memory': '17', 'fan.speed': '100', 'pcie.link.width.current': '8', 'pstate': 'P2'}]
3 total GPUs, 0 rw, 3 r-only, 0 w-only

Card Number: 0
   Vendor: NVIDIA
   Readable: True
   Writable: False
   Compute: True
   GPU UID: None
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
   Display Card Model: None
   PCIe ID: 08:00.0
      Link Speed: None
      Link Width: None
   ##################################################
   Driver: nvidiafb, nouveau, nvidia_drm, nvidia
   vBIOS Version: None
   Compute Platform: OpenCL 1.2 CUDA
   GPU Frequency/Voltage Control Type: Unsupported
   HWmon: None
   Card Path: /sys/class/drm/card0/device
   System Card Path: /sys/devices/pci0000:00/0000:00:01.3/0000:02:00.2/0000:03:04.0/0000:08:00.0
   ##################################################
   Current Power (W): None
   Power Cap (W): None
      Power Cap Range (W): [None, None]
   Fan Enable: None
   Fan PWM Mode: [None, 'UNK']
   Fan Target Speed (rpm): None
   Current Fan Speed (rpm): None
   Current Fan PWM (%): None
      Fan Speed Range (rpm): [None, None]
      Fan PWM Range (%): [None, None]
   ##################################################
   Current GPU Loading (%): None
   Current Memory Loading (%): None
   Current GTT Memory Usage (%): None
      Current GTT Memory Used (GB): None
      Total GTT Memory (GB): None
   Current VRAM Usage (%): None
      Current VRAM Used (GB): None
      Total VRAM (GB): None
   Current  Temps (C): None
   Critical Temps (C): None
   Current Voltages (V): None
      Vddc Range: ['', '']
   Current Clk Frequencies (MHz): None
   Current SCLK P-State: ['', '']
      SCLK Range: ['', '']
   Current MCLK P-State: ['', '']
      MCLK Range: ['', '']
   Power Profile Mode: None
   Power DPM Force Performance Level: None

Card Number: 1
   Vendor: NVIDIA
   Readable: True
   Writable: False
   Compute: True
   GPU UID: None
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
   Display Card Model: None
   PCIe ID: 0a:00.0
      Link Speed: None
      Link Width: None
   ##################################################
   Driver: nvidiafb, nouveau, nvidia_drm, nvidia
   vBIOS Version: None
   Compute Platform: OpenCL 1.2 CUDA
   GPU Frequency/Voltage Control Type: Unsupported
   HWmon: None
   Card Path: /sys/class/drm/card1/device
   System Card Path: /sys/devices/pci0000:00/0000:00:03.1/0000:0a:00.0
   ##################################################
   Current Power (W): None
   Power Cap (W): None
      Power Cap Range (W): [None, None]
   Fan Enable: None
   Fan PWM Mode: [None, 'UNK']
   Fan Target Speed (rpm): None
   Current Fan Speed (rpm): None
   Current Fan PWM (%): None
      Fan Speed Range (rpm): [None, None]
      Fan PWM Range (%): [None, None]
   ##################################################
   Current GPU Loading (%): None
   Current Memory Loading (%): None
   Current GTT Memory Usage (%): None
      Current GTT Memory Used (GB): None
      Total GTT Memory (GB): None
   Current VRAM Usage (%): None
      Current VRAM Used (GB): None
      Total VRAM (GB): None
   Current  Temps (C): None
   Critical Temps (C): None
   Current Voltages (V): None
      Vddc Range: ['', '']
   Current Clk Frequencies (MHz): None
   Current SCLK P-State: ['', '']
      SCLK Range: ['', '']
   Current MCLK P-State: ['', '']
      MCLK Range: ['', '']
   Power Profile Mode: None
   Power DPM Force Performance Level: None

Card Number: 2
   Vendor: NVIDIA
   Readable: True
   Writable: False
   Compute: True
   GPU UID: None
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
   Display Card Model: None
   PCIe ID: 0b:00.0
      Link Speed: None
      Link Width: None
   ##################################################
   Driver: nvidiafb, nouveau, nvidia_drm, nvidia
   vBIOS Version: None
   Compute Platform: OpenCL 1.2 CUDA
   GPU Frequency/Voltage Control Type: Unsupported
   HWmon: None
   Card Path: /sys/class/drm/card2/device
   System Card Path: /sys/devices/pci0000:00/0000:00:03.2/0000:0b:00.0
   ##################################################
   Current Power (W): None
   Power Cap (W): None
      Power Cap Range (W): [None, None]
   Fan Enable: None
   Fan PWM Mode: [None, 'UNK']
   Fan Target Speed (rpm): None
   Current Fan Speed (rpm): None
   Current Fan PWM (%): None
      Fan Speed Range (rpm): [None, None]
      Fan PWM Range (%): [None, None]
   ##################################################
   Current GPU Loading (%): None
   Current Memory Loading (%): None
   Current GTT Memory Usage (%): None
      Current GTT Memory Used (GB): None
      Total GTT Memory (GB): None
   Current VRAM Usage (%): None
      Current VRAM Used (GB): None
      Total VRAM (GB): None
   Current  Temps (C): None
   Critical Temps (C): None
   Current Voltages (V): None
      Vddc Range: ['', '']
   Current Clk Frequencies (MHz): None
   Current SCLK P-State: ['', '']
      SCLK Range: ['', '']
   Current MCLK P-State: ['', '']
      MCLK Range: ['', '']
   Power Profile Mode: None
   Power DPM Force Performance Level: None
Ricks-Lab commented 4 years ago

I have pushed a version with basic amdgpu-ls functionality. It still needs a lot of work. Let me know if it works on your system.

KeithMyers commented 4 years ago

No readings in this one. Is that what you wanted?

./amdgpu-ls
OS command [nvidia-smi] executable found: [/usr/bin/nvidia-smi]
NV command:
/usr/bin/nvidia-smi -i 08:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV command:
/usr/bin/nvidia-smi -i 0a:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV command:
/usr/bin/nvidia-smi -i 0b:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
Detected GPUs: NVIDIA: 3
NV command:
/usr/bin/nvidia-smi -i 08:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV command:
/usr/bin/nvidia-smi -i 0a:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV command:
/usr/bin/nvidia-smi -i 0b:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
3 total GPUs, 0 rw, 3 r-only, 0 w-only

Error getting p-states: /sys/class/drm/card0/device/pp_od_clk_voltage
Error getting p-states: /sys/class/drm/card1/device/pp_od_clk_voltage
Error getting p-states: /sys/class/drm/card2/device/pp_od_clk_voltage
Card Number: 0
   Vendor: NVIDIA
   Readable: False
   Writable: False
   Compute: True
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: GeForce RTX 2080
   PCIe ID: 08:00.0
   Driver: 440.64
   GPU Frequency/Voltage Control Type: Supported
   HWmon: None
   Card Path: /sys/class/drm/card0/device
   System Card Path: /sys/devices/pci0000:00/0000:00:01.3/0000:02:00.2/0000:03:04.0/0000:08:00.0

Card Number: 1
   Vendor: NVIDIA
   Readable: False
   Writable: False
   Compute: True
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: GeForce RTX 2080
   PCIe ID: 0a:00.0
   Driver: 440.64
   GPU Frequency/Voltage Control Type: Supported
   HWmon: None
   Card Path: /sys/class/drm/card1/device
   System Card Path: /sys/devices/pci0000:00/0000:00:03.1/0000:0a:00.0

Card Number: 2
   Vendor: NVIDIA
   Readable: False
   Writable: False
   Compute: True
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: GeForce RTX 2080
   PCIe ID: 0b:00.0
   Driver: 440.64
   GPU Frequency/Voltage Control Type: Supported
   HWmon: None
   Card Path: /sys/class/drm/card2/device
   System Card Path: /sys/devices/pci0000:00/0000:00:03.2/0000:0b:00.0
Ricks-Lab commented 4 years ago

I pushed a new version. The previous one was still checking an AMD-specific file and resetting readable to False. Hopefully this one will work. It will need major refactoring afterward.
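
A rough sketch of the kind of fix described here (hypothetical code, not the actual patch): gate the AMD-only sysfs check on vendor, so an NVIDIA card that answers `nvidia-smi` queries keeps `readable` set to True instead of being reset by a missing `pp_od_clk_voltage` file:

```python
import os

def is_readable(vendor: str, card_path: str, have_nvidia_smi: bool) -> bool:
    """Hypothetical readability check sketched from the discussion above.

    Only AMD cards require the amdgpu sysfs file pp_od_clk_voltage;
    NVIDIA cards are readable whenever nvidia-smi is available.
    """
    if vendor == 'AMD':
        return os.path.isfile(os.path.join(card_path, 'pp_od_clk_voltage'))
    if vendor == 'NVIDIA':
        return have_nvidia_smi
    return False
```

Under this scheme the three RTX 2080s report `Readable: True` (as in the later logs) even though no amdgpu sysfs files exist for them.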

KeithMyers commented 4 years ago
 ./amdgpu-ls
OS command [nvidia-smi] executable found: [/usr/bin/nvidia-smi]
NV command:
/usr/bin/nvidia-smi -i 08:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV command:
/usr/bin/nvidia-smi -i 0a:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV command:
/usr/bin/nvidia-smi -i 0b:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
Detected GPUs: NVIDIA: 3
NV command:
/usr/bin/nvidia-smi -i 08:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV command:
/usr/bin/nvidia-smi -i 0a:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV command:
/usr/bin/nvidia-smi -i 0b:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
3 total GPUs, 0 rw, 3 r-only, 0 w-only

Card Number: 0
   Vendor: NVIDIA
   Readable: True
   Writable: False
   Compute: True
   GPU UID: GPU-089608fe-cba5-4711-bf68-085fd0711d8c
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: GeForce RTX 2080
   Display Card Model: None
   PCIe ID: 08:00.0
      Link Speed: None
      Link Width: 4
   ##################################################
   Driver: 440.64
   vBIOS Version: 90.04.23.00.5F
   Compute Platform: OpenCL 1.2 CUDA
   GPU Frequency/Voltage Control Type: Supported
   HWmon: None
   Card Path: /sys/class/drm/card0/device
   System Card Path: /sys/devices/pci0000:00/0000:00:01.3/0000:02:00.2/0000:03:04.0/0000:08:00.0
   ##################################################
   Current Power (W): 164.400
   Power Cap (W): 200.00
      Power Cap Range (W): [105.0, 292.0]
   Fan Enable: None
   Fan PWM Mode: [None, 'UNK']
   Fan Target Speed (rpm): None
   Current Fan Speed (rpm): 100
   Current Fan PWM (%): None
      Fan Speed Range (rpm): [None, None]
      Fan PWM Range (%): [None, None]
   ##################################################
   Current GPU Loading (%): 82
   Current Memory Loading (%): 23
   Current GTT Memory Usage (%): None
      Current GTT Memory Used (GB): None
      Total GTT Memory (GB): None
   Current VRAM Usage (%): None
      Current VRAM Used (GB): None
      Total VRAM (GB): 7982
   Current  Temps (C): {'temperature.gpu': 55.0, 'temperature.memory': None}
   Critical Temps (C): None
   Current Voltages (V): None
      Vddc Range: ['', '']
   Current Clk Frequencies (MHz): {'clocks.current.graphics': 1980.0, 'clocks.mem': 7199.0, 'clocks.sm': 1980.0}
   Current SCLK P-State: [2, '']
      SCLK Range: ['', '']
   Current MCLK P-State: [2, '']
      MCLK Range: ['', '']
   Power Profile Mode: None
   Power DPM Force Performance Level: None

Card Number: 1
   Vendor: NVIDIA
   Readable: True
   Writable: False
   Compute: True
   GPU UID: GPU-22b2c6ac-2d49-4863-197c-9c469071178a
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: GeForce RTX 2080
   Display Card Model: None
   PCIe ID: 0a:00.0
      Link Speed: None
      Link Width: 8
   ##################################################
   Driver: 440.64
   vBIOS Version: 90.04.23.00.5F
   Compute Platform: OpenCL 1.2 CUDA
   GPU Frequency/Voltage Control Type: Supported
   HWmon: None
   Card Path: /sys/class/drm/card1/device
   System Card Path: /sys/devices/pci0000:00/0000:00:03.1/0000:0a:00.0
   ##################################################
   Current Power (W): 166.300
   Power Cap (W): 200.00
      Power Cap Range (W): [105.0, 292.0]
   Fan Enable: None
   Fan PWM Mode: [None, 'UNK']
   Fan Target Speed (rpm): None
   Current Fan Speed (rpm): 100
   Current Fan PWM (%): None
      Fan Speed Range (rpm): [None, None]
      Fan PWM Range (%): [None, None]
   ##################################################
   Current GPU Loading (%): 98
   Current Memory Loading (%): 42
   Current GTT Memory Usage (%): None
      Current GTT Memory Used (GB): None
      Total GTT Memory (GB): None
   Current VRAM Usage (%): None
      Current VRAM Used (GB): None
      Total VRAM (GB): 7979
   Current  Temps (C): {'temperature.gpu': 44.0, 'temperature.memory': None}
   Critical Temps (C): None
   Current Voltages (V): None
      Vddc Range: ['', '']
   Current Clk Frequencies (MHz): {'clocks.current.graphics': 1980.0, 'clocks.mem': 7199.0, 'clocks.sm': 1980.0}
   Current SCLK P-State: [2, '']
      SCLK Range: ['', '']
   Current MCLK P-State: [2, '']
      MCLK Range: ['', '']
   Power Profile Mode: None
   Power DPM Force Performance Level: None

Card Number: 2
   Vendor: NVIDIA
   Readable: True
   Writable: False
   Compute: True
   GPU UID: GPU-6933bcb7-7072-2181-a92f-0e5c42c3fa0d
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: GeForce RTX 2080
   Display Card Model: None
   PCIe ID: 0b:00.0
      Link Speed: None
      Link Width: 8
   ##################################################
   Driver: 440.64
   vBIOS Version: 90.04.23.00.5F
   Compute Platform: OpenCL 1.2 CUDA
   GPU Frequency/Voltage Control Type: Supported
   HWmon: None
   Card Path: /sys/class/drm/card2/device
   System Card Path: /sys/devices/pci0000:00/0000:00:03.2/0000:0b:00.0
   ##################################################
   Current Power (W): 200.800
   Power Cap (W): 200.00
      Power Cap Range (W): [105.0, 292.0]
   Fan Enable: None
   Fan PWM Mode: [None, 'UNK']
   Fan Target Speed (rpm): None
   Current Fan Speed (rpm): 100
   Current Fan PWM (%): None
      Fan Speed Range (rpm): [None, None]
      Fan PWM Range (%): [None, None]
   ##################################################
   Current GPU Loading (%): 97
   Current Memory Loading (%): 8
   Current GTT Memory Usage (%): None
      Current GTT Memory Used (GB): None
      Total GTT Memory (GB): None
   Current VRAM Usage (%): None
      Current VRAM Used (GB): None
      Total VRAM (GB): 7982
   Current  Temps (C): {'temperature.gpu': 52.0, 'temperature.memory': None}
   Critical Temps (C): None
   Current Voltages (V): None
      Vddc Range: ['', '']
   Current Clk Frequencies (MHz): {'clocks.current.graphics': 1965.0, 'clocks.mem': 7199.0, 'clocks.sm': 1965.0}
   Current SCLK P-State: [2, '']
      SCLK Range: ['', '']
   Current MCLK P-State: [2, '']
      MCLK Range: ['', '']
   Power Profile Mode: None
   Power DPM Force Performance Level: None
Ricks-Lab commented 4 years ago

I just pushed a version that reads more items and skips those not applicable in the amdgpu-ls output. Still may be able to implement clock ranges, but need to know which is most relevant. Please post output.

KeithMyers commented 4 years ago

OK, here is your latest.

./amdgpu-ls
OS command [nvidia-smi] executable found: [/usr/bin/nvidia-smi]
NV command:
/usr/bin/nvidia-smi -i 08:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,compute_mode,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.gr,clocks.sm,clocks.mem,clocks.video,clocks.max.gr,clocks.max.sm,clocks.max.mem,clocks.max.video,utilization.gpu,utilization.memory,memory.used,fan.speed,gom.current,pcie.link.width.current,pcie.link.gen.current,pstate --format=csv,noheader,nounits
NV command:
/usr/bin/nvidia-smi -i 0a:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,compute_mode,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.gr,clocks.sm,clocks.mem,clocks.video,clocks.max.gr,clocks.max.sm,clocks.max.mem,clocks.max.video,utilization.gpu,utilization.memory,memory.used,fan.speed,gom.current,pcie.link.width.current,pcie.link.gen.current,pstate --format=csv,noheader,nounits
NV command:
/usr/bin/nvidia-smi -i 0b:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,compute_mode,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.gr,clocks.sm,clocks.mem,clocks.video,clocks.max.gr,clocks.max.sm,clocks.max.mem,clocks.max.video,utilization.gpu,utilization.memory,memory.used,fan.speed,gom.current,pcie.link.width.current,pcie.link.gen.current,pstate --format=csv,noheader,nounits
Detected GPUs: NVIDIA: 3
NV command:
/usr/bin/nvidia-smi -i 08:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,compute_mode,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.gr,clocks.sm,clocks.mem,clocks.video,clocks.max.gr,clocks.max.sm,clocks.max.mem,clocks.max.video,utilization.gpu,utilization.memory,memory.used,fan.speed,gom.current,pcie.link.width.current,pcie.link.gen.current,pstate --format=csv,noheader,nounits
NV command:
/usr/bin/nvidia-smi -i 0a:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,compute_mode,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.gr,clocks.sm,clocks.mem,clocks.video,clocks.max.gr,clocks.max.sm,clocks.max.mem,clocks.max.video,utilization.gpu,utilization.memory,memory.used,fan.speed,gom.current,pcie.link.width.current,pcie.link.gen.current,pstate --format=csv,noheader,nounits
NV command:
/usr/bin/nvidia-smi -i 0b:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,compute_mode,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.gr,clocks.sm,clocks.mem,clocks.video,clocks.max.gr,clocks.max.sm,clocks.max.mem,clocks.max.video,utilization.gpu,utilization.memory,memory.used,fan.speed,gom.current,pcie.link.width.current,pcie.link.gen.current,pstate --format=csv,noheader,nounits
3 total GPUs, 0 rw, 3 r-only, 0 w-only

Card Number: 0
   Vendor: NVIDIA
   Readable: True
   Writable: False
   Compute: True
   GPU UID: None
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
   Display Card Model: None
   PCIe ID: 08:00.0
      Link Speed: None
      Link Width: None
   ##################################################
   Driver: nvidiafb, nouveau, nvidia_drm, nvidia
   vBIOS Version: None
   Compute Platform: OpenCL 1.2 CUDA
   Compute Mode: None
   GPU Frequency/Voltage Control Type: Supported
   HWmon: None
   Card Path: /sys/class/drm/card0/device
   System Card Path: /sys/devices/pci0000:00/0000:00:01.3/0000:02:00.2/0000:03:04.0/0000:08:00.0
   ##################################################
   Current Power (W): None
   Power Cap (W): None
      Power Cap Range (W): [None, None]
   Fan Target Speed (rpm): None
   Current Fan Speed (rpm): None
      Fan Speed Range (rpm): [None, None]
   ##################################################
   Current GPU Loading (%): None
   Current Memory Loading (%): None
   Current VRAM Usage (%): None
      Current VRAM Used (GB): None
      Total VRAM (GB): None
   Current  Temps (C): None
   Critical Temps (C): None
   Current Voltages (V): None
      Vddc Range: ['', '']
   Current Clk Frequencies (MHz): None
Traceback (most recent call last):
  File "./amdgpu-ls", line 150, in <module>
    main()
  File "./amdgpu-ls", line 145, in main
    gpu_list.print(short=args.short, clflag=args.clinfo)
  File "/home/keith/Downloads/amdgpu-utils-extended/GPUmodules/GPUmodule.py", line 2023, in print
    gpu.print(short=short, clflag=clflag)
  File "/home/keith/Downloads/amdgpu-utils-extended/GPUmodules/GPUmodule.py", line 1366, in print
    if isinstance(self.get_params_value(k), float):
  File "/home/keith/Downloads/amdgpu-utils-extended/GPUmodules/GPUmodule.py", line 612, in get_params_value
    return self.prm[name]
KeyError: 'frequencies_max'
Ricks-Lab commented 4 years ago

just pushed a fix

Ricks-Lab commented 4 years ago

Not sure why the params are all None before the traceback. Maybe something went wrong with the query string...

Can you try the command that is printed before the utility output:

/usr/bin/nvidia-smi -i 08:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,compute_mode,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.gr,clocks.sm,clocks.mem,clocks.video,clocks.max.gr,clocks.max.sm,clocks.max.mem,clocks.max.video,utilization.gpu,utilization.memory,memory.used,fan.speed,gom.current,pcie.link.width.current,pcie.link.gen.current,pstate --format=csv,noheader,nounits
KeithMyers commented 4 years ago

/usr/bin/nvidia-smi -i 08:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,compute_mode,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.gr,clocks.sm,clocks.mem,clocks.video,clocks.max.gr,clocks.max.sm,clocks.max.mem,clocks.max.video,utilization.gpu,utilization.memory,memory.used,fan.speed,gom.current,pcie.link.width.current,pcie.link.gen.current,pstate --format=csv,noheader,nounits
Field "clocks.max.video" is not a valid field to query.

KeithMyers commented 4 years ago

Removed the invalid clocks.max.video field:

/usr/bin/nvidia-smi -i 08:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,compute_mode,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.gr,clocks.sm,clocks.mem,clocks.video,clocks.max.gr,clocks.max.sm,clocks.max.mem,utilization.gpu,utilization.memory,memory.used,fan.speed,gom.current,pcie.link.width.current,pcie.link.gen.current,pstate --format=csv,noheader,nounits
200.00, 105.00, 292.00, 7982, 90.04.23.00.5F, 440.64, Default, GeForce RTX 2080, GPU-089608fe-cba5-4711-bf68-085fd0711d8c, 169.38, 51, N/A, 1995, 1995, 7199, 1845, 2160, 2160, 7000, 84, 16, 305, 100, [N/A], 4, 2, P2
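One invalid field (`clocks.max.video` here) makes nvidia-smi reject the whole query, which is why every parameter came back None. A hedged sketch of pre-filtering the requested fields against what the installed driver supports; this assumes `nvidia-smi --help-query-gpu` lists each valid field name in double quotes, which may vary across driver versions, and the helper names are hypothetical:

```python
import re
import subprocess

def supported_query_fields(smi='/usr/bin/nvidia-smi'):
    """Collect the field names that `--help-query-gpu` advertises
    (assumed to appear in double quotes in the help text)."""
    text = subprocess.check_output([smi, '--help-query-gpu'], text=True)
    return set(re.findall(r'"([a-zA-Z_.]+)"', text))

def filter_fields(wanted, available):
    # Drop anything the installed driver cannot answer, so one bad
    # field does not void the entire query result.
    return [f for f in wanted if f in available]
```

Probing once at startup and caching the supported set would also insulate the utility from field lists changing between driver releases.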
Ricks-Lab commented 4 years ago

I just pushed a fix.

KeithMyers commented 4 years ago
./amdgpu-ls
Warning: could not read AMD Featuremask [[Errno 2] No such file or directory: '/sys/module/amdgpu/parameters/ppfeaturemask']
Detected GPUs: NVIDIA: 3
3 total GPUs, 0 rw, 0 r-only, 0 w-only

Card Number: 0
   Vendor: NVIDIA
   Readable: False
   Writable: False
   Compute: True
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
   PCIe ID: 08:00.0
   Driver: nvidiafb, nouveau, nvidia_drm, nvidia
   GPU Frequency/Voltage Control Type: Unsupported
   HWmon: None
   Card Path: /sys/class/drm/card0/device
   System Card Path: /sys/devices/pci0000:00/0000:00:01.3/0000:02:00.2/0000:03:04.0/0000:08:00.0

Card Number: 1
   Vendor: NVIDIA
   Readable: False
   Writable: False
   Compute: True
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
   PCIe ID: 0a:00.0
   Driver: nvidiafb, nouveau, nvidia_drm, nvidia
   GPU Frequency/Voltage Control Type: Unsupported
   HWmon: None
   Card Path: /sys/class/drm/card1/device
   System Card Path: /sys/devices/pci0000:00/0000:00:03.1/0000:0a:00.0

Card Number: 2
   Vendor: NVIDIA
   Readable: False
   Writable: False
   Compute: True
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
   PCIe ID: 0b:00.0
   Driver: nvidiafb, nouveau, nvidia_drm, nvidia
   GPU Frequency/Voltage Control Type: Unsupported
   HWmon: None
   Card Path: /sys/class/drm/card2/device
   System Card Path: /sys/devices/pci0000:00/0000:00:03.2/0000:0b:00.0
Ricks-Lab commented 4 years ago

Are you sure that is from the extended branch?

Ricks-Lab commented 4 years ago

If you did a git clone of the repo, you will need to do git checkout extended while in the project directory after you clone. Then you can do a git pull for the latest.

KeithMyers commented 4 years ago

No, the page refreshed back to master and I didn't notice. Will test again.

./amdgpu-ls
OS command [nvidia-smi] executable found: [/usr/bin/nvidia-smi]
NV command:
/usr/bin/nvidia-smi -i 08:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,compute_mode,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.gr,clocks.sm,clocks.mem,clocks.video,clocks.max.gr,clocks.max.sm,clocks.max.mem,utilization.gpu,utilization.memory,memory.used,fan.speed,gom.current,pcie.link.width.current,pcie.link.gen.current,pstate --format=csv,noheader,nounits
NV query result: [['200.00, 105.00, 292.00, 7982, 90.04.23.00.5F, 440.64, Default, GeForce RTX 2080, GPU-089608fe-cba5-4711-bf68-085fd0711d8c, 129.79, 46, N/A, 1995, 1995, 7199, 1845, 2160, 2160, 7000, 85, 4, 283, 100, [N/A], 4, 2, P2', '']]
Traceback (most recent call last):
  File "./amdgpu-ls", line 150, in <module>
    main()
  File "./amdgpu-ls", line 98, in main
    gpu_list.set_gpu_list(clinfo_flag=True)
  File "/home/keith/Downloads/amdgpu-utils-extended/GPUmodules/GPUmodule.py", line 1788, in set_gpu_list
    self[gpu_uuid].read_gpu_sensor_set_nv()
  File "/home/keith/Downloads/amdgpu-utils-extended/GPUmodules/GPUmodule.py", line 1229, in read_gpu_sensor_set_nv
    mem_value = int(results[param_name]) if results[param_name].isnumeric else None
KeyError: 'mem_vram_total'
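The failing line above has two pitfalls: `results[param_name]` raises KeyError when the key is absent, and `results[param_name].isnumeric` without parentheses is a bound method object, which is always truthy, so the `else None` branch can never be taken. A minimal sketch of a safe coercion (the helper name `to_int` is hypothetical):

```python
def to_int(results, name):
    """Safely coerce one nvidia-smi CSV field to int.

    Uses dict.get to avoid KeyError on missing keys, and actually
    calls .isnumeric() so non-numeric values like 'N/A' become None.
    """
    value = results.get(name)
    if value is None or not value.isnumeric():
        return None
    return int(value)
```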
Ricks-Lab commented 4 years ago

Just pushed a fix.

KeithMyers commented 4 years ago
./amdgpu-ls
OS command [nvidia-smi] executable found: [/usr/bin/nvidia-smi]
NV command:
/usr/bin/nvidia-smi -i 08:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,compute_mode,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.gr,clocks.sm,clocks.mem,clocks.video,clocks.max.gr,clocks.max.sm,clocks.max.mem,utilization.gpu,utilization.memory,memory.used,fan.speed,gom.current,pcie.link.width.current,pcie.link.gen.current,pstate --format=csv,noheader,nounits
NV query result: [['200.00, 105.00, 292.00, 7982, 90.04.23.00.5F, 440.64, Default, GeForce RTX 2080, GPU-089608fe-cba5-4711-bf68-085fd0711d8c, 185.95, 56, N/A, 1980, 1980, 7199, 1830, 2160, 2160, 7000, 89, 22, 325, 100, [N/A], 4, 2, P2', '']]
NV command:
/usr/bin/nvidia-smi -i 0a:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,compute_mode,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.gr,clocks.sm,clocks.mem,clocks.video,clocks.max.gr,clocks.max.sm,clocks.max.mem,utilization.gpu,utilization.memory,memory.used,fan.speed,gom.current,pcie.link.width.current,pcie.link.gen.current,pstate --format=csv,noheader,nounits
NV query result: [['200.00, 105.00, 292.00, 7979, 90.04.23.00.5F, 440.64, Default, GeForce RTX 2080, GPU-22b2c6ac-2d49-4863-197c-9c469071178a, 170.53, 45, N/A, 1965, 1965, 7199, 1815, 2160, 2160, 7000, 88, 57, 3728, 100, [N/A], 8, 3, P2', '']]
NV command:
/usr/bin/nvidia-smi -i 0b:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,compute_mode,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.gr,clocks.sm,clocks.mem,clocks.video,clocks.max.gr,clocks.max.sm,clocks.max.mem,utilization.gpu,utilization.memory,memory.used,fan.speed,gom.current,pcie.link.width.current,pcie.link.gen.current,pstate --format=csv,noheader,nounits
NV query result: [['200.00, 105.00, 292.00, 7982, 90.04.23.00.5F, 440.64, Default, GeForce RTX 2080, GPU-6933bcb7-7072-2181-a92f-0e5c42c3fa0d, 106.01, 41, N/A, 1995, 1995, 7199, 1845, 2160, 2160, 7000, 100, 4, 908, 100, [N/A], 8, 3, P2', '']]
Detected GPUs: NVIDIA: 3
NV command:
/usr/bin/nvidia-smi -i 08:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,compute_mode,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.gr,clocks.sm,clocks.mem,clocks.video,clocks.max.gr,clocks.max.sm,clocks.max.mem,utilization.gpu,utilization.memory,memory.used,fan.speed,gom.current,pcie.link.width.current,pcie.link.gen.current,pstate --format=csv,noheader,nounits
NV query result: [['200.00, 105.00, 292.00, 7982, 90.04.23.00.5F, 440.64, Default, GeForce RTX 2080, GPU-089608fe-cba5-4711-bf68-085fd0711d8c, 202.65, 56, N/A, 1980, 1980, 7199, 1830, 2160, 2160, 7000, 89, 22, 325, 100, [N/A], 4, 2, P2', '']]
NV command:
/usr/bin/nvidia-smi -i 0a:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,compute_mode,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.gr,clocks.sm,clocks.mem,clocks.video,clocks.max.gr,clocks.max.sm,clocks.max.mem,utilization.gpu,utilization.memory,memory.used,fan.speed,gom.current,pcie.link.width.current,pcie.link.gen.current,pstate --format=csv,noheader,nounits
NV query result: [['200.00, 105.00, 292.00, 7979, 90.04.23.00.5F, 440.64, Default, GeForce RTX 2080, GPU-22b2c6ac-2d49-4863-197c-9c469071178a, 170.53, 45, N/A, 1965, 1965, 7199, 1815, 2160, 2160, 7000, 88, 57, 3728, 100, [N/A], 8, 3, P2', '']]
NV command:
/usr/bin/nvidia-smi -i 0b:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,compute_mode,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.gr,clocks.sm,clocks.mem,clocks.video,clocks.max.gr,clocks.max.sm,clocks.max.mem,utilization.gpu,utilization.memory,memory.used,fan.speed,gom.current,pcie.link.width.current,pcie.link.gen.current,pstate --format=csv,noheader,nounits
NV query result: [['200.00, 105.00, 292.00, 7982, 90.04.23.00.5F, 440.64, Default, GeForce RTX 2080, GPU-6933bcb7-7072-2181-a92f-0e5c42c3fa0d, 101.62, 41, N/A, 1995, 1995, 7199, 1845, 2160, 2160, 7000, 100, 4, 908, 100, [N/A], 8, 3, P2', '']]
3 total GPUs, 0 rw, 3 r-only, 0 w-only

Card Number: 0
   Vendor: NVIDIA
   Readable: True
   Writable: False
   Compute: True
   GPU UID: GPU-089608fe-cba5-4711-bf68-085fd0711d8c
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: GeForce RTX 2080
   Display Card Model: GeForce RTX 2080
   PCIe ID: 08:00.0
      Link Speed: 2
      Link Width: 4
   ##################################################
   Driver: 440.64
   vBIOS Version: 90.04.23.00.5F
   Compute Platform: OpenCL 1.2 CUDA
   Compute Mode: Default
   GPU Frequency/Voltage Control Type: Supported
   HWmon: None
   Card Path: /sys/class/drm/card0/device
   System Card Path: /sys/devices/pci0000:00/0000:00:01.3/0000:02:00.2/0000:03:04.0/0000:08:00.0
   ##################################################
   Current Power (W): 202.700
   Power Cap (W): 200.00
      Power Cap Range (W): [105.0, 292.0]
   Fan Target Speed (rpm): None
   Current Fan Speed (rpm): 100
      Fan Speed Range (rpm): [None, None]
   ##################################################
   Current GPU Loading (%): 89
   Current Memory Loading (%): 22
   Current VRAM Usage (%): 4.072
      Current VRAM Used (GB): 0.317
      Total VRAM (GB): 7.795
   Current  Temps (C): {'temperature.gpu': 56.0, 'temperature.memory': None}
   Critical Temps (C): None
   Current Voltages (V): None
      Vddc Range: ['', '']
   Current Clk Frequencies (MHz): {'clocks.gr': 1980.0, 'clocks.mem': 7199.0, 'clocks.sm': 1980.0, 'clocks.video': 1830.0}
   Maximum Clk Frequencies (MHz): {'clocks.max.gr': 2160.0, 'clocks.max.mem': 7000.0, 'clocks.max.sm': 2160.0}
   Current SCLK P-State: [2, '']
      SCLK Range: ['', '']
   Power Profile Mode: [N/A]
   Power DPM Force Performance Level: None

Card Number: 1
   Vendor: NVIDIA
   Readable: True
   Writable: False
   Compute: True
   GPU UID: GPU-22b2c6ac-2d49-4863-197c-9c469071178a
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: GeForce RTX 2080
   Display Card Model: GeForce RTX 2080
   PCIe ID: 0a:00.0
      Link Speed: 3
      Link Width: 8
   ##################################################
   Driver: 440.64
   vBIOS Version: 90.04.23.00.5F
   Compute Platform: OpenCL 1.2 CUDA
   Compute Mode: Default
   GPU Frequency/Voltage Control Type: Supported
   HWmon: None
   Card Path: /sys/class/drm/card1/device
   System Card Path: /sys/devices/pci0000:00/0000:00:03.1/0000:0a:00.0
   ##################################################
   Current Power (W): 170.500
   Power Cap (W): 200.00
      Power Cap Range (W): [105.0, 292.0]
   Fan Target Speed (rpm): None
   Current Fan Speed (rpm): 100
      Fan Speed Range (rpm): [None, None]
   ##################################################
   Current GPU Loading (%): 88
   Current Memory Loading (%): 57
   Current VRAM Usage (%): 46.723
      Current VRAM Used (GB): 3.641
      Total VRAM (GB): 7.792
   Current  Temps (C): {'temperature.gpu': 45.0, 'temperature.memory': None}
   Critical Temps (C): None
   Current Voltages (V): None
      Vddc Range: ['', '']
   Current Clk Frequencies (MHz): {'clocks.gr': 1965.0, 'clocks.mem': 7199.0, 'clocks.sm': 1965.0, 'clocks.video': 1815.0}
   Maximum Clk Frequencies (MHz): {'clocks.max.gr': 2160.0, 'clocks.max.mem': 7000.0, 'clocks.max.sm': 2160.0}
   Current SCLK P-State: [2, '']
      SCLK Range: ['', '']
   Power Profile Mode: [N/A]
   Power DPM Force Performance Level: None

Card Number: 2
   Vendor: NVIDIA
   Readable: True
   Writable: False
   Compute: True
   GPU UID: GPU-6933bcb7-7072-2181-a92f-0e5c42c3fa0d
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: GeForce RTX 2080
   Display Card Model: GeForce RTX 2080
   PCIe ID: 0b:00.0
      Link Speed: 3
      Link Width: 8
   ##################################################
   Driver: 440.64
   vBIOS Version: 90.04.23.00.5F
   Compute Platform: OpenCL 1.2 CUDA
   Compute Mode: Default
   GPU Frequency/Voltage Control Type: Supported
   HWmon: None
   Card Path: /sys/class/drm/card2/device
   System Card Path: /sys/devices/pci0000:00/0000:00:03.2/0000:0b:00.0
   ##################################################
   Current Power (W): 101.600
   Power Cap (W): 200.00
      Power Cap Range (W): [105.0, 292.0]
   Fan Target Speed (rpm): None
   Current Fan Speed (rpm): 100
      Fan Speed Range (rpm): [None, None]
   ##################################################
   Current GPU Loading (%): 100
   Current Memory Loading (%): 4
   Current VRAM Usage (%): 11.376
      Current VRAM Used (GB): 0.887
      Total VRAM (GB): 7.795
   Current  Temps (C): {'temperature.gpu': 41.0, 'temperature.memory': None}
   Critical Temps (C): None
   Current Voltages (V): None
      Vddc Range: ['', '']
   Current Clk Frequencies (MHz): {'clocks.gr': 1995.0, 'clocks.mem': 7199.0, 'clocks.sm': 1995.0, 'clocks.video': 1845.0}
   Maximum Clk Frequencies (MHz): {'clocks.max.gr': 2160.0, 'clocks.max.mem': 7000.0, 'clocks.max.sm': 2160.0}
   Current SCLK P-State: [2, '']
      SCLK Range: ['', '']
   Power Profile Mode: [N/A]
   Power DPM Force Performance Level: None
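The VRAM figures in the listing above are consistent with nvidia-smi's MiB values divided by 1024, i.e. binary gibibytes printed under a "GB" label (7982 MiB becomes 7.795). A minimal sketch of that conversion with hypothetical helper names:

```python
def mib_to_gb(mib):
    # nvidia-smi reports memory.total / memory.used in MiB; dividing
    # by 1024 gives the binary "GB" shown in the card listing.
    return round(mib / 1024, 3)

def vram_usage_pct(used_mib, total_mib):
    # Percentage of total VRAM in use, to three decimal places.
    return round(used_mib / total_mib * 100, 3)
```

For example, Card 0's earlier reading of 325 MiB used out of 7982 MiB total matches the reported 4.072% usage and 7.795 GB capacity.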
Ricks-Lab commented 4 years ago

Can you try amdgpu-monitor? If it looks good, then try it with the --gui option. Trying it with the --plot option would be pushing it. You need to make sure you have installed the requirements as defined in the UsersGuide.