Ricks-Lab / gpu-utils

A set of utilities for monitoring and customizing GPU performance
GNU General Public License v3.0

can't run modules #4

Closed csecht closed 5 years ago

csecht commented 5 years ago

On Lubuntu 18.04 with the AMDGPU 18.5 All-Open (Mesa) drivers package, I can't run any of the amdgpu-utils modules. They all fail to launch, like this:

~/Desktop/amdgpu-utils-master$ ./amdgpu-ls
AMD Wattman features enabled: 0xffff7fff
Traceback (most recent call last):
  File "./amdgpu-ls", line 136, in <module>
    main()
  File "./amdgpu-ls", line 94, in main
    gut_const.get_amd_driver_version()
  File "/home/craig/Desktop/amdgpu-utils-master/GPUmodules/GPUmodules.py", line 101, in get_amd_driver_version
    stderr=subprocess.DEVNULL).decode().split("\n")
  File "/usr/lib/python3.6/subprocess.py", line 336, in check_output
    **kwargs).stdout
  File "/usr/lib/python3.6/subprocess.py", line 418, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['dpkg', '-l', 'amdgpu-pro']' returned non-zero exit status 1.

Do you have any suggestions for what I might be doing wrong?

Ricks-Lab commented 5 years ago

Can you run dpkg -l amdgpu-pro on the command line and let me know the output? Also try without the '-pro'. Thanks!

Ricks-Lab commented 5 years ago

I wrapped the dpkg call in a try block to avoid the error, but I really need to understand why dpkg exits with a bad status on your system. There is probably a bigger issue of different distributions using different commands to check the version of the amdgpu drivers, but I thought I had avoided that by not running dpkg if it is not found. Please give the latest on master a try and let me know of any issues. Thanks!
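The guard described in this comment can be sketched roughly as follows. This is only an illustration, not the code in GPUmodules.py: the function name `query_dpkg` and its `which`/`run` parameters are hypothetical, added so the sketch can be exercised on a machine without dpkg.

```python
import shutil
import subprocess


def query_dpkg(package, which=shutil.which, run=subprocess.check_output):
    """Return the lines printed by `dpkg -l <package>`, or None if unavailable.

    `which` and `run` are hypothetical injection points for this sketch;
    they default to the real standard-library calls.
    """
    if which("dpkg") is None:
        # dpkg is not installed (for example on a non-Debian distribution),
        # so skip the query entirely.
        return None
    try:
        output = run(["dpkg", "-l", package], stderr=subprocess.DEVNULL)
    except subprocess.CalledProcessError:
        # As seen in this thread, dpkg returns exit status 1 when no
        # package matches the requested name.
        return None
    return output.decode().split("\n")
```

A caller would treat `None` as "version unknown" and continue, rather than letting the exception end the program.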

csecht commented 5 years ago

Thanks for the quick reply!

This is what I get:

craig@linux-GA-MA790X-UD4:~$ dpkg -l amdgpu-pro
dpkg-query: no packages found matching amdgpu-pro

craig@linux-GA-MA790X-UD4:~$ dpkg -l amdgpu
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name           Version      Architecture Description
+++-==============-============-============-=================================
ii  amdgpu         18.50-708488 amd64        Meta package to install amdgpu co

Cheers, Craig


csecht commented 5 years ago

Okay, I loaded the latest Master and get the same dpkg error as before. —Craig


Ricks-Lab commented 5 years ago

My except statement was too specific. I have just modified it, but could not test it, as I only have access from my phone. Can you give it another try?
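The thread does not show which exception the original handler named, so the following is only a sketch of what a broader handler could look like. The helper name `run_quietly` is hypothetical. It catches both the "command ran but failed" case reported above and the "command could not be started" case.

```python
import subprocess
import sys


def run_quietly(command):
    """Run `command`; return its stdout lines, or None on an expected failure."""
    try:
        output = subprocess.check_output(command, stderr=subprocess.DEVNULL)
    except (subprocess.CalledProcessError, OSError) as error:
        # CalledProcessError: the command ran but returned a non-zero status.
        # OSError (which includes FileNotFoundError): it could not be started.
        print("Warning: {} failed: {}".format(command[0], error), file=sys.stderr)
        return None
    return output.decode().splitlines()
```

Naming only one of these exception types would leave the other to propagate as a traceback, which is one way an except clause can be too specific.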

Ricks-Lab commented 5 years ago

Thanks for providing the output of dpkg -l. My original approach was to look only for the amdgpu-pro drivers. I will modify it to check for both the pro and normal versions of the amdgpu drivers. Should be an easy fix. In the meantime, the above modification should work.
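One way to check for both package names is to parse the rows of the dpkg -l listing. The sketch below assumes the row layout shown in the output posted above (status flags, name, version, architecture, description); `parse_dpkg_version` is a hypothetical helper, not a function from amdgpu-utils.

```python
def parse_dpkg_version(dpkg_lines, names=("amdgpu", "amdgpu-pro")):
    """Return (package, version) for the first installed match, else None.

    `dpkg_lines` is the text of a `dpkg -l` listing split into lines.
    """
    for line in dpkg_lines:
        fields = line.split()
        # Rows for installed packages start with the status flags "ii",
        # followed by the package name and then its version.
        if len(fields) >= 3 and fields[0] == "ii" and fields[1] in names:
            return fields[1], fields[2]
    return None
```

With the listing posted earlier in this thread, this would yield the package amdgpu with version 18.50-708488, and it would yield nothing for the "no packages found" message.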

csecht commented 5 years ago

Progress?

craig@linux-GA-MA790X-UD4:~/Desktop/amdgpu-utils-master$ ./amdgpu-ls
AMD Wattman features enabled: 0xffff7fff
Warning: Cannot read determine amdgpu version.
2 AMD GPUs detected
Traceback (most recent call last):
  File "./amdgpu-ls", line 136, in <module>
    main()
  File "./amdgpu-ls", line 111, in main
    gpu_list.get_gpu_details()
  File "/home/craig/Desktop/amdgpu-utils-master/GPUmodules/GPUmodules.py", line 474, in get_gpu_details
    gpu_name = lspci_items[1].split('[AMD/ATI]')[1]
IndexError: list index out of range


csecht commented 5 years ago

Given the index error when I run amdgpu-ls:

craig@linux-GA-MA790X-UD4:~/Desktop/amdgpu-utils-master$ ./amdgpu-ls
AMD Wattman features enabled: 0xffff7fff
Warning: Cannot read determine amdgpu version.
2 AMD GPUs detected
Traceback (most recent call last):
  File "./amdgpu-ls", line 136, in <module>
    main()
  File "./amdgpu-ls", line 111, in main
    gpu_list.get_gpu_details()
  File "/home/craig/Desktop/amdgpu-utils-master/GPUmodules/GPUmodules.py", line 474, in get_gpu_details
    gpu_name = lspci_items[1].split('[AMD/ATI]')[1]
IndexError: list index out of range

I ran lspci so you can see what it’s feeding the program:

craig@linux-GA-MA790X-UD4:~/Desktop/amdgpu-utils-master$ lspci
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD780 Host Bridge
00:02.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RX780/RD790 PCI to PCI bridge (external gfx0 port A)
00:03.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD790 PCI to PCI bridge (external gfx0 port B)
00:0a.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD790 PCI to PCI bridge (PCI express gpp port F)
00:11.0 SATA controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 SATA Controller [IDE mode]
00:12.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
00:12.1 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0 USB OHCI1 Controller
00:12.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB EHCI Controller
00:13.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
00:13.1 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0 USB OHCI1 Controller
00:13.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB EHCI Controller
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 SMBus Controller (rev 3a)
00:14.1 IDE interface: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 IDE Controller
00:14.2 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 Azalia (Intel HDA)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 LPC host controller
00:14.4 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 PCI to PCI Bridge
00:14.5 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI2 Controller
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor HyperTransport Configuration
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor Address Map
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor Miscellaneous Control
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor Link Control
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480] (rev ef)
01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 580]
02:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon RX 460] (rev cf)
02:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device aae0
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
04:0e.0 FireWire (IEEE 1394): Texas Instruments TSB43AB23 IEEE-1394a-2000 Controller (PHY/Link)

I’ll check back on GitHub tomorrow. Before I forget though, I got interested in amdgpu-utils from a posting by Keith Myers on an Einstein@Home discussion; he recommended it for folks trying out the new Radeon Vega VII, but I want to use it for a couple of older Radeon RX cards that I run in an E@H host.

Cheers, Craig


Ricks-Lab commented 5 years ago

Hi Craig, I added some error handling, but I think it will end up saying the model of your GPUs is unknown. It looks like the format of the lspci output is different for your cards vs. what I have been using during development (RX Vega64). Please give the latest on master a try. Also, I have added some debug statements to print what the 2 calls to lspci result in. Please let me know the output when running amdgpu-ls.
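The IndexError in the traceback above comes from indexing the result of split('[AMD/ATI]') when the tag is not in the line. A minimal sketch of a tolerant lookup follows; `name_after_vendor_tag` is a hypothetical helper and not the fix that was committed.

```python
def name_after_vendor_tag(line, tag="[AMD/ATI]", default="UNKNOWN"):
    """Return the text that follows `tag` in `line`, or `default` if absent."""
    # str.partition never raises: when the tag is missing, `found` is "".
    _, found, rest = line.partition(tag)
    rest = rest.strip()
    return rest if found and rest else default
```

A line without the tag then produces UNKNOWN instead of a traceback, which matches the behavior predicted in this comment.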

Really appreciate you getting involved. My one SETI cruncher is not enough to know the code is robust. I tried to get someone on board with testing at SETI, but most Linux users aren't using AMD cards.

csecht commented 5 years ago

Sweet! Everything is working (haven’t tried amdgpu-pac yet). Yes, the cards are listed as UNKNOWN, but the model names are buried in the lspci output (card1 is an RX 460, card0 is an RX 570).

Here are the -ls debug statements and output:

craig@linux-GA-MA790X-UD4:~/Desktop/amdgpu-utils-master$ ./amdgpu-ls
AMD Wattman features enabled: 0xffff7fff
amdgpu version: 18.50-708488
2 AMD GPUs detected
Found 2 GPUs
GPU: 01:00.0
['01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480] (rev ef)', '\tSubsystem: XFX Pine Group Inc. Ellesmere [Radeon RX 470/480/570/580]', '\tKernel driver in use: amdgpu', '\tKernel modules: amdgpu', '']
device_dir: /sys/class/drm/card1/device
sysfspath: /sys/devices/pci0000:00/0000:00:03.0/0000:02:00.0
pcie_id: 01:00.0
sysfspath-7: 02:00.0
device_dir: /sys/class/drm/card0/device
sysfspath: /sys/devices/pci0000:00/0000:00:02.0/0000:01:00.0
pcie_id: 01:00.0
sysfspath-7: 01:00.0
GPU: 02:00.0
['02:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon RX 460] (rev cf)', '\tSubsystem: Hewlett-Packard Company Baffin [Radeon RX 460/560D / Pro 450/455/460/560]', '\tKernel driver in use: amdgpu', '\tKernel modules: amdgpu', '']
device_dir: /sys/class/drm/card1/device
sysfspath: /sys/devices/pci0000:00/0000:00:03.0/0000:02:00.0
pcie_id: 02:00.0
sysfspath-7: 02:00.0
2 are Compatible

UUID: f7c09038eca44ea9ac18875b56f773fe
Card Model: UNKNOWN
Card Number: 1
Card Path: /sys/class/drm/card1/device/
PCIe ID: 02:00.0
Driver: amdgpu
HWmon: /sys/class/drm/card1/device/hwmon/hwmon1/
Current Power (W): 47.224
Power Cap (W): 48.0
Min Power Cap (W): 0.0
Max Power Cap (W): 48.0
Current Temp (C): 75.0
Current VddGFX (mV): 1037
Vddc Range: ['800mV', '1150mV']
Current Loading (%): 98
Link Speed: 5 GT/s
Link Width: 8
vBIOS Version: 113-AB70140-005
Current SCLK P-State: 5
Current SCLK: 1138Mhz
SCLK Range: ['214MHz', '1800MHz']
Current MCLK P-State: 1
Current MCLK: 1750Mhz
MCLK Range: ['300MHz', '2000MHz']
Power Performance Mode: 0-3D_FULL_SCREEN
Power Force Performance Level: auto

UUID: 8be14a0abf60440b89b83d9b8d7e98c1
Card Model: UNKNOWN
Card Number: 0
Card Path: /sys/class/drm/card0/device/
PCIe ID: 01:00.0
Driver: amdgpu
HWmon: /sys/class/drm/card0/device/hwmon/hwmon0/
Current Power (W): 117.073
Power Cap (W): 125.0
Min Power Cap (W): 0.0
Max Power Cap (W): 125.0
Current Temp (C): 78.0
Current VddGFX (mV): 1075
Vddc Range: ['750mV', '1150mV']
Current Loading (%): 100
Link Speed: 5 GT/s
Link Width: 8
vBIOS Version: 113-57045EHB1-W90
Current SCLK P-State: 7
Current SCLK: 1286Mhz
SCLK Range: ['300MHz', '2000MHz']
Current MCLK P-State: 2
Current MCLK: 1750Mhz
MCLK Range: ['300MHz', '2250MHz']
Power Performance Mode: 0-3D_FULL_SCREEN
Power Force Performance Level: auto

++++++++++++++++++++++++++++++++++=

amdgpu-ls --pstate (without debug statements)

Card: /sys/class/drm/card1/device/
SCLK:                    MCLK:
0:  214MHz   800mV       0:  300MHz   800mV
1:  481MHz   821mV       1:  1750MHz  850mV
2:  760MHz   825mV
3:  1020MHz  925mV
4:  1102MHz  1012mV
5:  1138MHz  1056mV
6:  1172MHz  1100mV
7:  1200MHz  1143mV

Card: /sys/class/drm/card0/device/
SCLK:                    MCLK:
0:  300MHz   750mV       0:  300MHz   750mV
1:  588MHz   765mV       1:  1000MHz  800mV
2:  952MHz   918mV       2:  1750MHz  900mV
3:  1076MHz  1025mV
4:  1143MHz  1087mV
5:  1208MHz  1150mV
6:  1250MHz  1150mV
7:  1286MHz  1150mV
+++++++++++++++++++++++++++++++++++

amdgpu-ls --ppm (without debug statements)

Card: /sys/class/drm/card1/device/
Power Performance Mode: auto
NUM: MODE_NAME       SCLK_UP_HYST  SCLK_DOWN_HYST  SCLK_ACTIVE_LEVEL  MCLK_UP_HYST  MCLK_DOWN_HYST  MCLK_ACTIVE_LEVEL
  0: 3D_FULL_SCREEN  0             100             30                 0             100             10
  1: POWER_SAVING    10            0               30                 -             -               -
  2: VIDEO           -             -               -                  10            16              31
  3: VR              0             11              50                 0             100             10
  4: COMPUTE         0             5               30                 -             -               -
  5: CUSTOM          -             -               -                  -             -               -
 -1: AUTO            Auto

Card: /sys/class/drm/card0/device/
Power Performance Mode: auto
NUM: MODE_NAME       SCLK_UP_HYST  SCLK_DOWN_HYST  SCLK_ACTIVE_LEVEL  MCLK_UP_HYST  MCLK_DOWN_HYST  MCLK_ACTIVE_LEVEL
  0: 3D_FULL_SCREEN  0             100             30                 0             100             10
  1: POWER_SAVING    10            0               30                 -             -               -
  2: VIDEO           -             -               -                  10            16              31
  3: VR              0             11              50                 0             100             10
  4: COMPUTE         0             5               30                 -             -               -
  5: CUSTOM          -             -               -                  -             -               -
 -1: AUTO            Auto

++++++++++++++++++++++++++++

And amdgpu-monitor works, including with the --gui option. That’s a nice feature.

++++++++++++++++++++++++++++

I’ve had good success with these cards in Windows 7 using AMD’s Global Wattman to power limit. How would I power limit using amdgpu-pac? Thanks much for updating your program to work with my system.

Cheers, Craig


csecht commented 5 years ago

Rick, Following up from my last email, I ran amdgpu-pac and got it to power limit both cards! I wasn’t able to do that (or figure out how to do that) with either rocm-smi or ohgodatool.

I had an odd experience when I first tried to set the performance state from AUTO to COMPUTE with amdgpu-pac: it made the change, but the sclk and mclk states were all changed to 0, although the GPUs kept grinding away at the Einstein@Home tasks at what seemed like their normal pace. I rebooted, reset the cards with amdgpu-pac, and then tried power limiting with success.

A feature I didn’t find in amdgpu-utils that I like in AMD’s rocm-smi is the ability to monitor and set fan RPM and %max. Having that function in amdgpu-utils would make it do everything that Wattman does.

Cheers, Craig

Ricks-Lab commented 5 years ago

Thanks for the debug output. I have found the problem. For Vega64, the preferred GPU name is on the second line of the lspci -k -s command, but for your cards, the name is only given on the first line. I have improved the logic, so the name is extracted in both cases. Do you know if the Radeon VII users are having any issues?
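The two-case logic described here can be sketched as below, assuming that the preferred name is recognized by the [AMD/ATI] tag on the second line and that the first line is the fallback. The thread does not show the actual test used in the fix, so both helper names and the Radeon bracket pattern are hypothetical.

```python
import re


def extract_gpu_name(lspci_lines, tag="[AMD/ATI]"):
    """Pick a GPU name from the output lines of `lspci -k -s <pcie_id>`."""
    # Prefer the second (Subsystem) line, then fall back to the first line.
    for index in (1, 0):
        if index < len(lspci_lines) and tag in lspci_lines[index]:
            return lspci_lines[index].split(tag, 1)[1].strip()
    return "UNKNOWN"


def short_gpu_name(card_model):
    """Reduce e.g. 'Baffin [Radeon RX 460] (rev cf)' to 'RX 460'."""
    match = re.search(r"\[Radeon ([^\]]+)\]", card_model)
    return match.group(1) if match else card_model
```

Fed the lspci lines from the debug output above, the Subsystem lines carry no [AMD/ATI] tag, so the name is taken from the first line in both cases.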

When you run amdgpu-pac, are you using the --execute_pac option? I made this an option since I figured new users would not be comfortable running a bash script with sudo commands. I do plan to improve it by not writing out commands for parameters that don't change, but I need to find the time.
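The planned improvement, writing commands only for parameters that change, could look something like this sketch for the power cap alone. It does not reflect how amdgpu-pac is written. The function name is hypothetical, and the sketch assumes the cap is set through the hwmon file power1_cap, which takes microwatts, under the HWmon directory that amdgpu-ls reports.

```python
import posixpath


def power_cap_commands(hwmon_dir, current_watts, requested_watts):
    """Return the shell commands needed to apply a new power cap.

    Returns an empty list when the requested cap equals the current one,
    so nothing is written for an unchanged parameter.
    """
    if requested_watts == current_watts:
        return []
    # Assumption: power1_cap expects an integer number of microwatts.
    microwatts = int(round(requested_watts * 1000000))
    target = posixpath.join(hwmon_dir, "power1_cap")
    return ["sudo sh -c \"echo '{}' > {}\"".format(microwatts, target)]
```

The commands are returned as text rather than executed, which keeps the review-before-running workflow that the generated script provides.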

For the issue where the card condition was corrupted, I have not seen it, but I will check out the implementation in ROCm-smi to see if I am missing a better way of doing things.

When you have a chance, check out the latest on master and let me know if the card name is displayed correctly.

Ricks-Lab commented 5 years ago

Hi Craig, Since I am developing this on a system where all GPUs are on waterblocks, fan settings weren't a priority. They should be easy to add, but I would need your help to test/verify. Can you open another issue for this feature request? That will make it easier to track testing and verification. Thanks! Rick

csecht commented 5 years ago

Yes, the card names are being read (below), but the RX 570 card is coming up as RX 470/480, which is also what happens with rocm-smi or ohgodatool. Both series of cards have the Ellesmere chip, so I don’t know if the 470 vs 570 name is an AMD Radeon thing or a card BIOS thing (my card is made by XFX). In Windows, AMD’s Global WattMan and GPU-Z call it an RX570. RX 570 also shows up in ‘clinfo’ (below; see Device Board Name (AMD)). So maybe query clinfo output? Though, I suppose you’d first need to know whether RX570 cards from other manufacturers also do the odd naming thing.
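If clinfo output were used for naming, the two AMD-specific fields mentioned here could be paired up roughly as follows. This sketch assumes clinfo prints one field per line and that the board name precedes the topology line for each device, as in the output pasted below. `board_names_from_clinfo` is a hypothetical helper.

```python
def board_names_from_clinfo(clinfo_text):
    """Map PCIe ID to board name using two AMD-specific clinfo fields."""
    names = {}
    board = None
    for raw_line in clinfo_text.splitlines():
        line = raw_line.strip()
        if line.startswith("Device Board Name (AMD)"):
            board = line[len("Device Board Name (AMD)"):].strip()
        elif line.startswith("Device Topology (AMD)") and board is not None:
            # The topology value looks like "PCI-E, 01:00.0";
            # keep the PCIe ID after the comma.
            pcie_id = line.split(",")[-1].strip()
            names[pcie_id] = board
            board = None
    return names
```

The PCIe ID is the natural join key, since amdgpu-ls already reports it for each card.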

I haven’t heard yet of any E@H users having issues with the Vega VII. I don’t think anyone has tried tweaking its performance yet.

No, I haven’t used the --execute_pac option yet. I know what you mean about novice users (I’m pretty much at that stage) and it’s a nice option, but I was comfortable with sudo. I actually like having the shell script hanging around in case I need to use it again, like after a system reboot; I renamed each card's script with something descriptive, assuming it'll still work if run again.

Thanks much for developing this utility. I had temporarily given up on Linux for E@H because my particular setup just ran too hot under default card settings. But now I’m back with Linux and the cards are running E@H very efficiently thanks to amdgpu-utils.

Cheers

++++++++++++++++++++++++++++++++++++
$ ./amdgpu-ls
AMD Wattman features enabled: 0xffff7fff
amdgpu version: 18.50-708488
2 AMD GPUs detected
2 are Compatible

UUID: a028c1b44a954375ae3697c0e1c102b6
Card Model: Baffin [Radeon RX 460] (rev cf)
Short Card Model: RX 460
Card Number: 1
Card Path: /sys/class/drm/card1/device/
PCIe ID: 02:00.0
Driver: amdgpu
HWmon: /sys/class/drm/card1/device/hwmon/hwmon1/
Current Power (W): 42.189
Power Cap (W): 43.0
Min Power Cap (W): 0.0
Max Power Cap (W): 48.0
Current Temp (C): 72.0
Current VddGFX (mV): 900
Vddc Range: ['800mV', '1150mV']
Current Loading (%): 100
Link Speed: 5 GT/s
Link Width: 8
vBIOS Version: 113-AB70140-005
Current SCLK P-State: 3
Current SCLK: 1020Mhz
SCLK Range: ['214MHz', '1800MHz']
Current MCLK P-State: 1
Current MCLK: 1750Mhz
MCLK Range: ['300MHz', '2000MHz']
Power Performance Mode: 0-3D_FULL_SCREEN
Power Force Performance Level: auto

UUID: 098a453668f54f108acf10deccac13a0
Card Model: Ellesmere [Radeon RX 470/480] (rev ef)
Short Card Model: RX 470/480
Card Number: 0
Card Path: /sys/class/drm/card0/device/
PCIe ID: 01:00.0
Driver: amdgpu
HWmon: /sys/class/drm/card0/device/hwmon/hwmon0/
Current Power (W): 100.134
Power Cap (W): 100.0
Min Power Cap (W): 0.0
Max Power Cap (W): 125.0
Current Temp (C): 76.0
Current VddGFX (mV): 1000
Vddc Range: ['750mV', '1150mV']
Current Loading (%): 100
Link Speed: 5 GT/s
Link Width: 8
vBIOS Version: 113-57045EHB1-W90
Current SCLK P-State: 5
Current SCLK: 1208Mhz
SCLK Range: ['300MHz', '2000MHz']
Current MCLK P-State: 2
Current MCLK: 1750Mhz
MCLK Range: ['300MHz', '2250MHz']
Power Performance Mode: 0-3D_FULL_SCREEN
Power Force Performance Level: auto

+++++++++++++++++++++++++++++==
$ clinfo
Number of platforms  1
  Platform Name  AMD Accelerated Parallel Processing
  Platform Vendor  Advanced Micro Devices, Inc.
  Platform Version  OpenCL 2.1 AMD-APP (2766.4)
  Platform Profile  FULL_PROFILE
  Platform Extensions  cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
  Platform Host timer resolution  1ns
  Platform Extensions function suffix  AMD

  Platform Name  AMD Accelerated Parallel Processing
Number of devices  2
  Device Name  Ellesmere
  Device Vendor  Advanced Micro Devices, Inc.
  Device Vendor ID  0x1002
  Device Version  OpenCL 1.2 AMD-APP (2766.4)
  Driver Version  2766.4
  Device OpenCL C Version  OpenCL C 1.2
  Device Type  GPU
  Device Board Name (AMD)  Radeon RX 570 Series
  Device Topology (AMD)  PCI-E, 01:00.0
  Device Profile  FULL_PROFILE
  Device Available  Yes
  Compiler Available  Yes
  Linker Available  Yes
  Max compute units  32
  SIMD per compute unit (AMD)  4
  SIMD width (AMD)  16
  SIMD instruction width (AMD)  1
  Max clock frequency  1286MHz
  Graphics IP (AMD)  8.0
  Device Partition  (core)
    Max number of sub-devices  32
    Supported partition types  None
  Max work item dimensions  3
  Max work item sizes  1024x1024x1024
  Max work group size  256
  Preferred work group size (AMD)  256
  Max work group size (AMD)  1024
  Preferred work group size multiple  64
  Wavefront width (AMD)  64
  Preferred / native vector sizes
    char  4 / 4
    short  2 / 2
    int  1 / 1
    long  1 / 1
    half  1 / 1 (cl_khr_fp16)
    float  1 / 1
    double  1 / 1 (cl_khr_fp64)
  Half-precision Floating-point support  (cl_khr_fp16)
    Denormals  No
    Infinity and NANs  No
    Round to nearest  No
    Round to zero  No
    Round to infinity  No
    IEEE754-2008 fused multiply-add  No
    Support is emulated in software  No
  Single-precision Floating-point support  (core)
    Denormals  No
    Infinity and NANs  Yes
    Round to nearest  Yes
    Round to zero  Yes
    Round to infinity  Yes
    IEEE754-2008 fused multiply-add  Yes
    Support is emulated in software  No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support  (cl_khr_fp64)
    Denormals  Yes
    Infinity and NANs  Yes
    Round to nearest  Yes
    Round to zero  Yes
    Round to infinity  Yes
    IEEE754-2008 fused multiply-add  Yes
    Support is emulated in software  No
  Address bits  64, Little-Endian
  Global memory size  2801606656 (2.609GiB)
  Global free memory (AMD)  2714816 (2.589GiB)
  Global memory channels (AMD)  8
  Global memory banks per channel (AMD)  16
  Global memory bank width (AMD)  256 bytes
  Error Correction support  No
  Max memory allocation  2164253081 (2.016GiB)
  Unified memory for Host and Device  No
  Minimum alignment for any data type  128 bytes
  Alignment of base address  2048 bits (256 bytes)
  Global Memory cache type  Read/Write
  Global Memory cache size  16384 (16KiB)
  Global Memory cache line size  64 bytes
  Image support  Yes
    Max number of samplers per kernel  16
    Max size for 1D images from buffer  134217728 pixels
    Max 1D or 2D image array size  2048 images
    Base address alignment for 2D image buffers  256 bytes
    Pitch alignment for 2D image buffers  256 pixels
    Max 2D image size  16384x16384 pixels
    Max 3D image size  2048x2048x2048 pixels
    Max number of read image args  128
    Max number of write image args  8
  Local memory type  Local
  Local memory size  32768 (32KiB)
  Local memory syze per CU (AMD)  65536 (64KiB)
  Local memory banks (AMD)  32
  Max number of constant args  8
  Max constant buffer size  2164253081 (2.016GiB)
  Preferred constant buffer size (AMD)  16384 (16KiB)
  Max size of kernel argument  1024
  Queue properties
    Out-of-order execution  No
    Profiling  Yes
  Prefer user sync for interop  Yes
  Profiling timer resolution  1ns
  Profiling timer offset since Epoch (AMD)  1550691693782334890ns (Wed Feb 20 13:41:33 2019)
  Execution capabilities
    Run OpenCL kernels  Yes
    Run native kernels  No
    Thread trace supported (AMD)  Yes
    Number of async queues (AMD)  2
    Max real-time compute queues (AMD)  0
    Max real-time compute units (AMD)  0
    SPIR versions  1.2
  printf() buffer size  4194304 (4MiB)
  Built-in kernels
  Device Extensions  cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event

  Device Name  Baffin
  Device Vendor  Advanced Micro Devices, Inc.
  Device Vendor ID  0x1002
  Device Version  OpenCL 1.2 AMD-APP (2766.4)
  Driver Version  2766.4
  Device OpenCL C Version  OpenCL C 1.2
  Device Type  GPU
  Device Board Name (AMD)  AMD Radeon (TM) RX 460 Graphics
  Device Topology (AMD)  PCI-E, 02:00.0
  Device Profile  FULL_PROFILE
  Device Available  Yes
  Compiler Available  Yes
  Linker Available  Yes
  Max compute units  16
  SIMD per compute unit (AMD)  4
  SIMD width (AMD)  16
  SIMD instruction width (AMD)  1
  Max clock frequency  1200MHz
  Graphics IP (AMD)  8.0
  Device Partition  (core)
    Max number of sub-devices  16
    Supported partition types  None
  Max work item dimensions  3
  Max work item sizes  1024x1024x1024
  Max work group size  256
  Preferred work group size (AMD)  256
  Max work group size (AMD)  1024
  Preferred work group size multiple  64
  Wavefront width (AMD)  64
  Preferred / native vector sizes
    char  4 / 4
    short  2 / 2
    int  1 / 1
    long  1 / 1
    half  1 / 1 (cl_khr_fp16)
    float  1 / 1
    double  1 / 1 (cl_khr_fp64)
  Half-precision Floating-point support  (cl_khr_fp16)
    Denormals  No
    Infinity and NANs  No
    Round to nearest  No
    Round to zero  No
    Round to infinity  No
    IEEE754-2008 fused multiply-add  No
    Support is emulated in software  No
  Single-precision Floating-point support  (core)
    Denormals  No
    Infinity and NANs  Yes
    Round to nearest  Yes
    Round to zero  Yes
    Round to infinity  Yes
    IEEE754-2008 fused multiply-add  Yes
    Support is emulated in software  No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support  (cl_khr_fp64)
    Denormals  Yes
    Infinity and NANs  Yes
    Round to nearest  Yes
    Round to zero  Yes
    Round to infinity  Yes
    IEEE754-2008 fused multiply-add  Yes
    Support is emulated in software  No
  Address bits  64, Little-Endian
  Global memory size  683909120 (652.2MiB)
  Global free memory (AMD)  648296 (633.1MiB)
  Global memory channels (AMD)  4
  Global memory banks per channel (AMD)  16
  Global memory bank width (AMD)  256 bytes
  Error Correction support  No
  Max memory allocation  358138265 (341.5MiB)
  Unified memory for Host and Device  No
  Minimum alignment for any data type  128 bytes
  Alignment of base address  2048 bits (256 bytes)
  Global Memory cache type  Read/Write
  Global Memory cache size  16384 (16KiB)
  Global Memory cache line size  64 bytes
  Image support  Yes
    Max number of samplers per kernel  16
    Max size for 1D images from buffer  134217728 pixels
    Max 1D or 2D image array size  2048 images
    Base address alignment for 2D image buffers  256 bytes
    Pitch alignment for 2D image buffers  256 pixels
    Max 2D image size  16384x16384 pixels
    Max 3D image size  2048x2048x2048 pixels
    Max number of read image args  128
    Max number of write image args  8
  Local memory type  Local
  Local memory size  32768 (32KiB)
  Local memory syze per CU (AMD)  65536 (64KiB)
  Local memory banks (AMD)  32
  Max number of constant args  8
  Max constant buffer size  358138265 (341.5MiB)
  Preferred constant buffer size (AMD)  16384 (16KiB)
  Max size of kernel argument  1024
  Queue properties
    Out-of-order execution  No
    Profiling  Yes
  Prefer user sync for interop  Yes
  Profiling timer resolution  1ns
  Profiling timer offset since Epoch (AMD)  1550691693782334890ns (Wed Feb 20 13:41:33 2019)
  Execution capabilities
    Run OpenCL kernels  Yes
    Run native kernels  No
    Thread trace supported (AMD)  Yes
    Number of async queues (AMD)  2
    Max real-time compute queues (AMD)  0
    Max real-time compute units (AMD)  2583484461
    SPIR versions  1.2
  printf() buffer size  4194304 (4MiB)
  Built-in kernels
  Device Extensions  cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  No platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)  No platform
  clCreateContext(NULL, ...) [default]  No platform
  clCreateContext(NULL, ...) [other]  Success [AMD]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name  AMD Accelerated Parallel Processing
    Device Name  Ellesmere
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (2)
    Platform Name  AMD Accelerated Parallel Processing
    Device Name  Ellesmere
    Device Name  Baffin
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (2)
    Platform Name  AMD Accelerated Parallel Processing
    Device Name  Ellesmere
    Device Name  Baffin
++++++++++++++++++++++++++++++++++


csecht commented 5 years ago

Hi Rick, I’d be glad to help with testing and verifying fan settings. I’ll open that as a new issue. Cheers, Craig


KeithMyers commented 5 years ago

Rick did you get any bites from MilkyWay AMD users yet?

Ricks-Lab commented 5 years ago

Hi Keith, I see there are some views from MilkyWay, but no direct interactions. You can tell from the engagement with Craig here that there were significant issues that were not detected in my test setup. Perhaps with the fixes, people may find it more useful. I plan an official v2.0.0 release within the next day. Just getting things in order. I will announce on SETI when the release is complete.

Ricks-Lab commented 5 years ago

Hi Craig, Not sure where clinfo is getting a more accurate name, but I don't want the installation of clinfo to be required to use these tools. Currently it is only required by amdgpu-ls when using the --clinfo option. I plan to study the ROCm-smi implementation for insight into improving the approach in this tool. Let me know of any concerns about closing this issue. Thanks for your support in the debug process!

Ricks-Lab commented 5 years ago

Fixed in v2.0.0.