Closed csecht closed 4 years ago
Updating the User Guide is definitely a good idea. Let me know of any issues making pull requests. I will answer each concern in a separate response.
For grub updates, I found that the updates were not consistently effective without a reboot. amdgpu-utils reads the featuremask from the first line of the 'ppfeatures' file in the card's device directory. It is only read when a utility is first executed.
The ppm table changes between generations of GPUs. Here is what it looks like for Vega20:
PROFILE_INDEX(NAME) CLOCK_TYPE(NAME) FPS UseRlcBusy MinActiveFreqType MinActiveFreq BoosterFreqType BoosterFreq PD_Data_limit_c PD_Data_error_coeff PD_Data_error_rate_coeff
0 BOOTUP_DEFAULT :
0( GFXCLK) 0 0 1 0 4 800 4587520 -65536 0
1( SOCCLK) 0 0 1 0 4 800 327680 -6553 0
2( UCLK) 0 0 1 0 4 800 327680 -65536 0
3( FCLK) 0 0 0 0 4 800 327680 -6553 0
1 3D_FULL_SCREEN :
0( GFXCLK) 0 1 1 0 4 800 4587520 -65536 0
1( SOCCLK) 0 1 4 850 4 800 327680 -65536 0
2( UCLK) 0 1 4 850 4 800 327680 -65536 0
3( FCLK) 0 1 4 850 4 800 327680 -65536 0
2 POWER_SAVING :
0( GFXCLK) 0 0 1 0 3 0 5898240 -65536 0
1( SOCCLK) 0 0 1 0 3 0 1310720 -6553 0
2( UCLK) 0 0 1 0 3 0 1966080 -65536 0
3( FCLK) 0 0 0 0 3 800 1966080 -6553 0
3 VIDEO :
0( GFXCLK) 0 1 1 0 4 500 4587520 -6553 0
1( SOCCLK) 0 0 1 0 4 500 1310720 -6553 0
2( UCLK) 0 0 1 0 4 500 1966080 -65536 0
3( FCLK) 0 0 3 0 4 500 1966080 -6553 0
4 VR :
0( GFXCLK) 0 1 0 1540 4 800 5898240 -6553 65536
1( SOCCLK) 0 1 2 0 4 800 327680 -32768 -65536
2( UCLK) 0 1 2 0 4 800 327680 -32768 -65536
3( FCLK) 0 1 2 0 4 800 327680 -32768 -65536
5 COMPUTE*:
0( GFXCLK) 0 1 0 1600 3 0 3932160 -65536 -65536
1( SOCCLK) 0 0 4 850 3 0 327680 -65536 -32768
2( UCLK) 0 0 4 850 3 0 327680 -65536 -32768
3( FCLK) 0 0 4 850 3 0 327680 -65536 -32768
6 CUSTOM :
0( GFXCLK) 0 0 1 0 4 800 4587520 -65536 0
1( SOCCLK) 0 0 1 0 4 800 327680 -6553 0
2( UCLK) 0 0 1 0 4 800 327680 -65536 0
3( FCLK) 0 0 0 0 4 800 327680 -6553 0
Not sure of the best approach, but since amdgpu-utils does not include the ability to manage details of the table, maybe a simplified version is best. Perhaps, I could develop an option to display the entire contents of the pp_power_profile_mode device file.
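As a rough illustration of the "simplified version" idea, here is a minimal sketch of how the mode names and the active mode could be pulled out of the pp_power_profile_mode text while ignoring the per-clock detail rows. The function name `summarize_ppm` and the regular expression are hypothetical, not taken from amdgpu-utils, and the sketch assumes that mode lines always look like `5 COMPUTE*:` or `5 COMPUTE *: ...`, as in the outputs posted in this thread.

```python
import re
from typing import Dict, Optional, Tuple

# Matches a mode line such as " 1 3D_FULL_SCREEN :" or "5 COMPUTE*:".
# Per-clock rows such as "0( GFXCLK) 0 0 1 ..." do not match, because the
# leading number is followed by "(" rather than whitespace and a name.
MODE_LINE = re.compile(r"^\s*(\d+)\s+([A-Z0-9_]+)\s*(\*)?\s*:")


def summarize_ppm(text: str) -> Tuple[Dict[int, str], Optional[int]]:
    """Return ({mode_number: mode_name}, active_mode_number or None)."""
    modes: Dict[int, str] = {}
    active: Optional[int] = None
    for line in text.splitlines():
        match = MODE_LINE.match(line)
        if match is None:
            continue  # header line or per-clock detail row
        number = int(match.group(1))
        modes[number] = match.group(2)
        if match.group(3):  # an asterisk marks the active mode
            active = number
    return modes, active


if __name__ == "__main__":
    # Card path as shown in the amdgpu-ls output in this thread; adjust as needed.
    path = "/sys/class/drm/card1/device/pp_power_profile_mode"
    try:
        with open(path) as ppm_file:
            print(summarize_ppm(ppm_file.read()))
    except OSError as err:
        print(f"Could not read {path}: {err}")
```

Because only the mode lines are matched, the same function should work on both the Vega20 layout and the older single-row layout, though that has only been checked against the samples here.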
For the p-state masks, I don't think it is possible to indicate what the current mask is set to. I can only show the default mask. The p-state definition section will display current values of Freq and Voltage, but the mask is a different issue.
I don't think there is a reason to believe that Navi cards won't be supported. Perhaps there is a delay in full kernel or driver functionality, but when I tried a Vega20 soon after it was released, most functionality was available.
I have pushed a new version that displays the entire contents of ppm after a brief summary. Let me know what you think.
The new --ppm output format is nice, but repeating the mode table seems redundant (see below). Or are the two tables the two possible output options? I like how the asterisk denotes the current mode. Did you mean to omit the AUTO line from the full table?
Linux2:~/Desktop/amdgpu-utils$ ./amdgpu-ls --ppm
Card Number: 1
Card Model: Radeon RX 570
Card: /sys/class/drm/card1/device
Power Performance Mode: manual
0: BOOTUP_DEFAULT
1: 3D_FULL_SCREEN
2: POWER_SAVING
3: VIDEO
4: VR
5: COMPUTE
6: CUSTOM
-1: AUTO
NUM MODE_NAME SCLK_UP_HYST SCLK_DOWN_HYST SCLK_ACTIVE_LEVEL MCLK_UP_HYST MCLK_DOWN_HYST MCLK_ACTIVE_LEVEL
0 BOOTUP_DEFAULT: - - - - - -
1 3D_FULL_SCREEN: 0 100 30 0 100 10
2 POWER_SAVING: 10 0 30 - - -
3 VIDEO: - - - 10 16 31
4 VR: 0 11 50 0 100 10
5 COMPUTE *: 0 5 30 0 100 10
6 CUSTOM: - - - - - -
For amdgpu-monitor, because Mem Load % is now listed, perhaps change Load % to GPU Load % in the output? I don't see a need to change the Guide text of the "Using amdgpu-monitor" section, but should I update those Guide graphics with my RX 570 cards or do you want to update them with your Vega cards? (I think that the alternate outputs shown for your Vega cards would be more informative.)
My Vega64 cards are down. I only have Quad-Fiji and single Vega20 running. Perhaps your 2 cards are a better example for the community.
I have pushed the format changes you suggested, plus a change for consistent labeling: fine tune print/monitor formats
Here are some ideas for edits to the User Guide. Let me know what you think and I can include them for a pull request. In "Getting Started" section, it says: After saving, update grub:
and then reboot.
But after updating the ppfeaturemask code in grub, I didn't have to reboot for new PAC features (e.g. overclocking) to work. However, amdgpu-ls still lists the feature mask as what it was before the grub update. Is the featuremask code reported by -ls read from the last boot record and not the current grub file? Is a reboot only necessary following update-grub for the initial loading of amdgpu.ppfeaturemask? This is more for my clarification on how grub works than any edits to the text.
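One way to look into this is to compare the mask the running kernel was booted with against the mask currently written in the grub defaults file. The following is only a sketch under stated assumptions: it takes /proc/cmdline as the running kernel's command line and /etc/default/grub as the grub defaults file (the location may differ by distribution). The function name `featuremask_from_cmdline` is hypothetical and not part of amdgpu-utils, and the mask values in the comments are example values only.

```python
import re
from typing import Optional

# Finds "amdgpu.ppfeaturemask=<hex or decimal>" anywhere in a string.
MASK_PATTERN = re.compile(
    r"(?<![\w.])amdgpu\.ppfeaturemask=(0[xX][0-9a-fA-F]+|\d+)"
)


def featuremask_from_cmdline(cmdline: str) -> Optional[int]:
    """Return the amdgpu.ppfeaturemask value found in the text, or None."""
    match = MASK_PATTERN.search(cmdline)
    if match is None:
        return None
    value = match.group(1)
    if value.lower().startswith("0x"):
        return int(value, 16)
    return int(value, 10)


def read_mask(path: str) -> Optional[int]:
    """Read a file and extract the mask; None if unreadable or not set."""
    try:
        with open(path) as source:
            return featuremask_from_cmdline(source.read())
    except OSError:
        return None


if __name__ == "__main__":
    # Note: this simple search would also match a commented-out grub line.
    booted = read_mask("/proc/cmdline")          # what the running kernel got
    configured = read_mask("/etc/default/grub")  # what the next boot would get
    for label, mask in (("booted", booted), ("configured", configured)):
        print(label, "not set" if mask is None else hex(mask))
    if booted != configured:
        print("The grub file differs from the running kernel's command line.")
```

If the two values differ, the mask in the grub file has not yet been applied to the running kernel, which may help narrow down where the mask reported by amdgpu-ls comes from.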
In the "Using amdgpu-ls" section, I see in the ppm graphic that the timings table was removed. If users wonder what's going on when the table of timing values is reported in their terminal, however, it may be helpful to add an explanation, unless you just want to keep less clutter in the User Guide. From the ROCm-smi page, https://github.com/RadeonOpenCompute/ROC-smi/tree/roc-2.7.0, the column headers for the ppm timings table could be included along with brief definitions, like this:
(Text extracted and paraphrased from the ROCm-smi readme, https://github.com/RadeonOpenCompute/ROC-smi/tree/roc-2.7.0)
SCLK_UP_HYST - Delay before sclk is increased (in milliseconds).
SCLK_DOWN_HYST - Delay before sclk is decreased (in milliseconds).
SCLK_ACTIVE_LEVEL - Workload required before sclk levels change (in %).
MCLK_UP_HYST - Delay before mclk is increased (in milliseconds).
MCLK_DOWN_HYST - Delay before mclk is decreased (in milliseconds).
MCLK_ACTIVE_LEVEL - Workload required before mclk levels change (in %).
Values displayed as '-' are hidden fields and are not enabled.
When a compute queue is detected, the COMPUTE Power Profile values will be automatically applied to the system, provided that the Perf Level is set to "auto". The CUSTOM Power Profile is only applied when the Performance Level is set to "manual" and can be specified using ROCm-smi (??with rocm loaded??). It is not possible to modify non-CUSTOM Profiles because these are hard-coded by the kernel.
Maybe include this descriptive text in --ppm terminal output instead of adding it to the User Guide?
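To make the column definitions concrete, here is a small sketch that decodes one row of the timings table (in the RX 570 layout pasted earlier in this thread) into named values, treating '-' as "not enabled". The names `TIMING_COLUMNS` and `parse_timing_row` are hypothetical and only for illustration. The units come from the paraphrased ROCm-smi text.

```python
from typing import Dict, Optional, Tuple

# Column order as printed in the table header; units per the ROCm-smi readme.
TIMING_COLUMNS = (
    ("SCLK_UP_HYST", "ms"),
    ("SCLK_DOWN_HYST", "ms"),
    ("SCLK_ACTIVE_LEVEL", "%"),
    ("MCLK_UP_HYST", "ms"),
    ("MCLK_DOWN_HYST", "ms"),
    ("MCLK_ACTIVE_LEVEL", "%"),
)


def parse_timing_row(row: str) -> Tuple[str, bool, Dict[str, Optional[int]]]:
    """Split a row like '5 COMPUTE *: 0 5 30 0 100 10' into its parts.

    Returns (mode_name, is_active, {column_name: value or None}).
    """
    label, _, values = row.partition(":")
    is_active = "*" in label            # asterisk marks the current mode
    name = label.split()[1].rstrip("*")  # second token is the mode name
    parsed: Dict[str, Optional[int]] = {}
    for (column, _unit), value in zip(TIMING_COLUMNS, values.split()):
        parsed[column] = None if value == "-" else int(value)  # '-' = hidden
    return name, is_active, parsed


if __name__ == "__main__":
    name, is_active, timings = parse_timing_row("5 COMPUTE *: 0 5 30 0 100 10")
    print(name, "(active)" if is_active else "")
    for column, unit in TIMING_COLUMNS:
        value = timings[column]
        print(f"  {column}: {'not enabled' if value is None else f'{value} {unit}'}")
```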
In the "Using amdgpu-monitor" section, need to update the terminal output and GUI graphics and include descriptive text for Memory Load monitoring.
In the "Using amdgpu-pac" section, Add after, "If you know how to obtain the current value, please let me know!"... "When changing sclk P-state MHz or mV, the desired P-state mask, if different from default, will have to be re-entered for speed or voltage changes to be applied." At least this is how it has been working for me.
Need to get confirmation that ver.3.0 works with RX 5xxx-series (Navi) cards?
In the "Setting GPU Automatically at Startup" section, change the section header to "Running Startup amdgpu-pac Bash Files" (and change the ToC index entry). Add instructions for setting up $HWMON variables to handle shifting hwmon# (thus increasing the chances of bash files writing the desired GPU parameters)? Probably don't need to use the --force_write option for a startup bash file; just need the changes from default settings.
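The shifting hwmon# problem can be handled by looking up the hwmon directory at run time instead of hard-coding its number. The sketch below shows the lookup logic in Python. The function name `find_hwmon_dir` is hypothetical, and it assumes the usual sysfs layout in which each card's device directory contains a `hwmon/hwmonN` subdirectory. A startup bash file could apply the same idea with a shell glob when setting its $HWMON variable.

```python
from pathlib import Path
from typing import Optional


def find_hwmon_dir(device_dir: str) -> Optional[Path]:
    """Return the hwmonN directory under <device_dir>/hwmon, or None.

    The number N can change between boots, so it is discovered by globbing
    rather than written into the script.
    """
    hwmon_root = Path(device_dir) / "hwmon"
    if not hwmon_root.is_dir():
        return None
    candidates = sorted(hwmon_root.glob("hwmon[0-9]*"))
    return candidates[0] if candidates else None


if __name__ == "__main__":
    # Card path as shown in the amdgpu-ls output in this thread; adjust as needed.
    hwmon = find_hwmon_dir("/sys/class/drm/card1/device")
    print("hwmon directory:", hwmon if hwmon is not None else "not found")
```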