cyring / CoreFreq

CoreFreq : CPU monitoring and tuning software designed for 64-bit processors.
https://www.cyring.fr
GNU General Public License v2.0

Zen3 Vermeer TDC, EDC, PPT and Temp limit #299

Closed notgood closed 2 years ago

notgood commented 2 years ago

Running CoreFreq 1.88.4 on Ryzen 5950X, here is a sysinfo output:

corefreq-cli -s
Processor                                  [AMD Ryzen 9 5950X 16-Core Processor]
|- Architecture                                                   [Zen3/Vermeer]
|- Vendor ID                                                      [AuthenticAMD]
|- Firmware                                                         [ 56.53.0-2]
|- Microcode                                                        [0x0a201016]
|- Signature                                                           [  AF_21]
|- Stepping                                                            [      0]
|- Online CPU                                                          [ 32/ 32]
|- Base Clock                                                          [100.005]
|- Frequency            (MHz)                      Ratio                        
                 Min   2200.15                    <  22 >                       
                 Max   3400.23                    <  34 >                       
|- Factory                                                             [100.000]
                       3400                       [  34 ]                       
|- Performance                                                                  
   |- P-State                                                                   
                 TGT   2200.15                    <  22 >                       
|- Turbo Boost                                                         [ UNLOCK]
                 XFR   5100.35                    [  51 ]                       
                 CPB   5000.34                    [  50 ]                       
                  1C   2800.19                    <  28 >                       
                  2C   2200.15                    <  22 >                       
|- Uncore                                                              [   LOCK]
                 Min   1800.12                    [  18 ]                       
                 Max   1800.12                    [  18 ]                       
|- TDP                                                           Level [  0:0  ]
   |- Programmable                                                     [ UNLOCK]

Instruction Set Extensions                                                      
|- 3DNow!/Ext [N/N]          ADX [Y]          AES [Y]  AVX/AVX2 [Y/Y] 
|- AVX512-F     [N]    AVX512-DQ [N]  AVX512-IFMA [N]   AVX512-PF [N] 
|- AVX512-ER    [N]    AVX512-CD [N]    AVX512-BW [N]   AVX512-VL [N] 
|- AVX512-VBMI  [N] AVX512-VBMI2 [N]  AVX512-VNMI [N]  AVX512-ALG [N] 
|- AVX512-VPOP  [N] AVX512-VNNIW [N] AVX512-FMAPS [N] AVX512-VP2I [N] 
|- AVX512-BF16  [N]  BMI1/BMI2 [Y/Y]         CLWB [Y] CLFLUSH/O [Y/Y] 
|- CLAC-STAC    [Y]         CMOV [Y]    CMPXCHG8B [Y]  CMPXCHG16B [Y] 
|- F16C         [Y]          FPU [Y]         FXSR [Y]   LAHF-SAHF [Y] 
|- MMX/Ext    [Y/Y] MON/MWAITX [Y/Y]        MOVBE [Y]   PCLMULQDQ [Y] 
|- POPCNT       [Y]       RDRAND [Y]       RDSEED [Y]      RDTSCP [Y] 
|- SEP          [Y]          SHA [Y]          SSE [Y]        SSE2 [Y] 
|- SSE3         [Y]        SSSE3 [Y]  SSE4.1/4A [Y/Y]      SSE4.2 [Y] 
|- SERIALIZE    [N]      SYSCALL [Y]        RDPID [N]        UMIP [N] 

Features                                                                        
|- 1 GB Pages Support                                      1GB-PAGES   [Capable]
|- 100 MHz multiplier Control                            100MHzSteps   [Missing]
|- Advanced Configuration & Power Interface                     ACPI   [Capable]
|- Advanced Programmable Interrupt Controller                   APIC   [Capable]
|- Core Multi-Processing                                  CMP Legacy   [Capable]
|- L1 Data Cache Context ID                                  CNXT-ID   [Missing]
|- Collaborative Processor Performance Control                  CPPC   [Missing]
|- Direct Cache Access                                           DCA   [Missing]
|- Debugging Extension                                            DE   [Capable]
|- Debug Store & Precise Event Based Sampling               DS, PEBS   [Missing]
|- CPL Qualified Debug Store                                  DS-CPL   [Missing]
|- 64-Bit Debug Store                                         DTES64   [Missing]
|- Fast-String Operation                                Fast-Strings   [Capable]
|- Fused Multiply Add                                     FMA | FMA4   [Capable]
|- Hardware Lock Elision                                         HLE   [Missing]
|- Instruction Based Sampling                                    IBS   [Capable]
|- Instruction INVLPGB                                       INVLPGB   [Missing]
|- Long Mode 64 bits                                       IA64 | LM   [Capable]
|- LightWeight Profiling                                         LWP   [Missing]
|- Memory Bandwidth Enforcement                                  MBE   [Capable]
|- Machine-Check Architecture                                    MCA   [Capable]
|- Instruction MCOMMIT                                       MCOMMIT   [Missing]
|- Memory Protection Extensions                                  MPX   [Missing]
|- Model Specific Registers                                      MSR   [Capable]
|- Memory Type Range Registers                                  MTRR   [Capable]
|- No-Execute Page Protection                                     NX   [Capable]
|- OS-Enabled Ext. State Management                          OSXSAVE   [Capable]
|- Physical Address Extension                                    PAE   [Capable]
|- Page Attribute Table                                          PAT   [Capable]
|- Pending Break Enable                                          PBE   [Missing]
|- Process Context Identifiers                                  PCID   [Missing]
|- Perfmon and Debug Capability                                 PDCM   [Missing]
|- Page Global Enable                                            PGE   [Capable]
|- Page Size Extension                                           PSE   [Capable]
|- 36-bit Page Size Extension                                  PSE36   [Capable]
|- Processor Serial Number                                       PSN   [Missing]
|- Resource Director Technology/PQE                            RDT-A   [Capable]
|- Resource Director Technology/PQM                            RDT-M   [Capable]
|- Read Processor Register at User level                       RDPRU   [Capable]
|- Restricted Transactional Memory                               RTM   [Missing]
|- Safer Mode Extensions                                         SMX   [Missing]
|- Self-Snoop                                                     SS   [Missing]
|- Supervisor-Mode Access Prevention                            SMAP   [Capable]
|- Supervisor-Mode Execution Prevention                         SMEP   [Capable]
|- Time Stamp Counter                                            TSC [Invariant]
|- Time Stamp Counter Deadline                          TSC-DEADLINE   [Missing]
|- TSX Force Abort MSR Register                            TSX-ABORT   [Missing]
|- TSX Suspend Load Address Tracking                       TSX-LDTRK   [Missing]
|- User-Mode Instruction Prevention                             UMIP   [Missing]
|- Virtual Mode Extension                                        VME   [Capable]
|- Virtual Machine Extensions                                    VMX   [Missing]
|- Extended xAPIC Support                                     x2APIC   [  xAPIC]
|- XSAVE/XSTOR States                                          XSAVE   [Capable]
|- xTPR Update Control                                          xTPR   [Missing]
Mitigation mechanisms                                                           
|- Indirect Branch Restricted Speculation                       IBRS   [Capable]
|- Indirect Branch Prediction Barrier                           IBPB   [Capable]
|- Single Thread Indirect Branch Predictor                     STIBP   [Capable]
|- Speculative Store Bypass Disable                             SSBD   [Capable]
|- Architectural - Predictive Store Forwarding                  PSFD   [Capable]

Technologies                                                                    
|- Data Cache Unit                                                              
   |- L1 Prefetcher                                                L1 HW   < ON>
   |- L2 Prefetcher                                                L2 HW   < ON>
|- System Management Mode                                       SMM-Lock   [ ON]
|- Simultaneous Multithreading                                       SMT   [ ON]
|- PowerNow!                                                         CnQ   [OFF]
|- Core C-States                                                     CCx   [ ON]
|- Core Performance Boost                                            CPB   < ON>
|- Watchdog Timer                                                    WDT   < ON>
|- Virtualization                                                    SVM   [ ON]
   |- I/O MMU                                                      AMD-V   [ ON]
   |- Version                                                     [         0.1]
   |- Hypervisor                                                           [OFF]
   |- Vendor ID                                                   [         N/A]

Performance Monitoring                                                          
|- Version                                                        PM       [N/A]
|- Counters:          General                   Fixed                           
|                     6 x 64 bits             3 x 64 bits                       
|- Enhanced Halt State                                           C1E       
|- C2 UnDemotion                                                 C2U       
|- C3 UnDemotion                                                 C3U       < ON>
|- Core C6 State                                                 CC6       < ON>
|- Package C6 State                                              PC6       < ON>
|- Legacy Frequency ID control                                   FID       [OFF]
|- Legacy Voltage ID control                                     VID       [OFF]
|- P-State Hardware Coordination Feedback                MPERF/APERF       [ ON]
|- Hardware-Controlled Performance States                        HWP       [ ON]
|- Core C-States                                                                
   |- C-States Base Address                                      BAR   [ 0x413 ]
|- MONITOR/MWAIT                                                                
   |- State index:    #0    #1    #2    #3    #4    #5    #6    #7              
   |- Sub C-State:     1     1     0     0     0     0     0     0              
|- Core Cycles                                                         [Capable]
|- Instructions Retired                                                [Capable]
|- Reference Cycles                                                    [Capable]
|- Last Level Cache References                                         [Capable]
|- Performance Time Stamp Counter                                      [Missing]
|- Data Fabric Performance Counter                                     [Capable]
|- Core Performance Counter                                            [Capable]

Power, Current & Thermal                                                        
|- Junction Temperature                                        TjMax   [49: 90C]
|- Digital Thermal Sensor                                        DTS   [Capable]
|- Power Limit Notification                                      PLN   [Missing]
|- Package Thermal Management                                    PTM   [Missing]
|- Thermal Monitor 1                                             TTP   [ Enable]
|- Thermal Monitor 2                                             HTC   [ Enable]
|- Thermal Design Power                                          TDP   [  105 W]
   |- Minimum Power                                              Min   [  105 W]
   |- Maximum Power                                              Max   [  105 W]
|- Thermal Design Power                                      Package   < Enable>
   |- Power Limit (0 sec)                                        PL1   <  270 W>
   |- Power Limit (0 sec)                                        PL2   < 1200 W>
|- Thermal Design Power                                         Core   
   |- Power Limit                                                PL1   [Missing]
|- Thermal Design Power                                       Uncore   
   |- Power Limit                                                PL1   [Missing]
|- Thermal Design Power                                         DRAM   
   |- Power Limit                                                PL1   [Missing]
|- Thermal Design Power                                     Platform   
   |- Power Limit                                                PL1   [Missing]
   |- Power Limit                                                PL2   [Missing]
|- Package Power Tracking                                        PPT   [  142 W]
|- Electrical Design Current                                     EDC   [  140 A]
|- Thermal Design Current                                        TDC   <   95 A>
|- Units                                                                        
   |- Power                                               watt   [  0.125000000]
   |- Energy                                             joule   [  0.000015259]
   |- Window                                            second   [  0.000976562]
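
A side note on the Units block above: the three values are exact negative powers of two (1/8, 1/65536 and 1/1024). Below is a minimal Python sketch of how such units could be decoded, assuming they come from a RAPL-style power unit register with the power exponent in bits 3:0, the energy exponent in bits 12:8 and the time exponent in bits 19:16. The function name and the raw value 0xA1003 are illustrative assumptions, not taken from CoreFreq's code.

```python
def decode_rapl_units(raw: int) -> dict:
    """Decode a RAPL-style power unit register value (assumed bit layout)."""
    power_exp = raw & 0xF            # bits 3:0   -> watt unit is 1 / 2^power_exp
    energy_exp = (raw >> 8) & 0x1F   # bits 12:8  -> joule unit is 1 / 2^energy_exp
    time_exp = (raw >> 16) & 0xF     # bits 19:16 -> second unit is 1 / 2^time_exp
    return {
        "watt": 1.0 / (1 << power_exp),
        "joule": 1.0 / (1 << energy_exp),
        "second": 1.0 / (1 << time_exp),
    }


if __name__ == "__main__":
    # 0xA1003 is a hypothetical raw value chosen to match the units shown above.
    for name, value in decode_rapl_units(0xA1003).items():
        print(f"{name:>6}: {value:.9f}")
```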

What I'm really interested in is the ability to display and control the power/thermal limits: TDC, EDC, PPT and max temperature.

CoreFreq currently displays the stock PPT, EDC, and TDC values for the 5950X instead of the actual live ones. The TDC field is editable, but adjusting it has no effect. At the same time, the actual PPT value is displayed in "Power Limit (0 sec) PL1", and adjusting it does work.

A user-adjustable max temperature ("Platform Thermal Throttle Limit" in the AMD BIOS, which allows capping the max temperature below TjMax) seems to be unavailable in CoreFreq.

Here are my actual values, set in UEFI, seen in Win10 tools: PPT 270W, TDC 160A, EDC 180A, Temp Limit 80C.

cyring commented 2 years ago

Hello,

What I have noticed in the BIOS is that those limits have to be changed from [AUTO] to some value lower than their max. Then I am able to increase/decrease the limits from CoreFreq.

Can you give it a try that way?

cyring commented 2 years ago

EDIT:

notgood commented 2 years ago

Went to BIOS, lowered TDC and EDC to 100A, PPT to 100W, temp to 80C.

Corefreq still incorrectly displays stock and non-adjustable values for 5950X (TDC 95A, EDC 140A, PPT 142W)

As I mentioned, the real and correct PPT is displayed in the CoreFreq "Power Limit (0 sec) PL1" field. I can adjust it in any steps as well (until I hit some other temperature or current limit). Here are some screenshots of a burn test: adjusting PL1 to 80W, 100W and 120W results in the same consumption in the Package power label.

80W

100W

120W
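
For reference, a package power reading like the one in the screenshots is typically an average computed from an energy counter sampled at two points in time. Here is a hedged sketch, assuming a 32-bit wrapping counter and the joule unit from the sysinfo output above. The function and parameter names are hypothetical, and this is not CoreFreq's actual code.

```python
ENERGY_UNIT_J = 1.0 / 65536  # about 0.000015259 J, the joule unit from the sysinfo output


def average_power_watts(count_start: int, count_end: int, seconds: float,
                        unit: float = ENERGY_UNIT_J, counter_bits: int = 32) -> float:
    """Average power between two raw energy counter samples.

    The modulo handles a counter that wrapped around between the two samples
    (assumes at most one wrap within the sampling interval).
    """
    if seconds <= 0:
        raise ValueError("sampling interval must be positive")
    delta = (count_end - count_start) % (1 << counter_bits)
    return delta * unit / seconds


if __name__ == "__main__":
    # Made-up samples: 6553600 counter ticks in one second.
    print(average_power_watts(0, 6553600, 1.0))
```

With a power limit that is actually enforced, the value computed this way should settle near the limit during a sustained load, which is the behaviour described in the comment above.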

cyring commented 2 years ago

2021-11-23-003616_644x550_scrot

cyring commented 2 years ago

2021-11-23-010537_644x550_scrot

2021-11-23-010537_644x550_scrot-0 2021-11-23-010537_644x550_scrot-2 2021-11-23-010537_644x550_scrot-3

notgood commented 2 years ago

OK, tried installing https://gitlab.com/leogx9r/ryzen_smu/ driver and https://github.com/hattedsquirrel/ryzen_monitor frontend.

ryzen_smu is able to correctly retrieve EDC/TDC/PPT/TempLimit, both current and limits. It also can adjust all four variables.

So perhaps the required registers can be found in the ryzen_smu code.

ryzen_smu
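
A rough sketch of how the limits could be read from the ryzen_smu driver on Linux, assuming it exposes a binary pm_table file of little-endian 32-bit floats under /sys/kernel/ryzen_smu_drv/. The field indexes below are an assumption: the layout is undocumented and varies with the PM table version, so check them against the ryzen_monitor sources before trusting any value.

```python
import struct
from pathlib import Path

# Assumed location of the PM table exported by the ryzen_smu driver.
PM_TABLE_PATH = Path("/sys/kernel/ryzen_smu_drv/pm_table")

# Hypothetical field indexes (in units of 32-bit floats), for illustration only.
ASSUMED_INDEXES = {
    "ppt_limit_w": 0,
    "tdc_limit_a": 2,
    "thm_limit_c": 4,
    "edc_limit_a": 8,
}


def read_limits(path=PM_TABLE_PATH, indexes=ASSUMED_INDEXES) -> dict:
    """Read the table as little-endian float32 values and pick the assumed fields."""
    raw = Path(path).read_bytes()
    count = len(raw) // 4
    values = struct.unpack(f"<{count}f", raw[:count * 4])
    # Skip any index that falls outside the table that was actually read.
    return {name: values[i] for name, i in indexes.items() if i < count}


if __name__ == "__main__":
    # Reading the sysfs file usually requires root and the driver to be loaded.
    for name, value in read_limits().items():
        print(f"{name}: {value:.1f}")
```

This only reads the table. Changing a limit would mean sending SMU commands, which are the undocumented part.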

cyring commented 2 years ago

Those are undocumented, unspecified registers from the manufacturer.

I would rather implement the "safe" i2c/RSMI, RBI protocols.

notgood commented 2 years ago

From my limited testing, both corefreqk and ryzen_smu kernel drivers can coexist. Frontends also seem to run OK at the same time.

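I was able to run the following things simultaneously:

  • burn test in corefreqd
  • watch all the related data in corefreq-cli and ryzen_monitor
  • adjust max allowed temperature using /sys/kernel/ryzen_smu_drv/rsmu_cmd

Fits my needs I guess.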

cyring commented 2 years ago

From my limited testing, both corefreqk and ryzen_smu kernel drivers can coexist. Frontends also seem to run OK at the same time.

I was able to run the following things simultaneously:

  • burn test in corefreqd
  • watch all the related data in corefreq-cli and ryzen_monitor
  • adjust max allowed temperature using /sys/kernel/ryzen_smu_drv/rsmu_cmd

Fits my needs I guess.

Hello,

Can you give the develop branch a try for these cases:

Feel free to add screenshots; thank you.

notgood commented 2 years ago

Installed git AUR packages:

  • corefreq-client-git 1.88.4.r19.g39eeddd-1
  • corefreq-dkms-git 1.88.4.r19.g39eeddd-1
  • corefreq-server-git 1.88.4.r19.g39eeddd-1

Memory controller corefreq vs zentimings: memory zentimings

notgood commented 2 years ago

Corefreq sensors

sensors voltage

power

notgood commented 2 years ago

Limits, CoreFreq vs AMDMaster: TDC/EDC/PPT are still stock instead of the actual ones.

AMDmaster

cyring commented 2 years ago

@notgood Thank you for your screenshots. It looks like they are based on the develop branch: can you confirm?

diocletiann commented 2 years ago

TDC/EDC/PPT, they are indeed still stock. I can't beat Ryzen-Master because only AMD knows those registers. Perhaps one day.

CTR/Hydra can do it on Windows, FYI.

notgood commented 2 years ago

AMD Ryzen TDC/EDC/PPT can be reliably read and changed using open source software on both Linux (ryzen_smu) and Windows (SMUDebugTool); I've tested it myself. The SMU commands, addresses and arguments required to do so are all available in the relevant gitlab/github repositories. To my understanding, this information was acquired by snooping on commands sent to the CPU by Ryzen Master.

But I also understand and respect @cyring's wish to only use officially available and documented commands.

cyring commented 2 years ago

Issue postponed until I get better specs for the TDC and EDC registers. PPT of 270W appears OK with RM.