cyring / CoreFreq

CoreFreq : CPU monitoring and tuning software designed for 64-bit processors.
https://www.cyring.fr
GNU General Public License v2.0

CCD/CCX topology, cpuidle and cpufreq governor not working on AMD 3970X #181

Closed justanerd closed 3 years ago

justanerd commented 4 years ago

The insmod should be enough if acpi_cpufreq is unloaded, or am I missing something?

insmod corefreqk.ko Register_CPU_Freq=1 Register_CPU_Idle=1 Experimental=1 Register_Governor=1

dmesg:

[13864.517170] calling  CoreFreqK_init+0x0/0x1000 [corefreqk] @ 927568
[13864.780352] CoreFreq(0:32): Processor [ 8F_31] Architecture [Zen2/Castle Peak] SMT [64/64]
[13864.780950] initcall CoreFreqK_init+0x0/0x1000 [corefreqk] returned 0 after 257588 usecs

cmdline relevant part: idle=halt

kernel config: https://pastebin.com/ja0d9518

cyring commented 4 years ago

Hello,

justanerd commented 4 years ago

I tried cpufreq.off=1; corefreq-cli shows all drivers off. Is there any way to verify that the P-State control is working?

cyring commented 4 years ago

I tried cpufreq.off=1; corefreq-cli shows all drivers off. Is there any way to verify that the P-State control is working?

Yes: if you can increase the Max ratio or any Boosted ratio (1C, 2C), it means that my driver is in control of the P-States. The Target (TGT) is also related to P-States, as a limiter.

Please raise the frequency ratio gently, one bin at a time, as a start.

justanerd commented 4 years ago

I tried to change all the PSTATE settings. I can choose the ratios but they do not get applied.

image

cyring commented 4 years ago

Was this on the master branch? Maybe try the current develop branch.

cyring commented 4 years ago

I forgot to say: Core Performance Boost has to be disabled before setting new ratios. I have seen BIOS screenshots calling it Manual OC.

justanerd commented 4 years ago

That worked. Sadly I can't increase past 3.7 GHz.

cyring commented 4 years ago

That worked. Sadly I can't increase past 3.7 GHz.

Did you get a stable frequency? For the whole processor or for distinct core(s)? What is the max frequency ratio you can set in the BIOS or other tools?

justanerd commented 4 years ago

That worked. Sadly I can't increase past 3.7 GHz.

Did you get a stable frequency ?
Yes

For the whole Processor or for distinct Core(s) ?
whole

What's the max frequency ratio you can set in BIOS or other tools ?
With https://github.com/r4m0n/ZenStates-Linux I can enable the OC mode for all cores and set whatever I want. But the P-State stuff is also not working with higher frequencies.

cyring commented 4 years ago

The 3970X is advertised with a frequency boost of up to 4.5 GHz. How high does your BIOS let you set the ratio in manual OC? Do you then read the same frequency ratio in the BIOS, CoreFreq and the kernel?

justanerd commented 4 years ago

I haven't tried the Custom P-State option yet, but with a static divider I can set whatever I want. I mostly run the system with PBO because all-core overclocks are not good for single-core performance.

On Sat, May 16, 2020 at 6:52 PM CYRIL INGENIERIE notifications@github.com wrote:

The 3970X is advertised with a frequency boost of up to 4.5 GHz. How high does your BIOS let you set the ratio in manual OC? Do you then read the same frequency ratio in the BIOS and the kernel?


cyring commented 4 years ago

Hello,

Could you try the develop branch and post back the Topology in text mode: corefreq-cli -s -m

Thank you

justanerd commented 4 years ago

sure output.txt

cyring commented 4 years ago

I'm not sure about the cache sizes: with 64 SMT cores, I'm decoding values that are double those in the AMD datasheet. Can you tell me which totals are shown for L1, L2 and L3 in the UI header?

justanerd commented 4 years ago

image

cyring commented 4 years ago

Thanks. AMD says:

Total cache L1 = 2MB
Total cache L2 = 16MB
Total cache L3 = 128MB

Considering 32 physical cores, the L1 and L2 unit sizes in KB are OK, but L3 is wrong. I'm still searching, but I'm using the same algorithm for Ryzen and Threadripper.
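As a quick sanity check of the L1 and L2 totals quoted from AMD, the per-core sizes can be derived with shell arithmetic. This is only a sketch: the variable names are made up for the example, and the 32-core count is the physical core count mentioned in the comment above.

```shell
# Per-core cache sizes derived from AMD's advertised totals.
cores=32                          # physical cores of the 3970X
l1_kb=$(( 2 * 1024 / cores ))     # 2 MB total L1
l2_kb=$(( 16 * 1024 / cores ))    # 16 MB total L2
echo "L1 per core: $l1_kb KB, L2 per core: $l2_kb KB"
```

If the UI header sums per-core values over the 64 SMT threads instead of the 32 physical cores, the totals come out doubled, which is consistent with the symptom described earlier in the thread.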

cyring commented 4 years ago

Hello, I'm rolling back to the AMD L3 cache size formula where the register is given in 512 KB units. Can you pull and try the latest develop branch? Thank you.
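The 512 KB unit conversion can be illustrated with a few lines of shell. This is a sketch, not CoreFreq's code: the helper name `l3_units_to_kb` is hypothetical, and the raw value of 32 units per CCX and the count of 8 CCXs are assumptions chosen to match the 128 MB total quoted from AMD.

```shell
# Convert a raw L3 size field, counted in 512 KB units, to KB.
l3_units_to_kb() {
    echo $(( $1 * 512 ))
}

# Assumption: each CCX reports 32 units, and the 3970X has 8 CCXs
# (4 CCDs of 2 CCXs each).
per_ccx_kb=$(l3_units_to_kb 32)
total_mb=$(( per_ccx_kb * 8 / 1024 ))
echo "L3 per CCX: $per_ccx_kb KB, total: $total_mb MB"
```

With these inputs the total lands on the 128 MB that AMD advertises, so a doubled L3 figure would point at the multiplier (threads versus CCXs) rather than at the unit size.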

cyring commented 4 years ago

Hello, I'm trying to read the thermal sensor per node; to my understanding there is more than one node with EPYC and Threadripper. Could you please try the develop version? Once the UI is running, go to Settings > Thermal scope and change it to Thread, as in the screenshot below.

CoreFreq_ThermalNode

justanerd commented 4 years ago

image

cyring commented 4 years ago

Thanks for trying. Regards

cyring commented 4 years ago

Hello, among the latest changes in issue #195, an I/O ASM access to the FCH has been added. Can you please check whether the latest CoreFreq version starts fine? Thank you.

justanerd commented 4 years ago

Yes, runs fine.

cyring commented 4 years ago

Yes, runs fine.

Thank you very much

cyring commented 3 years ago

Hello,

For your testing, the cpufreq driver is available in the develop branch.

You will have to blacklist any current cpufreq driver, like acpi-cpufreq, to leave room for the CoreFreq driver.

CoreFreq's cpufreq can be registered from its kernel module using these two options:

insmod corefreqk.ko Register_CPU_Freq=1 Register_Governor=1

or from the client: first register the governor, then CPU-Freq.

Remark: CPU-Idle is not implemented yet.

Once CoreFreq's cpufreq is registered, you can play with:

freq=3500000; for cpu in /sys/devices/system/cpu/cpufreq/policy*/scaling_setspeed ; do echo $freq > $cpu; done

### where freq is one of your available frequencies (ratio shown in cyan color)

echo 0 > /sys/devices/system/cpu/cpufreq/boost 
echo 1 > /sys/devices/system/cpu/cpufreq/boost

### and many other file attributes in the tree /sys/devices/system/cpu/cpufreq

Any change made on command line should be reflected in the UI

Cyril
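The one-liner above can be wrapped in a small function so that it can be dry-run against a scratch directory before touching the real sysfs tree. This is a minimal sketch: `set_all_policies` and its optional second argument are hypothetical names introduced for the example; only the `policy*/scaling_setspeed` path comes from the comment above.

```shell
# Hypothetical helper: write one target frequency (in kHz) to every
# cpufreq policy. The optional second argument is the cpufreq sysfs
# root; it defaults to the real tree.
set_all_policies() {
    freq_khz=$1
    root=${2:-/sys/devices/system/cpu/cpufreq}
    for attr in "$root"/policy*/scaling_setspeed; do
        # Skip the unexpanded glob when no policy directory exists.
        [ -e "$attr" ] || continue
        echo "$freq_khz" > "$attr" || return 1
    done
}

# Example (as root, once CoreFreq's cpufreq driver is registered):
#   set_all_policies 3500000
```

Passing a temporary directory as the second argument lets you confirm which files would be written without changing any frequency.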

cyring commented 3 years ago

Hello,

A fix to monitor the temperature per CCD is available in the develop branch. Could you please try it with the Threadripper?

justanerd commented 3 years ago

image

It shows temps but they are completely in sync.

cyring commented 3 years ago

It shows temps but they are completely in sync.

Thanks for testing. Can you post the full output of the topology using corefreq-cli -m ?

I have to check the CCD values resulting from the Threadripper.

Be aware that you have to load the CPUs differently to observe distinct temperatures.

You can now switch to the master branch

justanerd commented 3 years ago

https://pastebin.ubuntu.com/p/3Z6yNrSd5R/

cyring commented 3 years ago

Hello,

Thank you

justanerd commented 3 years ago

looks better: image

justanerd commented 3 years ago

zenpower output for reference: image

cyring commented 3 years ago

Thanks for your feedback.

justanerd commented 3 years ago

https://pastebin.ubuntu.com/p/2Sct94kNyr/

cyring commented 3 years ago

Warning: if you are using another monitoring tool while CoreFreq is running, my driver has to be built in a special way to share a kernel mutex among all tools going through the SMU (presuming they do the same).

The build for kernel-protected SMU access is:

make HWM_CHIPSET=COMPATIBLE clean all

Remark: HWM_CHIPSET=COMPATIBLE has not been intensively tested. So if you would rather stay with the standard CoreFreq driver build:

make clean all

then you have to make sure that no other software or its driver is accessing the SMU, zenpower being one of them.

cyring commented 3 years ago

There is a new change in the develop branch to count 4 cores per CCD. The previous results don't look like any of the Threadripper screenshots I can find on the Internet. Sorry for this new change, and thank you for providing the topology output.

justanerd commented 3 years ago

image

First and third temps are always the same.

topology: https://pastebin.ubuntu.com/p/S3sCNfghK9/

cyring commented 3 years ago

First and third temps are always the same.

Thank you for your feedback.

Based on the PPR specs, it appears that only one CCD out of two is wired, maybe because of the Threadripper design.

Based on the current develop branch, could you replace this source code: https://github.com/cyring/CoreFreq/blob/dba5cc60d7f1fda33ef027e65d3072b6e6aa1698/corefreqk.c#L11023 with this:

        Core_AMD_SMN_Read(  TccdSensor,
                    SMU_AMD_THM_TCTL_CCD_REGISTER_F17H
                    + ((2 * Core->T.Cluster.CCD) << 2),
                    SMU_AMD_INDEX_REGISTER_F17H,
                    SMU_AMD_DATA_REGISTER_F17H );

Please build, reload the driver and test the temperatures.

FYI, this change interleaves the CCDs when querying the SMU for sensors; the computed register IDs form the series 0 2 4 6.
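The offset arithmetic in the C snippet above can be checked from the shell. This is only a sketch: `ccd_reg_offset` is a hypothetical name, the base register address is left out, and the 4-byte register width is an assumption inferred from the `<< 2` shift.

```shell
# Byte offset added to the first Tccd register for a given CCD index,
# mirroring the (2 * CCD) << 2 expression. The multiplier can be
# overridden through the optional second argument.
ccd_reg_offset() {
    ccd=$1
    factor=${2:-2}
    echo $(( (factor * ccd) << 2 ))
}

for ccd in 0 1 2 3; do
    off=$(ccd_reg_offset "$ccd")
    # Assuming 4-byte registers, offset / 4 is the register ID.
    echo "CCD $ccd -> byte offset $off, register ID $(( off / 4 ))"
done
```

For CCD indexes 0 to 3 the loop yields byte offsets 0, 8, 16 and 24, that is register IDs 0 2 4 6, matching the series given in the comment.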

justanerd commented 3 years ago

looks better now: image

justanerd commented 3 years ago

first and last CCX still have the same temperature

cyring commented 3 years ago

first and last CCX still have the same temperature

up to eight core/cache complex dies (CCD) and a single I/O die (IOD).

SP3 consists of from two to eight CCDs plus one IOD

The two CCXs of a CCD share a single GMI2 Fabric port to the IOD.

A single CCX consists of ... Four cores ... single-thread mode (1T) or two-thread SMT mode (2T)

I'm searching the specs for a discriminating register to refine the topology among the 8 CCD registers.

We could also derive the CCD ID from the architecture code name, but we have to consider the 3990X and the 64-core EPYC parts, where all 8 CCDs should be wired and where the above code trick won't work.

cyring commented 3 years ago

Can you please post the CPUID dump

corefreq-cli -u

I'm especially interested in the last physical core of your Threadripper, which would be CPU #31.

justanerd commented 3 years ago

https://pastebin.ubuntu.com/p/rJxxgzrwQS/

cyring commented 3 years ago

Can you please output this ?

lspci -n

justanerd commented 3 years ago

00:00.0 0600: 1022:1480
00:00.2 0806: 1022:1481
00:01.0 0600: 1022:1482
00:02.0 0600: 1022:1482
00:03.0 0600: 1022:1482
00:04.0 0600: 1022:1482
00:05.0 0600: 1022:1482
00:07.0 0600: 1022:1482
00:07.1 0604: 1022:1484
00:08.0 0600: 1022:1482
00:08.1 0604: 1022:1484
00:14.0 0c05: 1022:790b (rev 61)
00:14.3 0601: 1022:790e (rev 51)
00:18.0 0600: 1022:1490
00:18.1 0600: 1022:1491
00:18.2 0600: 1022:1492
00:18.3 0600: 1022:1493
00:18.4 0600: 1022:1494
00:18.5 0600: 1022:1495
00:18.6 0600: 1022:1496
00:18.7 0600: 1022:1497
01:00.0 1300: 1022:148a
02:00.0 1300: 1022:1485
02:00.3 0c03: 1022:148c
20:00.0 0600: 1022:1480
20:00.2 0806: 1022:1481
20:01.0 0600: 1022:1482
20:02.0 0600: 1022:1482
20:03.0 0600: 1022:1482
20:03.1 0604: 1022:1483
20:04.0 0600: 1022:1482
20:05.0 0600: 1022:1482
20:07.0 0600: 1022:1482
20:07.1 0604: 1022:1484
20:08.0 0600: 1022:1482
20:08.1 0604: 1022:1484
21:00.0 0300: 10de:1b06 (rev a1)
21:00.1 0403: 10de:10ef (rev a1)
22:00.0 1300: 1022:148a
23:00.0 1300: 1022:1485
23:00.1 1080: 1022:1486
23:00.3 0c03: 1022:148c
40:00.0 0600: 1022:1480
40:00.2 0806: 1022:1481
40:01.0 0600: 1022:1482
40:01.1 0604: 1022:1483
40:02.0 0600: 1022:1482
40:03.0 0600: 1022:1482
40:03.1 0604: 1022:1483
40:03.2 0604: 1022:1483
40:03.3 0604: 1022:1483
40:03.4 0604: 1022:1483
40:04.0 0600: 1022:1482
40:05.0 0600: 1022:1482
40:07.0 0600: 1022:1482
40:07.1 0604: 1022:1484
40:08.0 0600: 1022:1482
40:08.1 0604: 1022:1484
41:00.0 0604: 1022:57ad
42:01.0 0604: 1022:57a3
42:02.0 0604: 1022:57a3
42:03.0 0604: 1022:57a3
42:04.0 0604: 1022:57a3
42:05.0 0604: 1022:57a3
42:08.0 0604: 1022:57a4
42:09.0 0604: 1022:57a4
42:0a.0 0604: 1022:57a4
43:00.0 0108: 144d:a808
44:00.0 0200: 8086:1563 (rev 01)
44:00.1 0200: 8086:1563 (rev 01)
46:00.0 0c03: 1b21:2142
47:00.0 0106: 1b21:0612 (rev 02)
48:00.0 0280: 8086:2723 (rev 1a)
49:00.0 1300: 1022:1485
49:00.1 0c03: 1022:149c
49:00.3 0c03: 1022:149c
4a:00.0 0106: 1022:7901 (rev 51)
4b:00.0 0106: 1022:7901 (rev 51)
4c:00.0 0108: 144d:a808
4d:00.0 0108: 1987:5012 (rev 01)
4e:00.0 0108: 144d:a808
4f:00.0 0108: 144d:a808
50:00.0 1300: 1022:148a
51:00.0 1300: 1022:1485
60:00.0 0600: 1022:1480
60:00.2 0806: 1022:1481
60:01.0 0600: 1022:1482
60:02.0 0600: 1022:1482
60:03.0 0600: 1022:1482
60:04.0 0600: 1022:1482
60:05.0 0600: 1022:1482
60:07.0 0600: 1022:1482
60:07.1 0604: 1022:1484
60:08.0 0600: 1022:1482
60:08.1 0604: 1022:1484
61:00.0 1300: 1022:148a
62:00.0 1300: 1022:1485

cyring commented 3 years ago

Thanks.


To solve a CCD factor:

Example   Avail. Cores   SMT   ECX    factor
3990X     64             128   7f ?   1
3970X     32             64    3f     2
3960X     24             48    2f     2
3950X     16             32    1f     1
3900X     12             24    17     1

EDIT: dump found at instlatx64

        Core_AMD_SMN_Read(  TccdSensor,
                    SMU_AMD_THM_TCTL_CCD_REGISTER_F17H
                    + (( CCD_factor * Core->T.Cluster.CCD) << 2),
                    SMU_AMD_INDEX_REGISTER_F17H,
                    SMU_AMD_DATA_REGISTER_F17H );
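One way to picture the CCD factor is as a lookup keyed on the ECX value from the table. The sketch below only transcribes that table: `ccd_factor_for_ecx` is a hypothetical name, it is not how CoreFreq derives the factor, and the four-CCD loop for the 3970X is an assumption.

```shell
# Hypothetical lookup: CCD factor for the ECX values listed in the table
# (lowercase hex, without the 0x prefix).
ccd_factor_for_ecx() {
    case $1 in
        3f|2f)    echo 2 ;;    # 3970X, 3960X
        7f|1f|17) echo 1 ;;    # 3990X (marked "?" in the table), 3950X, 3900X
        *)        return 1 ;;  # value not covered by the table
    esac
}

# Register IDs that would be read for four CCDs of a 3970X (ECX 3f).
factor=$(ccd_factor_for_ecx 3f)
for ccd in 0 1 2 3; do
    echo "CCD $ccd -> register ID $(( factor * ccd ))"
done
```

A factor of 2 reproduces the interleaved IDs 0 2 4 6, while a factor of 1 keeps consecutive IDs, which is the behaviour wanted for parts where every CCD register is wired.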

cyring commented 3 years ago

I would also like to work out the number of SMU registers by counting the number of UMCs.

  1. Can you dump the following address?
    modprobe msr
    rdmsr -aX 0x0000017b
  2. Can you also post the output of
    corefreq-cli -M

justanerd commented 3 years ago

rdmsr:

FFFFFFFFFFFFFFEF
FFFFFFFFFFFFFFEF
FFFFFFFFFFFFFFEF
FFFFFFFFFFFFFFEF
FFFFFFFFFFFFFFEF
FFFFFFFFFFFFFFEF
FFFFFFFFFFFFFFEF
FFFFFFFFFFFFFFEF
6F
6F
6F
6F
6F
6F
6F
6F
6F
6F
6F
6F
6F
6F
6F
6F
6F
6F
6F
6F
6F
6F
6F
6F
6F
6F
6F
6F
6F
6F
6F
6F
6F
6F
6F
6F
6F
6F
6F
6F
6F
6F
6F
6F
6F
6F
6F
6F
6F
6F
6F
6F
6F
6F
6F
6F
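The 64-line dump above is easier to read once identical values are grouped. A minimal sketch, where `summarize_msr` is a hypothetical helper name and the input is assumed to be one value per line, as printed by `rdmsr -aX`:

```shell
# Count how many CPUs reported each distinct value.
# Reads one value per line on stdin.
summarize_msr() {
    sort | uniq -c | awk '{ print $2 ": " $1 " CPU(s)" }'
}

# Example (as root, with the msr module loaded):
#   rdmsr -aX 0x0000017b | summarize_msr
```

Applied to the dump above, this would report 8 lines with FFFFFFFFFFFFFFEF and 56 lines with 6F.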

corefreq-cli -M

                              Zen UMC  [1493]                              
Controller #0                                                Quad Channel  
 Bus Rate     0 MT/s      Bus Speed    0 MHz           DRAM Speed    0 MHz 

 Cha   CL  RCDR RCDW  RP  RAS   RC  RRDS RRDL FAW  WTRS WTRL  WR  clRR clWW
  #0   18   18   18   18   39   57    4    6   26    3    9   18    3    3 
  #1   18   18   18   18   39   57    4    6   26    3    9   18    3    3 
  #2   16   15   14   14   32   46    4    6   20    4   12   12    4    4 
  #3   16   15   14   14   32   46    4    6   20    4   12   12    4    4 
      CWL  RTP RdWr WrRd scWW sdWW ddWW scRR sdRR ddRR drRR drWW drWR drRRD
  #0   12    9    5    1    1    3    3    1    3    3    0    0    0    0 
  #1   12    9    5    1    1    3    3    1    3    3    0    0    0    0 
  #2   16    8    8    4    1    7    7    1    5    5    0    0    0    0 
  #3   16    8    8    4    1    7    7    1    5    5    0    0    0    0 
      REFI RFC1 RFC2 RFC4 RCPB RPPB sFAW dFAW Ban  Page  CKE  CMD  GDM  ECC
  #0  9360  312  192  132   0    0    0    0  R0W0   0    6   1T   OFF   0 
  #1  9360  312  192  132   0    0    0    0  R0W0   0    6   1T   OFF   0 
  #2 14553  298  192  132   0    0    0    0  R1W1   0    1   1T    ON   0 
  #3 14553  298  192  132   0    0    0    0  R1W1   0    1   1T    ON   0 

 DIMM Geometry for channel #0                                              
      Slot Bank Rank     Rows   Columns    Memory Size (MB)                
       #0                                                                  
       #1                                                                  
 DIMM Geometry for channel #1                                              
      Slot Bank Rank     Rows   Columns    Memory Size (MB)                
       #0                                                                  
       #1                                                                  
 DIMM Geometry for channel #2                                              
      Slot Bank Rank     Rows   Columns    Memory Size (MB)                
       #0                                                                  
       #1     2   16     65536      1024          16384                    
 DIMM Geometry for channel #3                                              
      Slot Bank Rank     Rows   Columns    Memory Size (MB)                
       #0                                                                  
       #1     2   16     65536      1024          16384                    

justanerd commented 3 years ago

There should be 2 more memory modules; I actually have 4 x 16 GB.

cyring commented 3 years ago

There should be 2 more memory modules; I actually have 4 x 16 GB.

Very interesting