cyring / CoreFreq

CoreFreq : CPU monitoring and tuning software designed for 64-bit processors.
https://www.cyring.fr
GNU General Public License v2.0

Whitehaven Memory Controller #450

Closed svmlegacy closed 1 year ago

svmlegacy commented 1 year ago

2 of 4 memory channels shown (all 4 populated in this case)

svmlegacy commented 1 year ago

Corefreq output: Here.

cyring commented 1 year ago

Thanks

cyring commented 1 year ago

In function AMD_DataFabric_Zeppelin(), can you please change umc_max from 1 to 2? See https://github.com/cyring/CoreFreq/blob/a0eeedae23df8d4e4326bbc06a6b6f396ed9449e/corefreqk.c#L6592

Edit: I have fixed it since the first answer.

static PCI_CALLBACK AMD_DataFabric_Zeppelin(struct pci_dev *pdev)
{
    /* Whitehaven (Threadripper): two Data Fabric devices, so umc_max is 2 */
    if (strncmp(PUBLIC(RO(Proc))->Architecture,
                Arch[PUBLIC(RO(Proc))->ArchID].Architecture[CN_WHITEHAVEN],
                CODENAME_LEN) == 0)
    {
        return AMD_17h_DataFabric(pdev,
                    (const unsigned int[2][2]) {
                        { 0x0, 0x20},
                        {0x10, 0x28}
                    },
                    0x30, 0x80,
                    2, MC_MAX_CHA,  /* umc_max = 2 */
                    (const unsigned int[]) {PCI_DEVFN(0x18, 0x0),
                                            PCI_DEVFN(0x19, 0x0)} );
    }
    else
    {
        /* Other Zeppelin parts: a single Data Fabric device, umc_max stays 1 */
        return AMD_17h_DataFabric(pdev,
                    (const unsigned int[2][2]) {
                        { 0x0, 0x20},
                        {0x10, 0x28}
                    },
                    0x30, 0x80,
                    1, MC_MAX_CHA,  /* umc_max = 1 */
                    (const unsigned int[]) {PCI_DEVFN(0x18, 0x0)} );
    }
}

Rebuild, try the Memory Controller and post its output.

Also watch your kernel log for any message like the one below:

CoreFreq: AMD_17h_DataFabric()
 Break UMC(%hu) probing @ PCI(0x%x:0x0:0x%x)

cyring commented 1 year ago

Using the code change above, can you also show me the Memory Controller output of your Ryzen 7 1700X?

svmlegacy commented 1 year ago

This memory is currently running at 2133 MT/s, so the measurement is valid.

Modified code is producing expected results:

$ ./corefreq-cli -M
                              Zen UMC  [1460]                              
Controller #0                                                Dual Channel  
 Bus Rate  1066 MHz       Bus Speed 1066 MHz           DDR4 Speed 2133 MT/s

 Cha   CL  RCDr RCDw  RP  RAS   RC  RRDs RRDl FAW  WTRs WTRl  WR  clRR clWW
  #0   15   15   15   15   36   51    4    6   23    3    8   16    3    3 
  #1   15   15   15   15   36   51    4    6   23    3    8   16    3    3 
      CWL  RTP RdWr WrRd scWW sdWW ddWW scRR sdRR ddRR drRR drWW drWR drRRD
  #0   11    8    9    0    1    5    5    1    3    3    0    0    0    0 
  #1   11    8   10    0    1    5    5    1    3    3    0    0    0    0 
      REFI RFC1 RFC2 RFC4 RCPB RPPB  BGS:Alt  Ban  Page  CKE  CMD  GDM  ECC
  #0  8316  312  192  132   0    0   OFF  ON  R0W0   0    6   1T   OFF   0 
  #1  8316  312  192  132   0    0   OFF  ON  R0W0   0    6   1T   OFF   0 
      MRD:PDA   MOD:PDA  WRMPR STAG PDM RDDATA WRD  WRL  RDL  XS   XP CPDED
  #0    8  16    24  24    24    6 0:P:0   10   2    6   20  384    7    4 
  #1    8  16    24  24    24    6 0:P:0   10   2    6   22  384    7    4 

 DIMM Geometry for channel #0                                              
      Slot Bank Rank     Rows   Columns    Memory Size (MB)                
       #0                                                                  
       #1    16    1     65536      1024           8192  CMT32GX4M4C3200C16
 DIMM Geometry for channel #1                                              
      Slot Bank Rank     Rows   Columns    Memory Size (MB)                
       #0                                                                  
       #1    16    1     65536      1024           8192  CMT32GX4M4C3200C16

Controller #1                                                Dual Channel  
 Bus Rate  1066 MHz       Bus Speed 1066 MHz           DDR4 Speed 2133 MT/s

 Cha   CL  RCDr RCDw  RP  RAS   RC  RRDs RRDl FAW  WTRs WTRl  WR  clRR clWW
  #0   15   15   15   15   36   51    4    6   23    3    8   16    3    3 
  #1   15   15   15   15   36   51    4    6   23    3    8   16    3    3 
      CWL  RTP RdWr WrRd scWW sdWW ddWW scRR sdRR ddRR drRR drWW drWR drRRD
  #0   11    8    9    0    1    5    5    1    3    3    0    0    0    0 
  #1   11    8   10    0    1    5    5    1    3    3    0    0    0    0 
      REFI RFC1 RFC2 RFC4 RCPB RPPB  BGS:Alt  Ban  Page  CKE  CMD  GDM  ECC
  #0  8316  312  192  132   0    0   OFF  ON  R0W0   0    6   1T   OFF   0 
  #1  8316  312  192  132   0    0   OFF  ON  R0W0   0    6   1T   OFF   0 
      MRD:PDA   MOD:PDA  WRMPR STAG PDM RDDATA WRD  WRL  RDL  XS   XP CPDED
  #0    8  16    24  24    24    6 0:P:0   10   2    6   20  384    7    4 
  #1    8  16    24  24    24    6 0:P:0   10   2    6   22  384    7    4 

 DIMM Geometry for channel #0                                              
      Slot Bank Rank     Rows   Columns    Memory Size (MB)                
       #0                                                                  
       #1    16    1     65536      1024           8192  CMT32GX4M4C3200C16
 DIMM Geometry for channel #1                                              
      Slot Bank Rank     Rows   Columns    Memory Size (MB)                
       #0                                                                  
       #1    16    1     65536      1024           8192  CMT32GX4M4C3200C16

I'll get the 1700X's memory controller up in just a few minutes.

Not sure what's going on with the C-states. Motherboard does not have good options for them. (Or is the errata you mention the explanation for it?)

svmlegacy commented 1 year ago

AMD Ryzen 7 1700X, same code:

$ ./corefreq-cli -M
                              Zen UMC  [1460]                              
Controller #0                                                Dual Channel  
 Bus Rate  1066 MHz       Bus Speed 1064 MHz           DDR4 Speed 2129 MT/s

 Cha   CL  RCDr RCDw  RP  RAS   RC  RRDs RRDl FAW  WTRs WTRl  WR  clRR clWW
  #0   15   15   15   15   36   51    4    6   23    3    8   16    3    3 
  #1   15   15   15   15   36   51    4    6   23    3    8   16    3    3 
      CWL  RTP RdWr WrRd scWW sdWW ddWW scRR sdRR ddRR drRR drWW drWR drRRD
  #0   11    8    9    0    1    6    6    1    4    4    0    0    0    0 
  #1   11    8    9    0    1    6    6    1    4    4    0    0    0    0 
      REFI RFC1 RFC2 RFC4 RCPB RPPB  BGS:Alt  Ban  Page  CKE  CMD  GDM  ECC
  #0  8316  374  278  171   0    0   OFF  ON  R0W0   0    6   1T   OFF   0 
  #1  8316  374  278  171   0    0   OFF  ON  R0W0   0    6   1T   OFF   0 
      MRD:PDA   MOD:PDA  WRMPR STAG PDM RDDATA WRD  WRL  RDL  XS   XP CPDED
  #0    8  16    24  24    24    6 0:P:0   10   2    6   20  384    7    4 
  #1    8  16    24  24    24    6 0:P:0   10   2    6   20  384    7    4 

 DIMM Geometry for channel #0                                              
      Slot Bank Rank     Rows   Columns    Memory Size (MB)                
       #0    16    1     65536      1024           8192  CMT32GX4M4C3200C16
       #1                                                                  
 DIMM Geometry for channel #1                                              
      Slot Bank Rank     Rows   Columns    Memory Size (MB)                
       #0    16    1     65536      1024           8192  CMT32GX4M4C3200C16
       #1

cyring commented 1 year ago

Not sure what's going on with the C-states. Motherboard does not have good options for them. (Or is the errata you mention the explanation for it?)

About the missing Sub C-State values: I have no idea whether they are due to an erratum, but I presume there is one. The first Ryzen series had some issues with C-states. I remember reading that it was better to sleep with the HALT instruction rather than MWAIT to prevent a freeze. My guess is that CPUID returns zero Sub C-States as a hint for the kernel idle function.

However, you can register CoreFreq as the kernel CPU Idle handler, then select an idle method of your choice in the Settings menu. See the wiki page "CoreFreq as the Clock Source, CPU Freq and CPU Idle driver". Keep an eye on voltage and power consumption to decide which method is appropriate and stable.

About the original Memory Controller issue: I will soon provide that code fix, including the EPYC and Zen+ TR multi-UMC cases too. I just need volunteers to run the non-regression tests on EPYC and other Threadripper processors.

cyring commented 1 year ago

Memory Controller fix is committed in 706460f852f10159eb1492df62cb8c060c74ecbc

I need a Naples test: @munorc could you please run the latest commit with your EPYC and post the Memory Controller output here?

cyring commented 1 year ago

corefreq-cli -m

CPU Pkg  Apic  Core/Thread  Caches      (w)rite-Back (i)nclusive              
 #   ID   ID CCD CCX ID/ID L1-Inst Way  L1-Data Way      L2  Way      L3  Way 
000:BSP    0   0  0   0  0      64  4        32  8       512  8 i   16384 32w 
001:  0    2   0  0   1  0      64  4        32  8       512  8 i   16384 32w 
002:  0    4   0  0   2  0      64  4        32  8       512  8 i   16384 32w 
003:  0    6   0  0   3  0      64  4        32  8       512  8 i   16384 32w 
004:  1   16   1  2   8  0      64  4        32  8       512  8 i   16384 32w 
005:  1   18   1  2   9  0      64  4        32  8       512  8 i   16384 32w 
006:  1   20   1  2  10  0      64  4        32  8       512  8 i   16384 32w 
007:  1   22   1  2  11  0      64  4        32  8       512  8 i   16384 32w 
008:  0    1   0  0   0  1      64  4        32  8       512  8 i   16384 32w 
009:  0    3   0  0   1  1      64  4        32  8       512  8 i   16384 32w 
010:  0    5   0  0   2  1      64  4        32  8       512  8 i   16384 32w 
011:  0    7   0  0   3  1      64  4        32  8       512  8 i   16384 32w 
012:  1   17   1  2   8  1      64  4        32  8       512  8 i   16384 32w 
013:  1   19   1  2   9  1      64  4        32  8       512  8 i   16384 32w 
014:  1   21   1  2  10  1      64  4        32  8       512  8 i   16384 32w 
015:  1   23   1  2  11  1      64  4        32  8       512  8 i   16384 32w 

About CCX falling in the set {0, 2}: I'm referring to the "AMD diagonal configuration" mentioned in this TechPowerUp article. That would mean there is no CCX number 1 or 3.

cyring commented 1 year ago

I have received results from EPYC: no regression encountered.

https://github.com/cyring/CoreFreq/issues/388#issuecomment-1579679040

https://github.com/cyring/CoreFreq/issues/388#issuecomment-1579694814

Genoa EPYC is still unknown to me; I have just received Raphael results.

Feel free to close the issue.

Regards, Cyril

svmlegacy commented 1 year ago

I've just updated my gist with the latest commit. Working great, thanks for the efforts everyone!