cyring / CoreFreq

CoreFreq : CPU monitoring and tuning software designed for 64-bit processors.
https://www.cyring.fr
GNU General Public License v2.0

[AMD][Zen] SMU > Data Fabric > UMC #196

Closed cyring closed 3 years ago

cyring commented 4 years ago

Development notes

2020-07-15

[  340.567031] CoreFreq(12:28): Processor [ 8F_71] Architecture [Zen2/Matisse] SMT [32/32]
[  340.572209] Welcome to the Data Fabric UMC(0) @ 0x00050000:
               0x030[0x00150508] 0x080[0x00000000] 0x100[0x80000200]
               0x104[0xb040808b] 0x14c[0x00000000]
               0xdf0[0x00010030] 0xdf4[0x00000000]
[  340.572217] Welcome to the Data Fabric UMC(1) @ 0x00150000:
               0x030[0x00150508] 0x080[0x00000000] 0x100[0x80000200]
               0x104[0xb040808b] 0x14c[0x00000000]
               0xdf0[0x00010030] 0xdf4[0x00000000]
[  340.572218] CHA[0]   CHIP_BAR[0][0]=0x00050000 CHIP_BAR[0][1]=0x00050020
                        CHIP_BAR[1][0]=0x00050010 CHIP_BAR[1][1]=0x00050028
[  340.572219] CHA[0] CHIP[0:0] @ 0x00050000[0x00000000] Disable
[  340.572221] CHA[0] MASK[0:0] @ 0x00050020[0x00000000]
[  340.572222] CHA[0] CHIP[0:1] @ 0x00050010[0x00000000] Disable
[  340.572223] CHA[0] MASK[0:1] @ 0x00050028[0x00000000]
[  340.572225] CHA[0] CHIP[1:0] @ 0x00050004[0x00000000] Disable
[  340.572226] CHA[0] MASK[1:0] @ 0x00050020[0x00000000]
[  340.572227] CHA[0] CHIP[1:1] @ 0x00050014[0x00000000] Disable
[  340.572229] CHA[0] MASK[1:1] @ 0x00050028[0x00000000]
[  340.572230] CHA[0] CHIP[2:0] @ 0x00050008[0x00000001] Enable
[  340.572232] CHA[0] MASK[2:0] @ 0x00050024[0x03fffdfe] ChipSize[8388608]
[  340.572233] CHA[0] CHIP[2:1] @ 0x00050018[0x00000000] Disable
[  340.572235] CHA[0] MASK[2:1] @ 0x0005002c[0x00000000]
[  340.572236] CHA[0] CHIP[3:0] @ 0x0005000c[0x00000201] Enable
[  340.572238] CHA[0] MASK[3:0] @ 0x00050024[0x03fffdfe] ChipSize[8388608]
[  340.572239] CHA[0] CHIP[3:1] @ 0x0005001c[0x00000000] Disable
[  340.572240] CHA[0] MASK[3:1] @ 0x0005002c[0x00000000]
[  340.572241] Memory Size[16777216 KB] [16384 MB]
[  340.572242] CHA[1]   CHIP_BAR[0][0]=0x00150000 CHIP_BAR[0][1]=0x00150020
                        CHIP_BAR[1][0]=0x00150010 CHIP_BAR[1][1]=0x00150028
[  340.572243] CHA[1] CHIP[0:0] @ 0x00150000[0x00000000] Disable
[  340.572244] CHA[1] MASK[0:0] @ 0x00150020[0x00000000]
[  340.572246] CHA[1] CHIP[0:1] @ 0x00150010[0x00000000] Disable
[  340.572247] CHA[1] MASK[0:1] @ 0x00150028[0x00000000]
[  340.572248] CHA[1] CHIP[1:0] @ 0x00150004[0x00000000] Disable
[  340.572250] CHA[1] MASK[1:0] @ 0x00150020[0x00000000]
[  340.572251] CHA[1] CHIP[1:1] @ 0x00150014[0x00000000] Disable
[  340.572253] CHA[1] MASK[1:1] @ 0x00150028[0x00000000]
[  340.572254] CHA[1] CHIP[2:0] @ 0x00150008[0x00000001] Enable
[  340.572255] CHA[1] MASK[2:0] @ 0x00150024[0x03fffdfe] ChipSize[8388608]
[  340.572257] CHA[1] CHIP[2:1] @ 0x00150018[0x00000000] Disable
[  340.572258] CHA[1] MASK[2:1] @ 0x0015002c[0x00000000]
[  340.572260] CHA[1] CHIP[3:0] @ 0x0015000c[0x00000201] Enable
[  340.572261] CHA[1] MASK[3:0] @ 0x00150024[0x03fffdfe] ChipSize[8388608]
[  340.572262] CHA[1] CHIP[3:1] @ 0x0015001c[0x00000000] Disable
[  340.572264] CHA[1] MASK[3:1] @ 0x0015002c[0x00000000]
[  340.572264] Memory Size[16777216 KB] [16384 MB]
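
The ChipSize and Memory Size figures in this log can be reproduced from the raw MASK values. The sketch below is a minimal illustration, not CoreFreq's own code. It assumes that bit 0 of a CHIP register is the enable flag (consistent with the Enable/Disable column above), that mask bits [31:1] map to address bits [39:9], and that a zero bit inside the mask is an interleave hole, which is the approach used for Family 17h in the Linux amd64_edac driver as far as I recall it. The function names are hypothetical.

```python
def chip_enabled(chip_reg: int) -> bool:
    """Assumption: bit 0 of a CHIP register is the chip-select enable flag."""
    return bool(chip_reg & 1)


def chip_size_kb(addr_mask: int) -> int:
    """Size in KB covered by one chip-select address mask (0 if empty)."""
    if addr_mask == 0:
        return 0
    msb = addr_mask.bit_length() - 1        # position of the highest set bit
    weight = bin(addr_mask).count("1")      # number of set bits
    zero_bits = msb - weight                # holes below the MSB (bit 0 is always 0)
    # Drop the holes from the top of the mask: bits [msb - zero_bits : 1] set.
    deinterleaved = (1 << (msb - zero_bits + 1)) - 2
    # Assumption: register bits [31:1] are address bits [39:9], size in KB.
    return (deinterleaved >> 2) + 1


# (CHIP, MASK) pairs of CHA[0], primary chip selects 0..3, copied from the log.
cha0 = [
    (0x00000000, 0x00000000),
    (0x00000000, 0x00000000),
    (0x00000001, 0x03FFFDFE),
    (0x00000201, 0x03FFFDFE),
]

total_kb = sum(chip_size_kb(mask) for chip, mask in cha0 if chip_enabled(chip))
print(f"Memory Size[{total_kb} KB] [{total_kb // 1024} MB]")
```

Under these assumptions each enabled chip select yields 8388608 KB, and the two of them add up to the 16777216 KB reported per channel.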

2020-07-14

[11986.233958] CoreFreq(9:-1): Processor [ 8F_71] Architecture [Zen2/Matisse] CPU [16/16]
[11986.235034] Welcome to the Data Fabric UMC(0) @ 50000:
               0x30[0x150508] 0x80[0x0] 0x100[0x80000200]
               0x104[0xb040808b] 0x14c[0x0]
               0xdf0[0x10030] 0xdf4[0x0]
[11986.235042] Welcome to the Data Fabric UMC(1) @ 150000:
               0x30[0x150508] 0x80[0x0] 0x100[0x80000200]
               0x104[0xb040808b] 0x14c[0x0]
               0xdf0[0x10030] 0xdf4[0x0]
[11986.235042] 0xe8@d18f3[0x0]
[11986.235043] CHA[0]   CHIP_BAR[0][0]=0x50000 CHIP_BAR[0][1]=0x50020
                        CHIP_BAR[1][0]=0x50010 CHIP_BAR[1][1]=0x50028
[11986.235044] CHA[0] CHIP[0:0] @ 0x50000[0x0]
[11986.235046] CHA[0] CHIP[0:1] @ 0x50010[0x0]
[11986.235047] CHA[0] CHIP[1:0] @ 0x50004[0x0]
[11986.235048] CHA[0] CHIP[1:1] @ 0x50014[0x0]
[11986.235050] CHA[0] CHIP[2:0] @ 0x50008[0x1]
[11986.235051] CHA[0] CHIP[2:1] @ 0x50018[0x0]
[11986.235052] CHA[0] CHIP[3:0] @ 0x5000c[0x201]
[11986.235054] CHA[0] CHIP[3:1] @ 0x5001c[0x0]
[11986.235055] CHA[0] MASK[0:0] @ 0x50020[0x0]
[11986.235056] CHA[0] MASK[0:1] @ 0x50028[0x0]
[11986.235058] CHA[0] MASK[1:0] @ 0x50024[0x3fffdfe]
[11986.235059] CHA[0] MASK[1:1] @ 0x5002c[0x0]
[11986.235060] CHA[1]   CHIP_BAR[0][0]=0x150000 CHIP_BAR[0][1]=0x150020
                        CHIP_BAR[1][0]=0x150010 CHIP_BAR[1][1]=0x150028
[11986.235061] CHA[1] CHIP[0:0] @ 0x150000[0x0]
[11986.235063] CHA[1] CHIP[0:1] @ 0x150010[0x0]
[11986.235064] CHA[1] CHIP[1:0] @ 0x150004[0x0]
[11986.235065] CHA[1] CHIP[1:1] @ 0x150014[0x0]
[11986.235067] CHA[1] CHIP[2:0] @ 0x150008[0x1]
[11986.235068] CHA[1] CHIP[2:1] @ 0x150018[0x0]
[11986.235070] CHA[1] CHIP[3:0] @ 0x15000c[0x201]
[11986.235071] CHA[1] CHIP[3:1] @ 0x15001c[0x0]
[11986.235072] CHA[1] MASK[0:0] @ 0x150020[0x0]
[11986.235074] CHA[1] MASK[0:1] @ 0x150028[0x0]
[11986.235075] CHA[1] MASK[1:0] @ 0x150024[0x3fffdfe]
[11986.235076] CHA[1] MASK[1:1] @ 0x15002c[0x0]

2020-07-13

[17611.473676] Welcome to the Data Fabric UMC(0):
               0x80[0x0] 0x100[0x80000200] 0x104[0xb040808b]
               0xdf0[0x10030] 0xdf4[0x0]
[17611.473682] Welcome to the Data Fabric UMC(1):
               0x80[0x0] 0x100[0x80000200] 0x104[0xb040808b]
               0xdf0[0x10030] 0xdf4[0x0]

UMC Config

0x80000200 = 0b10000000000000000000001000000000

Thus bits 9 and 31 are set.

SDP

0xb040808b = 0b10110000010000001000000010001011

Bit 31 (SdpInit) is set in both UMCs, so we have two channels.
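
The bit positions above can be checked with a few lines of Python. This is only a sketch: the constant names UMC_CONFIG and SDP_CTRL are labels I made up for the values read at offsets 0x100 and 0x104, and the only named field is SdpInit at bit 31, as stated in the notes.

```python
def set_bits(value: int, width: int = 32) -> list[int]:
    """Return the positions of the bits set in value, lowest first."""
    return [bit for bit in range(width) if (value >> bit) & 1]


UMC_CONFIG = 0x80000200  # hypothetical name: value read at offset 0x100
SDP_CTRL = 0xB040808B    # hypothetical name: value read at offset 0x104

# Print each value as a 32-bit binary string followed by its set bits.
print(f"{UMC_CONFIG:#034b}", set_bits(UMC_CONFIG))
print(f"{SDP_CTRL:#034b}", set_bits(SDP_CTRL))

# SdpInit is bit 31 according to the notes above.
sdp_init = bool((SDP_CTRL >> 31) & 1)
print("SdpInit:", sdp_init)
```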

cyring commented 4 years ago

CoreFreq_SR_Comment

CoreFreq_ISA_Comment

CoreFreq_DDR_Comments

adatum commented 4 years ago

Great usability improvement. Love it.

One comment is that the cell highlight is not "breaking" at the correct positions when the cell has an underscore:

comments

cyring commented 4 years ago

Great usability improvement. Love it.

One comment is that the cell highlight is not "breaking" at the correct positions when the cell has an underscore:

Indeed, this is the issue I have with 5 characters per cell, minus one separator space. Unfortunately some DDR terms cannot fit in 4 characters, especially the advanced timings.
I plan to complete all the possible timings before expanding the cells to the largest common width.

cyring commented 4 years ago

Current view of the IMC

BCLK

adatum commented 4 years ago

Unfortunately in BIOS 2203 I found only VRM spread spectrum, nothing for CPU or SB. I don't notice a difference from disabling it:

vrm_spread_sprectrum_off

It's a good find. We had discussed the BCLK not being exactly 100 MHz long ago. As insignificant as it is, it has always bugged me! On forums, many people complain that Asus has not exposed spread spectrum options in the BIOS. It may vary depending on the BIOS version.

Ropid commented 4 years ago

@adatum What happens if you manually set BCLK to "100.0" instead of using the default "Auto"? On my ASRock board, it seemed to me like the spread spectrum setting is tied to the Auto value for BCLK.

cyring commented 4 years ago

Unfortunately in BIOS 2203 I found only VRM spread spectrum, nothing for CPU or SB. I don't notice a difference from disabling it:

It's a good find. We had discussed the BCLK not being exactly 100 MHz long ago. As insignificant as it is, it has always bugged me! On forums, many people complain that Asus has not exposed spread spectrum options in the BIOS. It may vary depending on the BIOS version.

adatum commented 4 years ago

@Ropid Good tip, but it didn't make a difference. Confusingly, the search function in BIOS finds two entries for BCLK. One was set to "100.0000" and the other was on Auto. I changed from Auto to "100", but I see no change. There is also a BCLK_divider option set to Auto, which I left alone.

@cyring I don't really want to change BCLK for stability reasons, even though a small change to 100.1 MHz would probably be fine. The worst consequence is living with 99.8 MHz. It seems to be a recurring complaint about the BIOS option though. I believe some modified BIOSes from overclocking forums expose spread spectrum settings.

EDIT: There are reports that using DOCP results in BCLK of 99.8 MHz, and that timings should be set manually to get 100 MHz. Considering I prioritize stability and don't want to spend time validating timings or risk data corruption, for now I will stay with DOCP and the slightly annoying 99.8 MHz.

cyring commented 4 years ago

My comment is not about the terminology, but the correspondence between CH A/B in BIOS and Cha 1/0 in CoreFreq.

Can you test the develop branch for the DIMM slot position? Please post the topology printed by the Daemon.

adatum commented 4 years ago

I don't notice a difference.

CoreFreq Daemon 1.80.9  Copyright (C) 2015-2020 CYRIL INGENIERIE
mc[0]   cha[0/2]    chip[0] sec[0]  size[0]
mc[0]   cha[0/2]    chip[0] sec[1]  size[0]
mc[0]   cha[0/2]    chip[1] sec[0]  size[0]
mc[0]   cha[0/2]    chip[1] sec[1]  size[0]
mc[0]   cha[0/2]    chip[2] sec[0]  size[8388608]
mc[0]   cha[0/2]    chip[2] sec[1]  size[0]
mc[0]   cha[0/2]    chip[3] sec[0]  size[8388608]
mc[0]   cha[0/2]    chip[3] sec[1]  size[0]
mc[0]   cha[1/2]    chip[0] sec[0]  size[0]
mc[0]   cha[1/2]    chip[0] sec[1]  size[0]
mc[0]   cha[1/2]    chip[1] sec[0]  size[0]
mc[0]   cha[1/2]    chip[1] sec[1]  size[0]
mc[0]   cha[1/2]    chip[2] sec[0]  size[8388608]
mc[0]   cha[1/2]    chip[2] sec[1]  size[0]
mc[0]   cha[1/2]    chip[3] sec[0]  size[8388608]
mc[0]   cha[1/2]    chip[3] sec[1]  size[0]

mem

cyring commented 4 years ago

I don't notice a difference.

I'm not sure I understand your DIMM topology, or whether the array of scanned registers and their inner bits can be mapped to the physical layout. Those registers are unfortunately undocumented. Can you post, once again, the output of zencli umc 0x0 and, if possible, any picture or diagram of the motherboard populated with the DIMMs (including the empty slots)?

adatum commented 4 years ago

Welcome to the Data Fabric: UMC has 2 x Channel(s)

CHA[0]  CHIP_BAR[0][0]=0x00050000 CHIP_BAR[0][1]=0x00050020
        CHIP_BAR[1][0]=0x00050010 CHIP_BAR[1][1]=0x00050028
CHA[0] CHIP[0:0] @ 0x00050000[0x00000000] Disable
CHA[0] MASK[0:0] @ 0x00050020[0x00000000]
CHA[0] CHIP[0:1] @ 0x00050010[0x00000000] Disable
CHA[0] MASK[0:1] @ 0x00050028[0x00000000]
CHA[0] CHIP[1:0] @ 0x00050004[0x00000000] Disable
CHA[0] MASK[1:0] @ 0x00050020[0x00000000]
CHA[0] CHIP[1:1] @ 0x00050014[0x00000000] Disable
CHA[0] MASK[1:1] @ 0x00050028[0x00000000]
CHA[0] CHIP[2:0] @ 0x00050008[0x00000001] Enable
CHA[0] MASK[2:0] @ 0x00050024[0x03fffdfe] ChipSize[8388608]
CHA[0] CHIP[2:1] @ 0x00050018[0x00000000] Disable
CHA[0] MASK[2:1] @ 0x0005002c[0x00000000]
CHA[0] CHIP[3:0] @ 0x0005000c[0x00000201] Enable
CHA[0] MASK[3:0] @ 0x00050024[0x03fffdfe] ChipSize[8388608]
CHA[0] CHIP[3:1] @ 0x0005001c[0x00000000] Disable
CHA[0] MASK[3:1] @ 0x0005002c[0x00000000]
Memory Size[16777216 KB] [16384 MB]
CHA[1]  CHIP_BAR[0][0]=0x00150000 CHIP_BAR[0][1]=0x00150020
        CHIP_BAR[1][0]=0x00150010 CHIP_BAR[1][1]=0x00150028
CHA[1] CHIP[0:0] @ 0x00150000[0x00000000] Disable
CHA[1] MASK[0:0] @ 0x00150020[0x00000000]
CHA[1] CHIP[0:1] @ 0x00150010[0x00000000] Disable
CHA[1] MASK[0:1] @ 0x00150028[0x00000000]
CHA[1] CHIP[1:0] @ 0x00150004[0x00000000] Disable
CHA[1] MASK[1:0] @ 0x00150020[0x00000000]
CHA[1] CHIP[1:1] @ 0x00150014[0x00000000] Disable
CHA[1] MASK[1:1] @ 0x00150028[0x00000000]
CHA[1] CHIP[2:0] @ 0x00150008[0x00000001] Enable
CHA[1] MASK[2:0] @ 0x00150024[0x03fffdfe] ChipSize[8388608]
CHA[1] CHIP[2:1] @ 0x00150018[0x00000000] Disable
CHA[1] MASK[2:1] @ 0x0015002c[0x00000000]
CHA[1] CHIP[3:0] @ 0x0015000c[0x00000201] Enable
CHA[1] MASK[3:0] @ 0x00150024[0x03fffdfe] ChipSize[8388608]
CHA[1] CHIP[3:1] @ 0x0015001c[0x00000000] Disable
CHA[1] MASK[3:1] @ 0x0015002c[0x00000000]
Memory Size[16777216 KB] [16384 MB]

The 2x16GB DIMMs are populated in the DIMM_A2 and DIMM_B2 slots as per the manual: DIMM Configuration
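
For readers trying to follow the register layout, the addresses in the zencli output above follow a regular pattern. The sketch below reproduces them. The base, stride and offsets are inferred purely from this output (two channels only), the registers are undocumented, and the function names are hypothetical, so treat it as an observation rather than a specification.

```python
UMC_BASE = 0x00050000    # address of CHA[0] in the output above
UMC_STRIDE = 0x00100000  # inferred: CHA[1] sits at 0x00150000


def chip_addr(cha: int, chip: int, sec: int) -> int:
    """Address of CHIP[chip:sec]: 4 bytes per chip, secondary bank at +0x10."""
    return UMC_BASE + cha * UMC_STRIDE + 0x10 * sec + 4 * chip


def mask_addr(cha: int, chip: int, sec: int) -> int:
    """Address of MASK[chip:sec]: one mask per pair of chips, starting at +0x20."""
    return UMC_BASE + cha * UMC_STRIDE + 0x20 + 8 * sec + 4 * (chip >> 1)


# Print the same address map as the zencli output, without register values.
for cha in range(2):
    for chip in range(4):
        for sec in range(2):
            print(f"CHA[{cha}] CHIP[{chip}:{sec}] @ {chip_addr(cha, chip, sec):#010x}")
            print(f"CHA[{cha}] MASK[{chip}:{sec}] @ {mask_addr(cha, chip, sec):#010x}")
```

Note that each mask is shared by two chip selects, which is why CHIP[2:0] and CHIP[3:0] both point to the mask at 0x00050024.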

cyring commented 4 years ago

The 2x16GB DIMMs are populated in the DIMM_A2 and DIMM_B2 slots as per the manual: DIMM Configuration

It is available in develop, please let me know what you get. Thank you.

cyring commented 4 years ago

EDIT 2: Everything works great here. I enabled ECC again in the BIOS, and the output in corefreq changed back to "1".

Many changes were made to support Threadripper. Based on the develop branch, can you run some non-regression tests, especially on the Timings showing the ECC state?

adatum commented 4 years ago

It is available in develop, please let me know what you get. Thank you.

What would you like me to report? The memory controller window contents look the same.

cyring commented 4 years ago

It is available in develop, please let me know what you get. Thank you.

What would you like me to report? The memory controller window contents look the same.

OK, so there is no regression after all the changes I've made. Thank you.

cyring commented 4 years ago

Here are the results of the Threadripper 3970X

                              Zen UMC  [1493]                              
Controller #0                                                Quad Channel  
 Bus Rate  1866 MT/s      Bus Speed 1883 MHz           DRAM Speed 3767 MHz 

 Cha   CL  RCDR RCDW  RP  RAS   RC  RRDS RRDL FAW  WTRS WTRL  WR  clRR clWW
  #0   16   15   14   14   32   46    4    6   20    4   12   12    4    4 
  #1   16   15   14   14   32   46    4    6   20    4   12   12    4    4 
  #2   16   15   14   14   32   46    4    6   20    4   12   12    4    4 
  #3   16   15   14   14   32   46    4    6   20    4   12   12    4    4 
      CWL  RTP RdWr WrRd scWW sdWW ddWW scRR sdRR ddRR drRR drWW drWR drRRD
  #0   16    8    8    4    1    7    7    1    5    5    0    0    0    0 
  #1   16    8    8    4    1    7    7    1    5    5    0    0    0    0 
  #2   16    8    8    4    1    7    7    1    5    5    0    0    0    0 
  #3   16    8    8    4    1    7    7    1    5    5    0    0    0    0 
      REFI RFC1 RFC2 RFC4 RCPB RPPB sFAW dFAW Ban  Page  CKE  CMD  GDM  ECC
  #0 14553  298  192  132   0    0    0    0  R1W1   0    1   1T    ON   0 
  #1 14553  298  192  132   0    0    0    0  R1W1   0    1   1T    ON   0 
  #2 14553  298  192  132   0    0    0    0  R1W1   0    1   1T    ON   0 
  #3 14553  298  192  132   0    0    0    0  R1W1   0    1   1T    ON   0 

 DIMM Geometry for channel #0                                              
      Slot Bank Rank     Rows   Columns    Memory Size (MB)                
       #0                                                                  
       #1    16    2     65536      1024          16384                    
 DIMM Geometry for channel #1                                              
      Slot Bank Rank     Rows   Columns    Memory Size (MB)                
       #0                                                                  
       #1    16    2     65536      1024          16384                    
 DIMM Geometry for channel #2                                              
      Slot Bank Rank     Rows   Columns    Memory Size (MB)                
       #0                                                                  
       #1    16    2     65536      1024          16384                    
 DIMM Geometry for channel #3                                              
      Slot Bank Rank     Rows   Columns    Memory Size (MB)                
       #0                                                                  
       #1    16    2     65536      1024          16384                    
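
As a sanity check, the Memory Size column is consistent with the geometry columns. The sketch below assumes that each column address selects 8 bytes (a 64-bit data bus without ECC); that width does not appear in the output above, and the function name is hypothetical.

```python
BYTES_PER_COLUMN = 8  # assumption: 64-bit data bus, ECC bits not counted


def dimm_size_mb(banks: int, ranks: int, rows: int, columns: int) -> int:
    """DIMM capacity in MB derived from its geometry."""
    size_bytes = banks * ranks * rows * columns * BYTES_PER_COLUMN
    return size_bytes // (1024 * 1024)


# Slot #1 of every channel in the table above.
size_mb = dimm_size_mb(banks=16, ranks=2, rows=65536, columns=1024)
print(size_mb)
```

With the values from the table this gives 16384 MB per DIMM, matching the reported size.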

daviddgm commented 3 years ago

Hi! Can I get memory timings with zencli? All the information I need is in "corefreq-cli -M", but I do not want to load a module and start a service...

cyring commented 3 years ago

Hi! Can I get memory timings with zencli? All the information I need is in "corefreq-cli -M", but I do not want to load a module and start a service...

Unfortunately, the DTR timing registers are not part of zencli, whose purpose is debugging.