cyring / CoreFreq

CoreFreq : CPU monitoring and tuning software designed for 64-bit processors.
https://www.cyring.fr
GNU General Public License v2.0

Threadripper Zen2 regression?: Uneven (0°C) reading on half the cores #219

Closed Chlorophytus closed 3 years ago

Chlorophytus commented 3 years ago

Threadripper 3960X on an ASRock TRX40 Creator w/BIOS 1.70. openSUSE Tumbleweed with Linux 5.10.7.

This is the output with hyperthreading (SMT) off. When I turn it on, half of the threads still read 0°C.

CPU Freq(MHz) VID  Vcore  TMP(C)    Accumulator       Energy(J)     Power(W)
000    2.43   132  0.7250   30  000000000000000053    0.000808716   0.000808716
001    2.28   132  0.7250   30  000000000000000074    0.001129150   0.001129150
002    2.33   132  0.7250   30  000000000000000067    0.001022339   0.001022339
003   11.88   216  0.2000   30  000000000000000415    0.006332397   0.006332397
004   35.41   132  0.7250   30  000000000000001083    0.016525269   0.016525269
005   54.53   132  0.7250   30  000000000000001981    0.030227661   0.030227661
006    8.77   132  0.7250    0  000000000000000301    0.004592896   0.004592896
007   17.63   132  0.7250    0  000000000000000431    0.006576538   0.006576538
008    5.32   132  0.7250    0  000000000000000119    0.001815796   0.001815796
009   21.80   132  0.7250    0  000000000000000815    0.012435913   0.012435913
010   15.76   132  0.7250    0  000000000000000480    0.007324219   0.007324219
011   21.97   132  0.7250    0  000000000000000699    0.010665894   0.010665894
012   18.20   216  0.2000    0  000000000000000744    0.011352539   0.011352539
013   25.85   216  0.2000    0  000000000000000989    0.015090942   0.015090942
014    6.99   216  0.2000    0  000000000000000311    0.004745483   0.004745483
015   27.62   216  0.2000    0  000000000000001013    0.015457153   0.015457153
016   10.89    90  0.9875    0  000000000000000372    0.005676270   0.005676270
017   30.96    90  0.9875    0  000000000000001025    0.015640259   0.015640259
018    5.26    90  0.9875   32  000000000000000132    0.002014160   0.002014160
019    4.89   216  0.2000   32  000000000000000133    0.002029419   0.002029419
020    4.76   216  0.2000   32  000000000000000116    0.001770020   0.001770020
021    1.37   216  0.2000   32  000000000000000019    0.000289917   0.000289917
022    2.41   216  0.2000   32  000000000000000051    0.000778198   0.000778198
023    1.50   216  0.2000   32  000000000000000090    0.001373291   0.001373291
cyring commented 3 years ago

I recently started taking the CCX into account, in addition to the CCD, when addressing the SMU offset register of the temperature sensor. That was wrong; I will roll back the change ASAP.

Thank you for making me aware of this error.

cyring commented 3 years ago

Hello

I've reverted the changes: can you please give the develop branch a try?

Chlorophytus commented 3 years ago
CPU Freq(MHz) VID  Vcore  TMP(C)    Accumulator       Energy(J)     Power(W)
000    1.78   132  0.7250   28  000000000000000082    0.001251221   0.001251221
001    1.81   132  0.7250   28  000000000000000060    0.000915527   0.000915527
002    2.76   132  0.7250   28  000000000000000072    0.001098633   0.001098633
003    4.25   132  0.7250   28  000000000000000111    0.001693726   0.001693726
004    2.65   132  0.7250   28  000000000000000094    0.001434326   0.001434326
005    7.34   132  0.7250   28  000000000000000273    0.004165649   0.004165649
006   22.30   132  0.7250    0  000000000000001329    0.020278931   0.020278931
007   35.80   132  0.7250    0  000000000000001256    0.019165039   0.019165039
008   25.10   132  0.7250    0  000000000000000962    0.014678955   0.014678955
009   19.78   132  0.7250    0  000000000000000531    0.008102417   0.008102417
010   13.69   132  0.7250    0  000000000000000422    0.006439209   0.006439209
011    8.01   132  0.7250    0  000000000000000245    0.003738403   0.003738403
012   17.74   132  0.7250   28  000000000000001298    0.019805908   0.019805908
013    9.85   132  0.7250   28  000000000000000513    0.007827759   0.007827759
014   11.64   132  0.7250   28  000000000000001339    0.020431519   0.020431519
015   13.48   132  0.7250   28  000000000000000439    0.006698608   0.006698608
016    1.79   132  0.7250   28  000000000000000089    0.001358032   0.001358032
017   22.35   132  0.7250   28  000000000000000549    0.008377075   0.008377075
018    4.55   132  0.7250    0  000000000000000181    0.002761841   0.002761841
019    7.29   132  0.7250    0  000000000000000535    0.008163452   0.008163452
020    5.31   132  0.7250    0  000000000000000555    0.008468628   0.008468628
021    5.01   132  0.7250    0  000000000000000153    0.002334595   0.002334595
022    2.51   132  0.7250    0  000000000000000101    0.001541138   0.001541138
023    8.14   132  0.7250    0  000000000000000443    0.006759644   0.006759644
Chlorophytus commented 3 years ago

CCDs are considered 1, 3, 5, and 7 on this CPU if that helps. :)

k10temp-pci-00c3
Adapter: PCI adapter
Tctl:         +31.0°C  
Tdie:         +31.0°C  
Tccd1:        +28.8°C  
Tccd3:        +29.2°C  
Tccd5:        +30.0°C  
Tccd7:        +30.8°C 
cyring commented 3 years ago

Can you show me the topology?

corefreq-cli -m
Chlorophytus commented 3 years ago
CPU Pkg  Apic  Core/Thread  Caches      (w)rite-Back (i)nclusive              
 #   ID   ID CCD CCX ID/ID L1-Inst Way  L1-Data Way      L2  Way      L3  Way 
000:BSP    0   0  0   0  0      32  8        32  8       512  8 i  131072 16w 
001:  0    1   0  0   1  0      32  8        32  8       512  8 i  131072 16w 
002:  0    2   0  0   2  0      32  8        32  8       512  8 i  131072 16w 
003:  0    4   0  1   4  0      32  8        32  8       512  8 i  131072 16w 
004:  0    5   0  1   5  0      32  8        32  8       512  8 i  131072 16w 
005:  0    6   0  1   6  0      32  8        32  8       512  8 i  131072 16w 
006:  0    8   1  2   8  0      32  8        32  8       512  8 i  131072 16w 
007:  0    9   1  2   9  0      32  8        32  8       512  8 i  131072 16w 
008:  0   10   1  2  10  0      32  8        32  8       512  8 i  131072 16w 
009:  0   12   1  3  12  0      32  8        32  8       512  8 i  131072 16w 
010:  0   13   1  3  13  0      32  8        32  8       512  8 i  131072 16w 
011:  0   14   1  3  14  0      32  8        32  8       512  8 i  131072 16w 
012:  0   16   2  4  16  0      32  8        32  8       512  8 i  131072 16w 
013:  0   17   2  4  17  0      32  8        32  8       512  8 i  131072 16w 
014:  0   18   2  4  18  0      32  8        32  8       512  8 i  131072 16w 
015:  0   20   2  5  20  0      32  8        32  8       512  8 i  131072 16w 
016:  0   21   2  5  21  0      32  8        32  8       512  8 i  131072 16w 
017:  0   22   2  5  22  0      32  8        32  8       512  8 i  131072 16w 
018:  0   24   3  6  24  0      32  8        32  8       512  8 i  131072 16w 
019:  0   25   3  6  25  0      32  8        32  8       512  8 i  131072 16w 
020:  0   26   3  6  26  0      32  8        32  8       512  8 i  131072 16w 
021:  0   28   3  7  28  0      32  8        32  8       512  8 i  131072 16w 
022:  0   29   3  7  29  0      32  8        32  8       512  8 i  131072 16w 
023:  0   30   3  7  30  0      32  8        32  8       512  8 i  131072 16w
cyring commented 3 years ago

Here, a Matisse is now booted with SMT disabled, but I can't reproduce the bug.

$ corefreq-cli -m
CPU Pkg  Apic  Core/Thread  Caches      (w)rite-Back (i)nclusive              
 #   ID   ID CCD CCX ID/ID L1-Inst Way  L1-Data Way      L2  Way      L3  Way 
000:BSP    0   0  0   0  0      32  8        32  8       512  8 i   65536 16w 
001:  0    1   0  0   1  0      32  8        32  8       512  8 i   65536 16w 
002:  0    2   0  0   2  0      32  8        32  8       512  8 i   65536 16w 
003:  0    3   0  0   3  0      32  8        32  8       512  8 i   65536 16w 
004:  0    4   0  1   4  0      32  8        32  8       512  8 i   65536 16w 
005:  0    5   0  1   5  0      32  8        32  8       512  8 i   65536 16w 
006:  0    6   0  1   6  0      32  8        32  8       512  8 i   65536 16w 
007:  0    7   0  1   7  0      32  8        32  8       512  8 i   65536 16w 
008:  0    8   1  2   8  0      32  8        32  8       512  8 i   65536 16w 
009:  0    9   1  2   9  0      32  8        32  8       512  8 i   65536 16w 
010:  0   10   1  2  10  0      32  8        32  8       512  8 i   65536 16w 
011:  0   11   1  2  11  0      32  8        32  8       512  8 i   65536 16w 
012:  0   12   1  3  12  0      32  8        32  8       512  8 i   65536 16w 
013:  0   13   1  3  13  0      32  8        32  8       512  8 i   65536 16w 
014:  0   14   1  3  14  0      32  8        32  8       512  8 i   65536 16w 
015:  0   15   1  3  15  0      32  8        32  8       512  8 i   65536 16w 

$ corefreq-cli -C
CPU Freq(MHz) VID  Vcore  TMP(C)    Accumulator       Energy(J)     Power(W)
000    3.24    79  1.0562   31  000000000000000199    0.003036499   0.003036499
001    0.84    79  1.0562   31  000000000000000064    0.000976562   0.000976562
002    2.20    79  1.0562   31  000000000000000150    0.002288818   0.002288818
003    1.01    79  1.0562   31  000000000000000074    0.001129150   0.001129150
004    0.87    79  1.0562   31  000000000000000024    0.000366211   0.000366211
005   40.05    79  1.0562   31  000000000000002755    0.042037964   0.042037964
006   54.27    79  1.0562   31  000000000000002561    0.039077759   0.039077759
007   11.72    79  1.0562   31  000000000000000906    0.013824463   0.013824463
008    1.16    79  1.0562   32  000000000000000083    0.001266479   0.001266479
009    1.65    79  1.0562   32  000000000000000118    0.001800537   0.001800537
010    1.11    79  1.0562   32  000000000000000070    0.001068115   0.001068115
011    1.63    79  1.0562   32  000000000000000098    0.001495361   0.001495361
012   34.02    79  1.0562   32  000000000000001915    0.029220581   0.029220581
013    0.84    79  1.0562   32  000000000000000014    0.000213623   0.000213623
014    3.50    79  1.0562   32  000000000000000409    0.006240845   0.006240845
015   11.50    79  1.0562   32  000000000000000027    0.000411987   0.000411987

              Package        Cores          Uncore         Memory
Energy(J):   16.803909302    0.144454956   15.042404175    0.000000000
Power(W) :   16.803909302    0.144454956   15.042404175    0.000000000
cyring commented 3 years ago

EDIT: I think I understand something.

Based on the develop branch can you replace this function ... https://github.com/cyring/CoreFreq/blob/d9ab3b7707d311b5a24859969a54914b192c6989/corefreqk.c#L9575

... with this code:

void CCD_AMD_Family_17h_Zen2_Temp(CORE_RO *Core)
{
    TCCD_REGISTER TccdSensor = {.value = 0};

    Core_AMD_SMN_Read(  TccdSensor,
                (SMU_AMD_THM_TCTL_CCD_REGISTER_F17H
                + (Core->T.Cluster.CCD << 1)),
                SMU_AMD_INDEX_REGISTER_F17H,
                SMU_AMD_DATA_REGISTER_F17H );

    Core->PowerThermal.Sensor = TccdSensor.CurTmp;

    if (TccdSensor.CurTempRangeSel == 1)
    {
        Core->PowerThermal.Param.Offset[1] = 49;
    } else {
        Core->PowerThermal.Param.Offset[1] = 0;
    }
}

Next, fully rebuild, reload everything, and test.

FYI, I'm changing the offset translation from + (Core->T.Cluster.CCD << 2)
to + (Core->T.Cluster.CCD << 1)
to reach CCD 0, 2, 4, 6 (zero-based).

Chlorophytus commented 3 years ago
CPU Freq(MHz) VID  Vcore  TMP(C)    Accumulator       Energy(J)     Power(W)
000   24.27   132  0.7250   31  000000000000012964    0.197814941   0.197814941
001   54.31   132  0.7250   31  000000000000014732    0.224792480   0.224792480
002   38.85   132  0.7250   31  000000000000013688    0.208862305   0.208862305
003   62.52   132  0.7250   31  000000000000015457    0.235855103   0.235855103
004    1.48   132  0.7250   31  000000000000000156    0.002380371   0.002380371
005   12.80   132  0.7250   31  000000000000001122    0.017120361   0.017120361
006   99.70   132  0.7250   31  000000000000009493    0.144851685   0.144851685
007   50.24   132  0.7250   31  000000000000010043    0.153244019   0.153244019
008   78.11   132  0.7250   31  000000000000008239    0.125717163   0.125717163
009    0.50   132  0.7250   31  000000000000000117    0.001785278   0.001785278
010    0.26   132  0.7250   31  000000000000000116    0.001770020   0.001770020
011    0.44   127  0.7562   31  000000000000000117    0.001785278   0.001785278
012    1.22   127  0.7562    0  000000000000000289    0.004409790   0.004409790
013    0.50   127  0.7562    0  000000000000000179    0.002731323   0.002731323
014    1.96   127  0.7562    0  000000000000000300    0.004577637   0.004577637
015   36.93   127  0.7562    0  000000000000017879    0.272811890   0.272811890
016   21.01   127  0.7562    0  000000000000017905    0.273208618   0.273208618
017   14.63   127  0.7562    0  000000000000016842    0.256988525   0.256988525
018    9.05   127  0.7562    0  000000000000000659    0.010055542   0.010055542
019   56.87   127  0.7562    0  000000000000013891    0.211959839   0.211959839
020    4.02   127  0.7562    0  000000000000000348    0.005310059   0.005310059
021   21.26   127  0.7562    0  000000000000001296    0.019775391   0.019775391
022   12.18   127  0.7562    0  000000000000000672    0.010253906   0.010253906
023   31.11   127  0.7562    0  000000000000002799    0.042709351   0.042709351

              Package        Cores          Uncore         Memory
Energy(J):   48.711715698    2.430770874   11.006805420    0.000000000
Power(W) :   48.711715698    2.430770874   11.006805420    0.000000000

^CCPU Freq(MHz) VID  Vcore  TMP(C)    Accumulator       Energy(J)     Power(W)

              Package        Cores          Uncore         Memory
Energy(J):   48.711715698    2.430770874   11.006805420    0.000000000
Power(W) :   48.711715698    2.430770874   11.006805420    0.000000000
cyring commented 3 years ago

Thanks for trying. Unfortunately, the change is not good.

The links between the SMU and the CCD(s) may differ from a 3970X down to a 3960X, probably depending on which cores are activated.

So I'm back with the query below. Some addresses may return zero and others a non-zero value.

Can you run these:

# zencli smu 0x00059954
# zencli smu 0x00059958
# zencli smu 0x0005995c
# zencli smu 0x00059960
# zencli smu 0x00059964
# zencli smu 0x00059968
# zencli smu 0x0005996c
# zencli smu 0x00059970
Chlorophytus commented 3 years ago
# ./zencli smu 0x00059954
0x00000a90 (2704)
   60   56   52   48   44   40   36   32   28   24   20   16   12   08   04   00
 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 1010 1001 0000
# ./zencli smu 0x00059958
0x00000000 (0)
   60   56   52   48   44   40   36   32   28   24   20   16   12   08   04   00
 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
# ./zencli smu 0x0005995c
0x00000a98 (2712)
   60   56   52   48   44   40   36   32   28   24   20   16   12   08   04   00
 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 1010 1001 1000
# ./zencli smu 0x00059960
0x00000000 (0)
   60   56   52   48   44   40   36   32   28   24   20   16   12   08   04   00
 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
# ./zencli smu 0x00059964
0x00000a98 (2712)
   60   56   52   48   44   40   36   32   28   24   20   16   12   08   04   00
 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 1010 1001 1000
# ./zencli smu 0x00059968
0x00000000 (0)
   60   56   52   48   44   40   36   32   28   24   20   16   12   08   04   00
 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
# ./zencli smu 0x0005996c
0x00000aa0 (2720)
   60   56   52   48   44   40   36   32   28   24   20   16   12   08   04   00
 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 1010 1010 0000
# ./zencli smu 0x00059970
0x00000000 (0)
   60   56   52   48   44   40   36   32   28   24   20   16   12   08   04   00
 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
cyring commented 3 years ago

Hello,

Can you pull and try the latest develop branch? It includes a topology fix for CastlePeak. Please provide the output of the Topology and Sensors.

Regards

Chlorophytus commented 3 years ago

All fixed :)

Topology

$ corefreq-cli -m
CPU Pkg  Apic  Core/Thread  Caches      (w)rite-Back (i)nclusive              
 #   ID   ID CCD CCX ID/ID L1-Inst Way  L1-Data Way      L2  Way      L3  Way 
000:BSP    0   0  0   0  0      32  8        32  8       512  8 i  131072 16w 
001:  0    1   0  0   1  0      32  8        32  8       512  8 i  131072 16w 
002:  0    2   0  0   2  0      32  8        32  8       512  8 i  131072 16w 
003:  0    4   0  1   4  0      32  8        32  8       512  8 i  131072 16w 
004:  0    5   0  1   5  0      32  8        32  8       512  8 i  131072 16w 
005:  0    6   0  1   6  0      32  8        32  8       512  8 i  131072 16w 
006:  0    8   2  2   8  0      32  8        32  8       512  8 i  131072 16w 
007:  0    9   2  2   9  0      32  8        32  8       512  8 i  131072 16w 
008:  0   10   2  2  10  0      32  8        32  8       512  8 i  131072 16w 
009:  0   12   2  3  12  0      32  8        32  8       512  8 i  131072 16w 
010:  0   13   2  3  13  0      32  8        32  8       512  8 i  131072 16w 
011:  0   14   2  3  14  0      32  8        32  8       512  8 i  131072 16w 
012:  0   16   4  4  16  0      32  8        32  8       512  8 i  131072 16w 
013:  0   17   4  4  17  0      32  8        32  8       512  8 i  131072 16w 
014:  0   18   4  4  18  0      32  8        32  8       512  8 i  131072 16w 
015:  0   20   4  5  20  0      32  8        32  8       512  8 i  131072 16w 
016:  0   21   4  5  21  0      32  8        32  8       512  8 i  131072 16w 
017:  0   22   4  5  22  0      32  8        32  8       512  8 i  131072 16w 
018:  0   24   6  6  24  0      32  8        32  8       512  8 i  131072 16w 
019:  0   25   6  6  25  0      32  8        32  8       512  8 i  131072 16w 
020:  0   26   6  6  26  0      32  8        32  8       512  8 i  131072 16w 
021:  0   28   6  7  28  0      32  8        32  8       512  8 i  131072 16w 
022:  0   29   6  7  29  0      32  8        32  8       512  8 i  131072 16w 
023:  0   30   6  7  30  0      32  8        32  8       512  8 i  131072 16w 

Sensors

CPU Freq(MHz) VID  Vcore  TMP(C)    Accumulator       Energy(J)     Power(W)
000    6.05   127  0.7562   31  000000000000000162    0.002471924   0.002471924
001    3.84   127  0.7562   31  000000000000000149    0.002273560   0.002273560
002    1.80   127  0.7562   31  000000000000000312    0.004760742   0.004760742
003    8.44   127  0.7562   31  000000000000000198    0.003021240   0.003021240
004    1.38   127  0.7562   31  000000000000000082    0.001251221   0.001251221
005    5.70   127  0.7562   31  000000000000000187    0.002853394   0.002853394
006   14.92   127  0.7562   31  000000000000000470    0.007171631   0.007171631
007   15.94   127  0.7562   31  000000000000000429    0.006546021   0.006546021
008   17.12   127  0.7562   31  000000000000000546    0.008331299   0.008331299
009   41.42   127  0.7562   31  000000000000001193    0.018203735   0.018203735
010    4.75   127  0.7562   31  000000000000000134    0.002044678   0.002044678
011   13.04   127  0.7562   31  000000000000000299    0.004562378   0.004562378
012    5.16   127  0.7562   32  000000000000000316    0.004821777   0.004821777
013    8.04   127  0.7562   32  000000000000000255    0.003890991   0.003890991
014    3.04   127  0.7562   32  000000000000000098    0.001495361   0.001495361
015    0.93   127  0.7562   32  000000000000000038    0.000579834   0.000579834
016    0.73   127  0.7562   32  000000000000000030    0.000457764   0.000457764
017    0.88   127  0.7562   32  000000000000000086    0.001312256   0.001312256
018    3.77   127  0.7562   33  000000000000000128    0.001953125   0.001953125
019    0.56   127  0.7562   33  000000000000000030    0.000457764   0.000457764
020    2.01   127  0.7562   33  000000000000000143    0.002182007   0.002182007
021    5.93   127  0.7562   33  000000000000000148    0.002258301   0.002258301
022    2.42   127  0.7562   33  000000000000000079    0.001205444   0.001205444
023    2.14   127  0.7562   33  000000000000000096    0.001464844   0.001464844

              Package        Cores          Uncore         Memory
Energy(J):   48.891738892    0.085571289   10.683074951    0.000000000
Power(W) :   48.891738892    0.085571289   10.683074951    0.000000000
cyring commented 3 years ago

These per-CCD temperatures look great!

Thank you for all your testing.

cyring commented 3 years ago

Btw, whenever you can, please let me know the results when SMT is enabled in the BIOS.

Chlorophytus commented 3 years ago

I will in a bit, I have pending OS updates

Chlorophytus commented 3 years ago
$ corefreq-cli -m
CPU Pkg  Apic  Core/Thread  Caches      (w)rite-Back (i)nclusive              
 #   ID   ID CCD CCX ID/ID L1-Inst Way  L1-Data Way      L2  Way      L3  Way 
000:BSP    0   0  0   0  0      32  8        32  8       512  8 i  131072 16w 
001:  0    2   0  0   1  0      32  8        32  8       512  8 i  131072 16w 
002:  0    4   0  0   2  0      32  8        32  8       512  8 i  131072 16w 
003:  0    8   0  1   4  0      32  8        32  8       512  8 i  131072 16w 
004:  0   10   0  1   5  0      32  8        32  8       512  8 i  131072 16w 
005:  0   12   0  1   6  0      32  8        32  8       512  8 i  131072 16w 
006:  0   16   2  2   8  0      32  8        32  8       512  8 i  131072 16w 
007:  0   18   2  2   9  0      32  8        32  8       512  8 i  131072 16w 
008:  0   20   2  2  10  0      32  8        32  8       512  8 i  131072 16w 
009:  0   24   2  3  12  0      32  8        32  8       512  8 i  131072 16w 
010:  0   26   2  3  13  0      32  8        32  8       512  8 i  131072 16w 
011:  0   28   2  3  14  0      32  8        32  8       512  8 i  131072 16w 
012:  0   32   4  4  16  0      32  8        32  8       512  8 i  131072 16w 
013:  0   34   4  4  17  0      32  8        32  8       512  8 i  131072 16w 
014:  0   36   4  4  18  0      32  8        32  8       512  8 i  131072 16w 
015:  0   40   4  5  20  0      32  8        32  8       512  8 i  131072 16w 
016:  0   42   4  5  21  0      32  8        32  8       512  8 i  131072 16w 
017:  0   44   4  5  22  0      32  8        32  8       512  8 i  131072 16w 
018:  0   48   6  6  24  0      32  8        32  8       512  8 i  131072 16w 
019:  0   50   6  6  25  0      32  8        32  8       512  8 i  131072 16w 
020:  0   52   6  6  26  0      32  8        32  8       512  8 i  131072 16w 
021:  0   56   6  7  28  0      32  8        32  8       512  8 i  131072 16w 
022:  0   58   6  7  29  0      32  8        32  8       512  8 i  131072 16w 
023:  0   60   6  7  30  0      32  8        32  8       512  8 i  131072 16w 
024:  0    1   0  0   0  1      32  8        32  8       512  8 i  131072 16w 
025:  0    3   0  0   1  1      32  8        32  8       512  8 i  131072 16w 
026:  0    5   0  0   2  1      32  8        32  8       512  8 i  131072 16w 
027:  0    9   0  1   4  1      32  8        32  8       512  8 i  131072 16w 
028:  0   11   0  1   5  1      32  8        32  8       512  8 i  131072 16w 
029:  0   13   0  1   6  1      32  8        32  8       512  8 i  131072 16w 
030:  0   17   2  2   8  1      32  8        32  8       512  8 i  131072 16w 
031:  0   19   2  2   9  1      32  8        32  8       512  8 i  131072 16w 
032:  0   21   2  2  10  1      32  8        32  8       512  8 i  131072 16w 
033:  0   25   2  3  12  1      32  8        32  8       512  8 i  131072 16w 
034:  0   27   2  3  13  1      32  8        32  8       512  8 i  131072 16w 
035:  0   29   2  3  14  1      32  8        32  8       512  8 i  131072 16w 
036:  0   33   4  4  16  1      32  8        32  8       512  8 i  131072 16w 
037:  0   35   4  4  17  1      32  8        32  8       512  8 i  131072 16w 
038:  0   37   4  4  18  1      32  8        32  8       512  8 i  131072 16w 
039:  0   41   4  5  20  1      32  8        32  8       512  8 i  131072 16w 
040:  0   43   4  5  21  1      32  8        32  8       512  8 i  131072 16w 
041:  0   45   4  5  22  1      32  8        32  8       512  8 i  131072 16w 
042:  0   49   6  6  24  1      32  8        32  8       512  8 i  131072 16w 
043:  0   51   6  6  25  1      32  8        32  8       512  8 i  131072 16w 
044:  0   53   6  6  26  1      32  8        32  8       512  8 i  131072 16w 
045:  0   57   6  7  28  1      32  8        32  8       512  8 i  131072 16w 
046:  0   59   6  7  29  1      32  8        32  8       512  8 i  131072 16w 
047:  0   61   6  7  30  1      32  8        32  8       512  8 i  131072 16w 
CPU Freq(MHz) VID  Vcore  TMP(C)    Accumulator       Energy(J)     Power(W)
000    3.74   124  0.7750   33  000000000000001401    0.021377563   0.021377563
001    3.98   124  0.7750   33  000000000000001436    0.021911621   0.021911621
002    1.02   124  0.7750   33  000000000000001042    0.015899658   0.015899658
003    7.67   124  0.7750   33  000000000000028821    0.439773560   0.439773560
004   17.84   124  0.7750   33  000000000000025783    0.393417358   0.393417358
005    8.36   122  0.7875   33  000000000000028023    0.427597046   0.427597046
006    8.24   124  0.7750   34  000000000000001683    0.025680542   0.025680542
007   42.80   124  0.7750   34  000000000000031466    0.480133057   0.480133057
008    7.75   124  0.7750   34  000000000000002640    0.040283203   0.040283203
009    5.74   124  0.7750   34  000000000000001626    0.024810791   0.024810791
010    3.72   124  0.7750   34  000000000000030440    0.464477539   0.464477539
011    5.81   124  0.7750   34  000000000000003200    0.048828125   0.048828125
012    0.79   124  0.7750   34  000000000000001259    0.019210815   0.019210815
013    1.23   124  0.7750   34  000000000000001505    0.022964478   0.022964478
014    1.09   124  0.7750   34  000000000000001304    0.019897461   0.019897461
015    1.98   124  0.7750   34  000000000000001523    0.023239136   0.023239136
016   29.07   124  0.7750   34  000000000000003183    0.048568726   0.048568726
017    1.13   122  0.7875   34  000000000000001265    0.019302368   0.019302368
018   22.40   124  0.7750   35  000000000000035431    0.540634155   0.540634155
019   24.94   124  0.7750   35  000000000000034909    0.532669067   0.532669067
020   28.93   124  0.7750   35  000000000000033373    0.509231567   0.509231567
021    5.92   124  0.7750   35  000000000000001581    0.024124146   0.024124146
022    3.10   124  0.7750   35  000000000000001369    0.020889282   0.020889282
023    9.70   124  0.7750   35  000000000000002001    0.030532837   0.030532837
024    1.83   124  0.7750   33  000000000000000000    0.000000000   0.000000000
025    2.26   124  0.7750   33  000000000000000000    0.000000000   0.000000000
026    2.46   124  0.7750   33  000000000000000000    0.000000000   0.000000000
027   19.74   124  0.7750   33  000000000000000000    0.000000000   0.000000000
028    8.70   124  0.7750   33  000000000000000000    0.000000000   0.000000000
029    3.45   124  0.7750   33  000000000000000000    0.000000000   0.000000000
030    0.96   124  0.7750   34  000000000000000000    0.000000000   0.000000000
031   11.19   124  0.7750   34  000000000000000000    0.000000000   0.000000000
032    9.43   124  0.7750   34  000000000000000000    0.000000000   0.000000000
033    2.04   124  0.7750   34  000000000000000000    0.000000000   0.000000000
034   15.66   124  0.7750   34  000000000000000000    0.000000000   0.000000000
035    8.15   124  0.7750   34  000000000000000000    0.000000000   0.000000000
036    1.21   122  0.7875   34  000000000000000000    0.000000000   0.000000000
037    2.21   124  0.7750   34  000000000000000000    0.000000000   0.000000000
038    0.96   124  0.7750   34  000000000000000000    0.000000000   0.000000000
039    2.90   124  0.7750   34  000000000000000000    0.000000000   0.000000000
040    1.20   124  0.7750   34  000000000000000000    0.000000000   0.000000000
041    0.58   124  0.7750   34  000000000000000000    0.000000000   0.000000000
042   42.34   122  0.7875   35  000000000000000000    0.000000000   0.000000000
043   49.99   124  0.7750   35  000000000000000000    0.000000000   0.000000000
044   41.57   124  0.7750   35  000000000000000000    0.000000000   0.000000000
045    2.17   124  0.7750   35  000000000000000000    0.000000000   0.000000000
046    2.67   124  0.7750   35  000000000000000000    0.000000000   0.000000000
047    3.76   122  0.7875   35  000000000000000000    0.000000000   0.000000000

              Package        Cores          Uncore         Memory
Energy(J):   65.807144165    4.215454102   11.006805420    0.000000000
Power(W) :   65.807144165    4.215454102   11.006805420    0.000000000
cyring commented 3 years ago

Thanks.

The topology appears consistent in both SMT modes. Physical and sibling cores, grouped by CCD, report the same temperature value. Stress any CPU of a CCD and you will observe that.

My to-do list: I have to filter this output based on the sensor scope. For example, the Power accumulator is per physical core (not per thread). That's why we see zero values in the Energy and Power columns.

Chlorophytus commented 3 years ago

I've heard that different AGESA versions seem to change the Zen temperature monitoring registers. I'll update if that happens on TRX40 again.

cyring commented 3 years ago

> I've heard that different AGESA versions seem to change the Zen temperature monitoring registers. I'll update if that happens on TRX40 again.

Very interesting, thank you. Do you think it impacts the SMU register addresses? We're sandboxing the mailbox.

cyring commented 3 years ago

Reopening the issue. Sorry, bad smartphone move.

We are indeed working on the SMU mailbox protocol, which may reveal the AGESA differences. I hope so. We would definitely code faster if AMD decided to publish the full BKDG specifications for families 17h and 19h.

cyring commented 3 years ago

I'm not sure what work remains for TR. Based on the current develop branch, can you please show the Sensors output with all CPUs fully loaded? I would like to check whether the Power/Energy estimation is consistent with the TDP.

Chlorophytus commented 3 years ago

Running Prime95 (an optimized AVX2 benchmark) will indeed give approximately 280W on the package power, using BIOS defaults.

This is the Threadripper 3960X's specified TDP.

cyring commented 3 years ago

> Running Prime95 (an optimized AVX2 benchmark) will indeed give approximately 280W on the package power, using BIOS defaults.
>
> This is the Threadripper 3960X's specified TDP.

Sorry, I meant: can you show me a CoreFreq screenshot of the Sensors view with the processor stressed, like this one: CoreFreq_Sensors

Remark: you can also use the menu Tools > Conic Compute > Hyperboloid of two sheets to quickly stress all Cores. CoreFreq_Conic

Chlorophytus commented 3 years ago

image

EDIT: The hyperboloid of two sheets test doesn't seem to stress to TDP, but Prime95 does.

cyring commented 3 years ago

I can clearly see that CoreFreq estimates the 280 W. Thank you for this confirmation. Feel free to come back for any purpose.
