cyring / CoreFreq

CoreFreq : CPU monitoring and tuning software designed for 64-bit processors.
https://www.cyring.fr
GNU General Public License v2.0

2x Intel Xeon EC5509 SLBWM #264

Closed cyring closed 2 years ago

cyring commented 3 years ago

@svmlegacy Thank you very much for adding those results.

Can you please post the output of lspci -nn ?
I will add the PCI device id of the IMC to decode the DRAM data.

Regards, CyrIng

svmlegacy commented 3 years ago

Xeon EC5509 lspci -nn

cyring commented 3 years ago

We have to experiment with Jasper Forest because I can't find any meaningful datasheet for its registers. It is based on the Nehalem micro-architecture and, according to its CPUID signature, belongs to the Lynnfield class.

https://github.com/cyring/CoreFreq/blob/0e8f292b6cce0ff00aeccb9e0654ec0be852e9db/corefreqk.h#L1641

static struct pci_device_id PCI_Nehalem_DMI_ids[] = {
    {   /* Lynnfield IMC                    */
        PCI_VDEVICE(INTEL, DID_INTEL_LYNNFIELD_MCR),
        .driver_data = (kernel_ulong_t) Lynnfield_IMC
    },
    {   /* Lynnfield IMC Test Registers             */
        PCI_VDEVICE(INTEL, DID_INTEL_LYNNFIELD_MC_TEST),
        .driver_data = (kernel_ulong_t) NHM_IMC_TR
    },
    { /* Lynnfield QuickPath Architecture Generic Non-core Registers */
        PCI_VDEVICE(INTEL, DID_INTEL_LYNNFIELD_NON_CORE),
        .driver_data = (kernel_ulong_t) NHM_NON_CORE
    },
    { /* Clarksfield Processor Uncore Device 0, Function 0      */
        PCI_VDEVICE(INTEL, DID_INTEL_CLARKSFIELD_NON_CORE),
        .driver_data = (kernel_ulong_t) NHM_NON_CORE
    },
    { /* Westmere/Clarkdale QuickPath Architecture Non-core Registers */
        PCI_VDEVICE(INTEL, DID_INTEL_CLARKDALE_NON_CORE),
        .driver_data = (kernel_ulong_t) NHM_NON_CORE
    },
    {   /* Jasper Forest IMC                    */
        PCI_VDEVICE(INTEL, 0x2cd8),
        .driver_data = (kernel_ulong_t) Lynnfield_IMC
    },
    {0, }
};

FYI: the change adds the PCI device id [8086:2cd8] and invokes the existing Lynnfield_IMC decoding function (which should be called twice because there are two EC5509).


* Fully rebuild _CoreFreq_
 `make clean all`

* Save your files and start _CoreFreq_ 

* Check whether the IMC part of the CLI now shows more data, such as the channel count.

svmlegacy commented 3 years ago

Seems to have made some difference, but still not pulling in more data with ./corefreq-cli -M

I believe these CPUs are very similar to Lynnfield, but they have both DMI and QPI. QPI is used only for CPU-to-CPU communication.

Unlike Lynnfield, these have a triple channel memory controller. (Though I believe Lynnfield physically has three controllers.)

I've attached my modified files if you have any questions on them.

2x Intel Xeon EC5509 IMC Test 1

cyring commented 3 years ago

Hello,

Commit dbc6ebcfad9cfad11047da17e775806c70a41470 provides the IMC for Nehalem C5500/C3500 Series

Once pulled, don't forget to fully rebuild the source code:

make clean all

IMC, DRAM, VT-d are expected from this change. Two controllers should be counted.

svmlegacy commented 3 years ago

Results here

Both IMCs are now appearing, with the DRAM information appearing only on Controller # 1.

VT-d now shows "ON", correct for my configuration.

cyring commented 3 years ago

Not so bad for a first try.

The dual-controller case is not easy to solve. I finally found datasheet volume 2, which mentions two distinguishing registers: the Primary and Secondary Bus numbers. Those numbers should help establish a hardware topology for channel and DIMM queries. So far I'm relying on the kernel bus numbering, 0xfe and 0xff, as listed in the previous lspci report.

cyring commented 3 years ago

If you're interested, send an email to my address shown in corefreq-cli -h and I will send you back the two datasheet volumes for your processor family. They contain registers of interest which need to be dumped and studied.

cyring commented 3 years ago

Hello,

In commit a6e7bca06b300b79b10b2ffe29a69a823d15f918 you will get a new IMC numbering. All Nehalem architectures are impacted by this change.

Thanks for any test.

svmlegacy commented 3 years ago

Commit https://github.com/cyring/CoreFreq/commit/a6e7bca06b300b79b10b2ffe29a69a823d15f918 and onward returns the controller to outputting nothing on Jasper Forest:

$ ./corefreq-cli -M
                           P55/Ibex Peak  [2CD8]                           
Controller #0                                                              
 Bus Rate  2500 MT/s      Bus Speed 2493 MT/s          DRAM Speed 1063 MHz 

 Cha    CL  RCD   RP  RAS  RRD  RFC   WR RTPr WTPr  FAW  B2B  CWL CMD  REFI
      ddWR drWR srWR ddRW drRW srRW ddRR drRR srRR ddWW drWW srWW CKE   ECC

cyring commented 3 years ago

Can you please try with latest commit c135b2c00f7bb58ccd3d7d48c76bb28cec8ae16c

svmlegacy commented 3 years ago

Commit c135b2c edited Lynnfield_IMC rather than Nehalem_IMC, which is the function called by the C5500_C3500_IMC definition, so it had no effect.

I was able to get some output by forcing mc = 0 in corefreqk.c, as follows:

static PCI_CALLBACK Nehalem_IMC(struct pci_dev *dev)
{ /* Arrandale; Beckton; Bloomfield; Clarkdale; Eagleton; Gainestown; Gulftown*/
    kernel_ulong_t rc;
    const unsigned char bus_number = 0xff - dev->bus->number;
    const unsigned short mc = (unsigned short) bus_number % MC_MAX_CTRL;

    if (mc >= PUBLIC(RO(Proc))->Uncore.CtrlCount) {
        PUBLIC(RO(Proc))->Uncore.CtrlCount++;
    }
    /* rc = Query_NHM_IMC(dev, mc); */
    rc = Query_NHM_IMC(dev, 0); 

    return ((PCI_CALLBACK) rc);
}

Of course, this only output controller # 0, as follows:

$ ./corefreq-cli -M
                           P55/Ibex Peak  [2CD8]                           
Controller #0                                               Triple Channel 
 Bus Rate  2500 MT/s      Bus Speed 2493 MT/s          DRAM Speed 1063 MHz 

 Cha    CL  RCD   RP  RAS  RRD  RFC   WR RTPr WTPr  FAW  B2B  CWL CMD  REFI
  #0     7    7    7   20    4   59    8    6   19   20    0    6  1T   509
  #1     7    7    7   20    4   59    8    6   19   20    0    6  1T   509
  #2     7    7    7   20    4   59    8    6   19   20    0    6  1T   509
      ddWR drWR srWR ddRW drRW srRW ddRR drRR srRR ddWW drWW srWW CKE   ECC
  #0     6    6   14    9    9    9    7    6    4    7    7    4   3    1 
  #1     6    6   14    9    9    9    7    6    4    7    7    4   3    1 
  #2     6    6   14    9    9    9    7    6    4    7    7    4   3    1 

 DIMM Geometry for channel #0                                              
      Slot Bank Rank     Rows   Columns    Memory Size (MB)                
       #0     8    1     16384      1024           1024                    
 DIMM Geometry for channel #1                                              
      Slot Bank Rank     Rows   Columns    Memory Size (MB)                
       #0     8    1     16384      1024           1024                    
 DIMM Geometry for channel #2                                              
      Slot Bank Rank     Rows   Columns    Memory Size (MB)                
       #0     8    1     16384      1024           1024                    

I'll poke around a little more, and see if I can get the 2nd controller up.

cyring commented 3 years ago

Oh, nice help. It is hard to debug from here, but the purpose is indeed to iterate mc from 0 to N, with 0 as the first controller.

I'm expecting to compute: constant minus bus_number

0xff - 0xff = 0 first IMC
0xff - 0xfe = 1 second IMC

But things depend on the bus numbering as listed by lspci.
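
The mapping described above can be sketched in isolation as follows. This is only an illustration, not the driver's code: `demo_bus_to_mc` and its `max_ctrl` parameter are hypothetical names, and the modulo mirrors the `% MC_MAX_CTRL` step seen in `Nehalem_IMC` without assuming the real value of that constant.

```c
#include <assert.h>

/*
 * Map a PCI bus number to a zero-based controller index by subtracting it
 * from 0xff, so that bus 0xff gives 0 and bus 0xfe gives 1.
 * max_ctrl is a hypothetical upper bound on the number of controllers;
 * the result wraps around once the distance from 0xff reaches max_ctrl.
 */
static unsigned short demo_bus_to_mc(unsigned char bus, unsigned short max_ctrl)
{
    const unsigned char distance = (unsigned char) (0xff - bus);

    assert(max_ctrl > 0);   /* guard the modulo below */

    return (unsigned short) (distance % max_ctrl);
}
```

With `max_ctrl` set to 2, bus 0xff maps to controller 0 and bus 0xfe to controller 1. The result is only meaningful if the bus number the kernel hands to the callback matches what lspci lists.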

cyring commented 3 years ago

Hello,

Are you getting any results?

svmlegacy commented 3 years ago

The IMCs are reliably appearing at 0xFF.03 for the first CPU and 0xFE.03 for the 2nd CPU, based on the lspci results.

I haven't had much time to test lately, but I'm still familiarizing myself with the code to see if I can find where this breaks down.

svmlegacy commented 3 years ago

Is it possible that the Uncore_Update function is running out of PCI devices before querying the 2nd IMC?

I am seeing that NHM_IMC is only being run once when # ./corefreqd is started. Shouldn't this function be called for each IMC?

I tried bumping up the CHIP_MAX_PCI value from 24, but the daemon doesn't load with values greater than 25. lspci shows a total of 44 devices under buses 0xFF and 0xFE.

Additionally, this issue also affects my DP Westmere-EP server (Xeon X5670s).

cyring commented 3 years ago

This could explain the issue, or be part of it.

At the time of the first implementation I didn't have such a big workstation to help size the IDs array.

However, the daemon should be called once for multiple controllers and iterate from the first to the last, aggregating the channel and DIMM data of each IMC. Thus the driver has to properly prepare the whole multi-dimensional structure beforehand.

I'm reviewing this part of the code...

cyring commented 3 years ago

@svmlegacy

Only the probed device IDs in the list will be added to the array by the driver, so the whole lspci listing won't be scanned.

Printing the whole structure could help to debug where the values have been stored (or not), depending on bus numbering.

In this function: https://github.com/cyring/CoreFreq/blob/2200d1dea99c1c95623166e539e319929190fcdf/corefreqd.c#L2778

For example, the DIMM presence register. Dump from 0 to max IMC, to max channels, to max DIMMs:

for (mc = 0; mc < MC_MAX_CTRL; mc++)
{
    for (cha = 0; cha < MC_MAX_CHA; cha++)
    {
        for (slot = 0; slot < MC_MAX_DIMM; slot++)
        {
            printf("%d\n", Proc->Uncore.MC[mc].Channel[cha].DIMM[slot].DOD.DIMMPRESENT);
        }
    }
}

cyring commented 3 years ago

@svmlegacy Hello,

Attached is a development version for this issue. Make sure to build and load it from another working directory (e.g. /tmp/CoreFreq).

Your kernel log should have one trace per bus queried. Mine has one IMC on one bus. Result below as an example:

CoreFreq(1:7): Processor [ 06_2C] Architecture [Westmere/Gulftown] SMT [12/12]
BUS:0/255       CTRL:0/1

FYI, the traces come from the function corefreqk.c/Nehalem_IMC.

Thanks

CoreFreq_develop.tar.gz

svmlegacy commented 3 years ago

Logs pasted here

The controller listed in the system is from CPU0, verified by pulling a DIMM out of CPU1 (the report retained all three DIMMs).

cyring commented 3 years ago

Thanks, but from here I'm not sure the code has run the right Nehalem path. The kernel log should have two traces like BUS:../255 CTRL:../..

Can you take another look at the log? (Please don't grep with any keyword.)

svmlegacy commented 3 years ago

Ah, right:

[  178.689586] corefreqk: loading out-of-tree module taints kernel.
[  178.689897] corefreqk: module verification failed: signature and/or required key missing - tainting kernel
[  178.733848] CoreFreq(1:-1): Processor [ 06_1E] Architecture [Nehalem/Lynnfield] CPU [8/8]
[  178.734369] BUS:0/254    CTRL:0/1

cyring commented 3 years ago

Now I see my mistake. As input, the PCI list has only one id for each device to be queried. But on your platform that input list should contain the same id twice, because both IMCs are queried under the same device id. I also need to find a hint that tells whether a platform comes with a single IMC or multiple IMCs. It could be the max bus number: 255 with my platform, 254 with yours.

svmlegacy commented 3 years ago

Interesting points. Though, shouldn't my max bus be 255? The IMCs have the same device id on two different buses. Is it possible to count the number of times the DID appears as the tip-off?
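
A minimal sketch of that counting idea, written as plain userspace C over a hypothetical snapshot of the PCI listing. `struct demo_pci_dev` and `demo_count_did` are illustrative names, not CoreFreq or kernel types; inside a driver the same count could presumably be obtained by walking matches with the kernel's `pci_get_device()`, which is not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified record of one PCI function as listed by lspci -nn. */
struct demo_pci_dev {
    unsigned char  bus;     /* bus number, e.g. 0xff or 0xfe */
    unsigned short vendor;  /* vendor id, e.g. 0x8086        */
    unsigned short device;  /* device id, e.g. 0x2cd8        */
};

/* Count how many listed functions carry the given vendor:device pair. */
static size_t demo_count_did(const struct demo_pci_dev *devs, size_t count,
                             unsigned short vendor, unsigned short device)
{
    size_t hits = 0;

    assert(devs != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        if (devs[i].vendor == vendor && devs[i].device == device)
            hits++;
    }
    return hits;
}
```

On a listing where [8086:2cd8] shows up once on bus 0xff and once on bus 0xfe, the count is 2, which would suggest two IMCs.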

cyring commented 3 years ago

I was also expecting the max bus to be 255, but it appears to be the same as the device bus number.

svmlegacy commented 3 years ago

dmesg:

[12096.113489] CoreFreq(0:-1): Processor [ 06_1E] Architecture [Nehalem/Lynnfield] CPU [8/8]
[12096.113945] CoreFreq: BUS:1:254:254  CTRL:1/1
[12096.114083] CoreFreq: BUS:1:254:254  CTRL:1/2

Corefreq:

[svmlegacy@fedora CoreFreq]$ ./corefreq-cli -M
                           P55/Ibex Peak  [2CD8]                           
Controller #0                                                              
 Bus Rate  2500 MT/s      Bus Speed 2493 MT/s          DRAM Speed 1063 MHz 

 Cha    CL  RCD   RP  RAS  RRD  RFC   WR RTPr WTPr  FAW  B2B  CWL CMD  REFI
      ddWR drWR srWR ddRW drRW srRW ddRR drRR srRR ddWW drWW srWW CKE   ECC

Controller #1                                               Triple Channel 
 Bus Rate  2500 MT/s      Bus Speed 2493 MT/s          DRAM Speed 1063 MHz 

 Cha    CL  RCD   RP  RAS  RRD  RFC   WR RTPr WTPr  FAW  B2B  CWL CMD  REFI
  #0     7    7    7   20    4   59    8    6   19   20    0    6  1T   509
  #1     7    7    7   20    4   59    8    6   19   20    0    6  1T   509
  #2     7    7    7   20    4   59    8    6   19   20    0    6  1T   509
      ddWR drWR srWR ddRW drRW srRW ddRR drRR srRR ddWW drWW srWW CKE   ECC
  #0     6    6   14    9    9    9    7    6    4    7    7    4   3    1 
  #1     6    6   14    9    9    9    7    6    4    7    7    4   3    1 
  #2     6    6   14    9    9    9    7    6    4    7    7    4   3    1 

 DIMM Geometry for channel #0                                              
      Slot Bank Rank     Rows   Columns    Memory Size (MB)                
       #0     8    1     16384      1024           1024                    
 DIMM Geometry for channel #1                                              
      Slot Bank Rank     Rows   Columns    Memory Size (MB)                
       #0     8    1     16384      1024           1024                    
 DIMM Geometry for channel #2                                              
      Slot Bank Rank     Rows   Columns    Memory Size (MB)                
       #0     8    1     16384      1024           1024

cyring commented 3 years ago

I believe lspci and the raw kernel PCI and bus structures disagree on numbering.

If the kernel is telling us that the max bus number is 254, then we can safely iterate up to 255, incrementing the IMC number and fetching the associated channel data during each loop.

With my single-IMC platform, the max bus starts directly at 255, so there is no need to iterate.

Frankly, the previous case was much easier with Haswell MP, where each IMC has a unique device id.

cyring commented 3 years ago

Thinking about it, I could make use of the processor topology you are providing here. First, it will give me assurance of the IMC count to iterate over; second, I will be able to query each controller from the first online core traced to it.
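
A rough sketch of that topology idea, assuming one IMC per package. The function name and the two input arrays are hypothetical and are not CoreFreq's topology structures; the point is only to show how the first online core of each package could be picked and how the package count falls out of it.

```c
#include <assert.h>
#include <stddef.h>

/*
 * For each package, record the lowest-numbered online CPU that belongs to it.
 * pkg_of_cpu[i] is the package id of CPU i and online[i] is non-zero when
 * CPU i is online (both arrays are hypothetical inputs for this sketch).
 * first_cpu[p] receives the CPU number, or -1 when package p has no online CPU.
 * Returns the number of packages that have at least one online CPU.
 */
static size_t demo_first_cpu_per_package(const int *pkg_of_cpu, const int *online,
                                         size_t cpu_count,
                                         int *first_cpu, size_t pkg_count)
{
    size_t found = 0;

    assert(pkg_of_cpu != NULL && online != NULL && first_cpu != NULL);

    for (size_t p = 0; p < pkg_count; p++)
        first_cpu[p] = -1;

    for (size_t cpu = 0; cpu < cpu_count; cpu++) {
        const int pkg = pkg_of_cpu[cpu];

        if (!online[cpu] || pkg < 0 || (size_t) pkg >= pkg_count)
            continue;   /* offline CPU or package id out of range */

        if (first_cpu[pkg] == -1) {
            first_cpu[pkg] = (int) cpu;
            found++;
        }
    }
    return found;
}
```

The returned count would stand in for the number of IMCs to iterate over, and `first_cpu[p]` for the core on which to query controller `p`. A package whose cores are all offline keeps -1, which is one of the cases that still needs testing.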

cyring commented 3 years ago

For your tests, here is a new version as mentioned above. Save your files before starting the driver. This time, traces in the kernel log are prefixed with CoreFreq:

CoreFreq_develop.tar.gz

Thank you

svmlegacy commented 3 years ago

Very Promising!

[svmlegacy@fedora CoreFreq]$ ./corefreq-cli -M
                           P55/Ibex Peak  [2CD8]                           
Controller #0                                               Triple Channel 
 Bus Rate  2500 MT/s      Bus Speed 2493 MT/s          DRAM Speed 1063 MHz 

 Cha    CL  RCD   RP  RAS  RRD  RFC   WR RTPr WTPr  FAW  B2B  CWL CMD  REFI
  #0     7    7    7   20    4   59    8    6   19   20    0    6  1T   509
  #1     7    7    7   20    4   59    8    6   19   20    0    6  1T   509
  #2     7    7    7   20    4   59    8    6   19   20    0    6  1T   509
      ddWR drWR srWR ddRW drRW srRW ddRR drRR srRR ddWW drWW srWW CKE   ECC
  #0     6    6   14    9    9    9    7    6    4    7    7    4   3    1 
  #1     6    6   14    9    9    9    7    6    4    7    7    4   3    1 
  #2     6    6   14    9    9    9    7    6    4    7    7    4   3    1 

 DIMM Geometry for channel #0                                              
      Slot Bank Rank     Rows   Columns    Memory Size (MB)                
       #0     8    1     16384      1024           1024                    
 DIMM Geometry for channel #1                                              
      Slot Bank Rank     Rows   Columns    Memory Size (MB)                
       #0     8    1     16384      1024           1024                    
 DIMM Geometry for channel #2                                              
      Slot Bank Rank     Rows   Columns    Memory Size (MB)                
       #0     8    1     16384      1024           1024                    
Controller #1                                               Triple Channel 
 Bus Rate  2500 MT/s      Bus Speed 2493 MT/s          DRAM Speed 1063 MHz 

 Cha    CL  RCD   RP  RAS  RRD  RFC   WR RTPr WTPr  FAW  B2B  CWL CMD  REFI
  #0     7    7    7   20    4   59    8    6   19   20    0    6  1T   509
  #1     7    7    7   20    4   59    8    6   19   20    0    6  1T   509
  #2     7    7    7   20    4   59    8    6   19   20    0    6  1T   509
      ddWR drWR srWR ddRW drRW srRW ddRR drRR srRR ddWW drWW srWW CKE   ECC
  #0     6    6   14    9    9    9    7    6    4    7    7    4   3    1 
  #1     6    6   14    9    9    9    7    6    4    7    7    4   3    1 
  #2     6    6   14    9    9    9    7    6    4    7    7    4   3    1 

 DIMM Geometry for channel #0                                              
      Slot Bank Rank     Rows   Columns    Memory Size (MB)                
       #0     8    1     16384      1024           1024                    
 DIMM Geometry for channel #1                                              
      Slot Bank Rank     Rows   Columns    Memory Size (MB)                
       #0     8    1     16384      1024           1024                    
 DIMM Geometry for channel #2                                              
      Slot Bank Rank     Rows   Columns    Memory Size (MB)                
       #0     8    1     16384      1024           1024                    

dmesg:

[23032.100794] CoreFreq(0:-1): Processor [ 06_1E] Architecture [Nehalem/Lynnfield] CPU [8/8]
[23032.101229] CoreFreq: CPU[0] PKG[0] SKT[0] BUS[254] tMC[1] > RC[0]
[23032.101331] CoreFreq: CPU[4] PKG[1] SKT[1] BUS[255] tMC[2] > RC[0]

dmesg shows data being pulled from both buses, and both controllers are showing DIMMs! I'll try pulling a DIMM from one CPU to make sure the results still make sense.

cyring commented 3 years ago

Yes, that's a good test, because I'm not sure whether the timing data of one controller is just a duplicate of the other.

I'm also not sure about the behavior when:

  1. All cores of one package are fully disabled
  2. Data is refreshed from the CLI by pressing the star key [*]
  3. Timings differ among sockets

svmlegacy commented 3 years ago

Screenshot from 2021-07-24 19-35-06

I did pull a single DIMM from CPU0 (or 1; I'm not too sure how this board implements it), and it did remove one DIMM from "Controller # 1" only, so these are indeed the distinct controllers.

cyring commented 3 years ago

Other tests I would like:

cyring commented 3 years ago

Was the screenshot taken with the DIMM pulled?

svmlegacy commented 3 years ago

Nope, I had reinstalled it here. Full 6 DIMMs. I'm currently pulling some info for your previous post, and will get another screenshot with one removed.

Early notes:

svmlegacy commented 3 years ago

Screenshot from 2021-07-24 20-04-59 With 2nd DIMM (CH1) removed from CPU0

svmlegacy commented 3 years ago

Screenshot from 2021-07-24 20-09-54 Screenshot from 2021-07-24 20-11-26

VT-d shows as with the latest experimental build. I am able to toggle C-states on.

cyring commented 3 years ago

Nice. All these outputs are rewarding.

I plan the following:

  1. Detect and display the right architecture name, like Jasper Forest. [done]
  2. Get the lowest IO limit C-State (rather than [UNS]pecified). [done] (111b reported, thus no change to apply)
  3. Add one row of spacing between each IMC section. [done]
  4. Vcore for your Super-I/O, if a datasheet is available. Can you tell me something about the motherboard: brand, model, specs?

EDIT: all work done so far is available in the develop branch.

cyring commented 3 years ago

Can you read that full register with the kernel msr-tool ?

modprobe msr
rdmsr -aX 0xe2

svmlegacy commented 3 years ago

Trenton JXT6966 (92-506966-XXX) PICMG 1.3. I still haven't been able to identify a SIO chip; superiotool did not find anything useful either.

Dual LGA1366 sockets for Jasper Forest CPUs, 3 DDR3 miniDIMMs per CPU (each an individual channel), Intel 3240 Chipset, XGi Z11m Graphics, dual Intel 82575EB Gigabit Ethernet ports.

  ![2021-07-25-105309_753x385_scrot](https://user-images.githubusercontent.com/11563789/126893454-893dac50-1d1c-4457-a1da-19c0761488bc.png)
  `101b` and `110b` could be the values got from MSR `PKG_CST_CONFIG_CONTROL`

All CPUs report 111b, indicating no package C-state limit.
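
For reference, a small sketch of how such a reading might be decoded from the raw value printed by `rdmsr -aX 0xe2`. It assumes the package C-state limit sits in the low three bits of the register, which should be checked against the datasheet for this processor family; the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: the limit field occupies bits 2:0 of MSR 0xE2. */
#define DEMO_PKG_CST_LIMIT_MASK 0x7ULL
/* 111b, reported in this thread as "no package C-state limit". */
#define DEMO_PKG_CST_NO_LIMIT   0x7U

/* Extract the package C-state limit field from a raw MSR value. */
static unsigned int demo_pkg_cst_limit(uint64_t msr_value)
{
    return (unsigned int) (msr_value & DEMO_PKG_CST_LIMIT_MASK);
}

/* Non-zero when the field reads 111b. */
static int demo_pkg_cst_is_unlimited(uint64_t msr_value)
{
    return demo_pkg_cst_limit(msr_value) == DEMO_PKG_CST_NO_LIMIT;
}
```

The other encodings, including the `101b` and `110b` values mentioned above, are deliberately not interpreted here.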

cyring commented 3 years ago

All of the monitored voltages listed in the table below are connected to the board’s ADT7462 and can be read by an application

OK here are my notes:


Maybe the ADT7462.

We are looking for a register address to monitor the VID of those CPU voltages, VCCP{?,1,2}.

See, for instance, Table 17. VOLTAGE VALUE AND LIMIT REGISTERS

Next, the PIN Config Register in tables 49 and 50 to select VCCP{1,2}.


Do you see any adt7462 module running among the kernel modules, or available for loading?

svmlegacy commented 3 years ago

The adt7462 module is available to load, and shows a number of sensors with lm-sensors. Vccp2 gives a somewhat odd reading there of 1.59 V at idle. Vccp1 is not listed, so I'm not sure how to interpret this.

cyring commented 3 years ago

That voltage is surprisingly not far from that of a fully loaded EC5509.

2021-07-25-190043_618x37_scrot

Perhaps the driver's VID-to-Vcc formula is simply computing the reverse. The ADT7462 datasheet mentions using the Intel VID reference table, which has to be read in the opposite direction. I wonder whether you will see the voltage decrease if you stress the processor?

svmlegacy commented 3 years ago

Unfortunately I don't see the voltage moving at all for Vccp2 in lm_sensors. Measured on the board, it is 0.920 V for both CPUs at idle. Under "Atomic Burn" load, this increased to 0.990 V (close enough to 1.0 V at rated speed). Thanks, Trenton, for giving me labelled pads!

svmlegacy commented 3 years ago

Screenshot from 2021-07-25 14-34-31

cyring commented 3 years ago

According to the datasheet, Vccp{1,2} are respectively the min and max voltage. Now we have to find the VID of the current voltage. The Conic stress functions should put the most load on the CPU.

svmlegacy commented 3 years ago

I've done a good bit of poking around with the superI/O and it really seems like the kernel driver isn't reporting out VID or core voltage for this specific implementation.

Since the ADT7462 is highly configurable, the JXT6966 is likely using it in a non-standard config...

Celeron P1053 (single CPU config) and 2x EC5549 were tested along with this, all with the same result: no Vcore from the adt7462 kernel driver.

cyring commented 3 years ago

Just wondering whether other software can monitor Vcore? Like the BIOS in some tab, or something in the Windows world?

svmlegacy commented 3 years ago

I haven't yet found any software that can read it properly, even in Windows or the BIOS. The only way I can get it is by using a multimeter on the provided test pad, haha.

I think the SuperIO is connected to either the VID or the VRM output, but no generic software is configured for it. I'm also really out of my element when it comes to testing the device registers.

cyring commented 3 years ago

Hello,

If you don't find any other problems with accuracy, reported features and so on, feel free to close this issue.

Regards, CyrIng