cyring / CoreFreq

CoreFreq : CPU monitoring and tuning software designed for 64-bit processors.
https://www.cyring.fr
GNU General Public License v2.0

Ryzen 9 7950X #378

Closed cyring closed 8 months ago

cyring commented 1 year ago

7950X

cyring commented 1 year ago

@KeithMyers

Got a weird error for just doing make. My time is correct.

Probably due to my RTC, which is synced manually. The binaries should be built anyway. Can you tell whether a UMC is enabled?

EDIT: your answer here

cyring commented 1 year ago

@KeithMyers Hello,

Can you please use my zencli and dump the following addresses:

$ cc zencli.c -o zencli
# ./zencli smu 0x050104

# ./zencli smu 0x150104

# ./zencli smu 0x250104

# ./zencli smu 0x350104

# ./zencli smu 0x450104

# ./zencli smu 0x550104

# ./zencli smu 0x650104

# ./zencli smu 0x750104
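The eight addresses above differ only in their leading hex digit. As a rough sketch, assuming that digit is a channel index shifted into bits 20 and up (the name `UMC_BASE` and the helper are made up for this illustration and are not taken from zencli or CoreFreq), the list can be regenerated like this:

```python
# Assumption: the channel index occupies bits 20 and up of the address.
# UMC_BASE and umc_addresses are illustrative names, not CoreFreq identifiers.
UMC_BASE = 0x050104

def umc_addresses(channels=8):
    """Return the address probed for each channel index 0..channels-1."""
    return [UMC_BASE + (channel << 20) for channel in range(channels)]

# Print the same command lines as listed in the comment above.
for address in umc_addresses():
    print(f"./zencli smu 0x{address:06x}")
```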

KeithMyers commented 1 year ago

./zencli smu 0x050104

[0x00050104] READ(smu) = 0xc0800083 (3229614211)
   60   56   52   48   44   40   36   32   28   24   20   16   12   08   04   00
 0000 0000 0000 0000 0000 0000 0000 0000 1100 0000 1000 0000 0000 0000 1000 0011

./zencli smu 0x150104
[0x00150104] READ(smu) = 0xc0800083 (3229614211)
   60   56   52   48   44   40   36   32   28   24   20   16   12   08   04   00
 0000 0000 0000 0000 0000 0000 0000 0000 1100 0000 1000 0000 0000 0000 1000 0011

./zencli smu 0x250104
[0x00250104] READ(smu) = 0xffffffff (4294967295)
   60   56   52   48   44   40   36   32   28   24   20   16   12   08   04   00
 0000 0000 0000 0000 0000 0000 0000 0000 1111 1111 1111 1111 1111 1111 1111 1111

./zencli smu 0x350104
[0x00350104] READ(smu) = 0xffffffff (4294967295)
   60   56   52   48   44   40   36   32   28   24   20   16   12   08   04   00
 0000 0000 0000 0000 0000 0000 0000 0000 1111 1111 1111 1111 1111 1111 1111 1111

./zencli smu 0x450104
[0x00450104] READ(smu) = 0xffffffff (4294967295)
   60   56   52   48   44   40   36   32   28   24   20   16   12   08   04   00
 0000 0000 0000 0000 0000 0000 0000 0000 1111 1111 1111 1111 1111 1111 1111 1111

./zencli smu 0x550104
[0x00550104] READ(smu) = 0xffffffff (4294967295)
   60   56   52   48   44   40   36   32   28   24   20   16   12   08   04   00
 0000 0000 0000 0000 0000 0000 0000 0000 1111 1111 1111 1111 1111 1111 1111 1111

./zencli smu 0x650104
[0x00650104] READ(smu) = 0xffffffff (4294967295)
   60   56   52   48   44   40   36   32   28   24   20   16   12   08   04   00
 0000 0000 0000 0000 0000 0000 0000 0000 1111 1111 1111 1111 1111 1111 1111 1111

./zencli smu 0x750104
[0x00750104] READ(smu) = 0xffffffff (4294967295)
   60   56   52   48   44   40   36   32   28   24   20   16   12   08   04   00
 0000 0000 0000 0000 0000 0000 0000 0000 1111 1111 1111 1111 1111 1111 1111 1111

cyring commented 1 year ago

@KeithMyers

Bit 31 is set at 0x050104 and 0x150104, so we have two controller channels. The other addresses read back 0xffffffff, which means those channels are disabled.
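The rule stated here can be checked against the posted dumps. This is only a sketch of that rule as worded in this comment (all-ones means disabled, otherwise bit 31 decides); the function name is hypothetical and CoreFreq's real logic may test more than this.

```python
def channel_enabled(value):
    """Apply the rule from this comment to one 32-bit register read."""
    if value == 0xFFFFFFFF:          # nothing answered at this address
        return False
    return bool(value & (1 << 31))   # bit 31 set: channel is enabled

# Values copied from the zencli output posted above.
readings = {
    0x050104: 0xC0800083,
    0x150104: 0xC0800083,
    0x250104: 0xFFFFFFFF,
    0x350104: 0xFFFFFFFF,
    0x450104: 0xFFFFFFFF,
    0x550104: 0xFFFFFFFF,
    0x650104: 0xFFFFFFFF,
    0x750104: 0xFFFFFFFF,
}

enabled = [address for address, value in readings.items() if channel_enabled(value)]
print([hex(address) for address in enabled])
```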

cyring commented 1 year ago

@Jon0

For your testing, RFC4 is available in the latest commit.

Thank you

cyring commented 1 year ago

@KeithMyers Just to be sure the issue isn't linked to the display, can you export the data to JSON?

  1. Pull, build and run the latest develop branch
  2. Redirect JSON to a file
corefreq-cli -j > /tmp/corefreq.json
  3. Compress as gzip and post the corefreq.json.gz file here.
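For anyone who prefers to script steps 2 and 3, here is a small sketch that first checks the export really is valid JSON and then compresses it. The helper name is made up; only the standard `json`, `gzip` and `shutil` modules are used, and the path is the one from step 2.

```python
import gzip
import json
import shutil

def check_and_gzip(path):
    """Verify that `path` parses as JSON, then write `path` + '.gz'."""
    with open(path, "r", encoding="utf-8") as handle:
        json.load(handle)  # raises ValueError if the export is truncated or invalid
    target = path + ".gz"
    with open(path, "rb") as source, gzip.open(target, "wb") as packed:
        shutil.copyfileobj(source, packed)
    return target
```

Calling `check_and_gzip("/tmp/corefreq.json")` would leave `/tmp/corefreq.json.gz` next to the original file.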

Thank you

KeithMyers commented 1 year ago

OK, here you go with the json output. corefreq.json.gz

cyring commented 1 year ago

@KeithMyers

Hi, Do you still have CoreFreq errors in the kernel log ? especially CoreFreq: AMD_SMN_Read(0, b50d6c) TryLock

cyring commented 1 year ago

@KeithMyers @Jon0

I'm refactoring the UMC aggregation. Can you please give this archive a try and post the output of corefreq-cli -M -n -s

CoreFreq_develop.tar.gz

KeithMyers commented 1 year ago

@KeithMyers

Hi, Do you still have CoreFreq errors in the kernel log ? especially CoreFreq: AMD_SMN_Read(0, b50d6c) TryLock

Yes, I do.

Dec 22 12:53:47 Pipsqueek kernel: [ 294.427042] CoreFreq: AMD_SMN_Read(0, b50d6c) TryLock

cyring commented 1 year ago

This error put me on the track of a bug which, I hope, is fixed in the latest archive above. Can you give it a try?

KeithMyers commented 1 year ago

corefreq-cli -M -n -s

./corefreq-cli -M -n -s

Controller #0                                                    Disabled  

Processor [AMD Ryzen 9 7950X 16-Core Processor]
|- Architecture [Zen4/Raphael]
|- Vendor ID [AuthenticAMD]
|- Microcode [0x0a601203]
|- Signature [ AF_61]
|- Stepping [ 2]
|- Online CPU [ 32/ 32]
|- Base Clock [100.001]
|- Frequency (MHz) Ratio
Min 3000.06 < 30 >
Max 4500.09 < 45 >
|- Factory [100.000] 4500 [ 45 ]
|- Performance
|- P-State
TGT 4500.09 < 45 >
|- CPPC
Min 3700.07 < 37 >
Max 1900.04 < 19 >
TGT 3700.07 < 37 >
|- Turbo Boost [ UNLOCK]
XFR 5800.12 [ 58 ]
CPB 5700.11 [ 57 ]
1C 3000.06 < 30 >
|- Uncore [ LOCK]

Instruction Set Extensions
|- 3DNow!/Ext [N/N] ADX [Y] AES [Y] AVX/AVX2 [Y/Y]
|- AVX512-F [Y] AVX512-DQ [Y] AVX512-IFMA [Y] AVX512-PF [N]
|- AVX512-ER [N] AVX512-CD [Y] AVX512-BW [Y] AVX512-VL [Y]
|- AVX512-VBMI [Y] AVX512-VBMI2 [Y] AVX512-VNNI [Y] AVX512-ALG [Y]
|- AVX512-VPOP [Y] AVX512-VNNIW [N] AVX512-FMAPS [N] AVX512-VP2I [N]
|- AVX512-BF16 [Y] AVX-VNNI-VEX [N] AVX-FP128 [N] AVX-FP256 [Y]
|- BMI1/BMI2 [Y/Y] CLWB [Y] CLFLUSH [Y] CLFLUSH-OPT [Y]
|- CLAC-STAC [Y] CMOV [Y] CMPXCHG8B [Y] CMPXCHG16B [Y]
|- F16C [Y] FPU [Y] FXSR [Y] LAHF-SAHF [Y]
|- MMX/Ext [Y/Y] MON/MWAITX [Y/Y] MOVBE [Y] PCLMULQDQ [Y]
|- POPCNT [Y] RDRAND [Y] RDSEED [Y] RDTSCP [Y]
|- SEP [Y] SHA [Y] SSE [Y] SSE2 [Y]
|- SSE3 [Y] SSSE3 [Y] SSE4.1/4A [Y/Y] SSE4.2 [Y]
|- SERIALIZE [N] SYSCALL [Y] RDPID [Y] UMIP [Y]
|- VAES [Y] VPCLMULQDQ [Y] PREFETCH/W [Y] LZCNT [Y]

Features
|- 1 GB Pages Support 1GB-PAGES [Capable]
|- 100 MHz multiplier Control 100MHzSteps [Missing]
|- Advanced Configuration & Power Interface ACPI [Capable]
|- Advanced Programmable Interrupt Controller APIC [Capable]
|- Advanced Virtual Interrupt Controller AVIC [Capable]
|- APIC Timer Invariance ARAT [Capable]
|- Clear Zero Instruction CLZERO [Capable]
|- Core Multi-Processing CMP Legacy [Capable]
|- L1 Data Cache Context ID CNXT-ID [Missing]
|- Collaborative Processor Performance Control CPPC [Capable]
|- Direct Cache Access DCA [Missing]
|- Debugging Extension DE [Capable]
|- Debug Store & Precise Event Based Sampling DS, PEBS [Missing]
|- CPL Qualified Debug Store DS-CPL [Missing]
|- 64-Bit Debug Store DTES64 [Missing]
|- Fast Short REP MOVSB FSRM [Capable]
|- Fast-String Operation ERMS [Capable]
|- Fused Multiply Add FMA | FMA4 [Capable]
|- Hardware Lock Elision HLE [Missing]
|- Hardware P-state control HwP [Capable]
|- Instruction Based Sampling IBS [Capable]
|- Instruction INVLPGB INVLPGB [Missing]
|- Instruction INVPCID INVPCID [Capable]
|- Long Mode 64 bits IA64 | LM [Capable]
|- LightWeight Profiling LWP [Missing]
|- Memory Bandwidth Enforcement MBE [Capable]
|- Machine-Check Architecture MCA [Capable]
|- Instruction MCOMMIT MCOMMIT [Missing]
|- Memory Protection Extensions MPX [Missing]
|- Model Specific Registers MSR [Capable]
|- Memory Type Range Registers MTRR [Capable]
|- No-Execute Page Protection NX [Capable]
|- OS-Enabled Ext. State Management OSXSAVE [Capable]
|- Physical Address Extension PAE [Capable]
|- Page Attribute Table PAT [Capable]
|- Pending Break Enable PBE [Missing]
|- Process Context Identifiers PCID [Missing]
|- Perfmon and Debug Capability PDCM [Missing]
|- Page Global Enable PGE [Capable]
|- Page Size Extension PSE [Capable]
|- 36-bit Page Size Extension PSE36 [Capable]
|- Processor Serial Number PSN [Missing]
|- Resource Director Technology/PQE RDT-A [Capable]
|- Resource Director Technology/PQM RDT-M [Capable]
|- Read Processor Register at User level RDPRU [Capable]
|- Restricted Transactional Memory RTM [Missing]
|- Safer Mode Extensions SMX [Missing]
|- Self-Snoop SS [Missing]
|- Supervisor-Mode Access Prevention SMAP [Capable]
|- Supervisor-Mode Execution Prevention SMEP [Capable]
|- Time Stamp Counter TSC [Invariant]
|- Time Stamp Counter Deadline TSC-DEADLINE [Missing]
|- TSX Force Abort MSR Register TSX-ABORT [Missing]
|- TSX Suspend Load Address Tracking TSX-LDTRK [Missing]
|- User-Mode Instruction Prevention UMIP [Capable]
|- Virtual Mode Extension VME [Capable]
|- Virtual Machine Extensions VMX [Missing]
|- Write Back & Do Not Invalidate Cache WBNOINVD [Capable]
|- Extended xAPIC Support x2APIC [ xAPIC]
|- AVIC controller for x2APIC x2AVIC [Capable]
|- XSAVE/XSTOR States XSAVE [Capable]
|- xTPR Update Control xTPR [Missing]

Mitigation mechanisms
|- Indirect Branch Restricted Speculation IBRS [Capable]
|- IBRS Always-On preferred by processor [ Unable]
|- IBRS preferred over software solution [Capable]
|- IBRS provides same speculation limits [Capable]
|- Indirect Branch Prediction Barrier IBPB [Capable]
|- Single Thread Indirect Branch Predictor STIBP [ Enable]
|- Speculative Store Bypass Disable SSBD [Capable]
|- SSBD use VIRT_SPEC_CTRL register [ Unable]
|- SSBD not needed on this processor [ Unable]
|- No Branch Type Confusion BTC_NO [ Unable]
|- BTC on Non-Branch instruction BTC-NOBR [Capable]
|- Arch - No Fast Predictive Store Forwarding PSFD [Capable]

Security Features
|- Secure Init and Jump with Attestation SKINIT [Capable]
|- Secure Encrypted Virtualization SEV [Missing]
|- SEV - Encrypted State SEV-ES [Missing]
|- SEV - Secure Nested Paging SEV-SNP [Missing]
|- Guest Mode Execute Trap GMET [Capable]
|- Supervisor Shadow Stack SSS [Capable]
|- VM Permission Levels VMPL [Missing]
|- VMPL Supervisor Shadow Stack VMPL-SSS [Missing]
|- Secure Multi-Key Memory Encryption SME-MK [Missing]

Technologies
|- Instruction Cache Unit
|- L1 IP Prefetcher L1 HW IP < ON>
|- Data Cache Unit
|- L1 Prefetcher L1 HW < ON>
|- L2 Prefetcher L2 HW < ON>
|- System Management Mode SMM-Lock [ ON]
|- Simultaneous Multithreading SMT [ ON]
|- PowerNow! CnQ [ ON]
|- Core C-States CCx [ ON]
|- Core Performance Boost CPB < ON>
|- Watchdog Timer WDT
|- Virtualization SVM [OFF]
|- I/O MMU AMD-V [ ON]
|- Version [ 0.1]
|- Hypervisor [OFF]
|- Vendor ID [ N/A]

Performance Monitoring
|- Version PM [ 2]
|- Counters: General Fixed
| { 6, 6, 16 } x 48 bits 3 x 64 bits
|- Enhanced Halt State C1E
|- C2 UnDemotion C2U < ON>
|- C3 UnDemotion C3U < ON>
|- Core C6 State CC6 < ON>
|- Package C6 State PC6 < ON>
|- Legacy Frequency ID control FID [OFF]
|- Legacy Voltage ID control VID [OFF]
|- P-State Hardware Coordination Feedback MPERF/APERF [ ON]
|- Core C-States
|- C-States Base Address BAR [ 0x413 ]
|- ACPI Processor C-States _CST [ 3]
|- MONITOR/MWAIT
|- State index: #0 #1 #2 #3 #4 #5 #6 #7
|- Sub C-State: 1 1 0 0 0 0 0 0
|- Core Cycles [Capable]
|- Instructions Retired [Capable]
|- Reference Cycles [Capable]
|- Last Level Cache References [Capable]
|- Global Time Stamp Counter [Missing]
|- Data Fabric Performance Counter [Capable]
|- Core Performance Counter [Capable]
|- Processor Performance Control _PCT [ Enable]
|- Performance Supported States _PSS [ 2]
|- Performance Present Capabilities _PPC [ 0]
|- Continuous Performance Control _CPC [Missing]
|- Collaborative Processor Performance Control CPPC < ON>
|- Capabilities Lowest Efficient Guaranteed Highest
|- CPU #0 300.01 ( 3) 1900.04 ( 19) 2800.06 ( 28) 4600.09 ( 46)
|- CPU #1 300.01 ( 3) 1900.04 ( 19) 2800.06 ( 28) 5300.11 ( 53)
|- CPU #2 300.00 ( 3) 1900.02 ( 19) 2800.03 ( 28) 4800.05 ( 48)
|- CPU #3 300.01 ( 3) 1900.03 ( 19) 2800.05 ( 28) 5200.09 ( 52)
|- CPU #4 300.00 ( 3) 1900.02 ( 19) 2800.03 ( 28) 4700.05 ( 47)
|- CPU #5 300.00 ( 3) 1900.01 ( 19) 2800.01 ( 28) 5100.02 ( 51)
|- CPU #6 300.00 ( 3) 1900.02 ( 19) 2800.03 ( 28) 4900.05 ( 49)
|- CPU #7 300.00 ( 3) 1900.02 ( 19) 2800.03 ( 28) 5300.05 ( 53)
|- CPU #8 300.00 ( 3) 1900.00 ( 19) 2800.00 ( 28) 3700.00 ( 37)
|- CPU #9 300.00 ( 3) 1900.02 ( 19) 2800.03 ( 28) 4300.05 ( 43)
|- CPU #10 300.00 ( 3) 1900.02 ( 19) 2800.03 ( 28) 4000.04 ( 40)
|- CPU #11 300.00 ( 3) 1900.03 ( 19) 2800.04 ( 28) 4200.07 ( 42)
|- CPU #12 300.00 ( 3) 1900.00 ( 19) 2800.00 ( 28) 3900.00 ( 39)
|- CPU #13 300.00 ( 3) 1900.02 ( 19) 2800.03 ( 28) 4500.05 ( 45)
|- CPU #14 300.00 ( 3) 1900.00 ( 19) 2800.00 ( 28) 3800.00 ( 38)
|- CPU #15 300.00 ( 3) 1900.02 ( 19) 2800.03 ( 28) 4400.04 ( 44)
|- CPU #16 300.00 ( 3) 1900.02 ( 19) 2800.03 ( 28) 4600.05 ( 46)
|- CPU #17 300.00 ( 3) 1900.00 ( 19) 2800.01 ( 28) 5300.01 ( 53)
|- CPU #18 300.00 ( 3) 1900.02 ( 19) 2800.03 ( 28) 4800.05 ( 48)
|- CPU #19 300.00 ( 3) 1900.02 ( 19) 2800.03 ( 28) 5200.06 ( 52)
|- CPU #20 300.00 ( 3) 1900.02 ( 19) 2800.03 ( 28) 4700.05 ( 47)
|- CPU #21 300.00 ( 3) 1900.02 ( 19) 2800.03 ( 28) 5100.06 ( 51)
|- CPU #22 300.00 ( 3) 1900.02 ( 19) 2800.03 ( 28) 4900.05 ( 49)
|- CPU #23 300.00 ( 3) 1900.02 ( 19) 2800.03 ( 28) 5300.06 ( 53)
|- CPU #24 300.00 ( 3) 1900.03 ( 19) 2800.05 ( 28) 3700.06 ( 37)
|- CPU #25 300.00 ( 3) 1900.02 ( 19) 2800.03 ( 28) 4300.05 ( 43)
|- CPU #26 300.00 ( 3) 1900.02 ( 19) 2800.03 ( 28) 4000.04 ( 40)
|- CPU #27 300.00 ( 3) 1900.02 ( 19) 2800.03 ( 28) 4200.05 ( 42)
|- CPU #28 300.00 ( 3) 1900.02 ( 19) 2800.03 ( 28) 3900.04 ( 39)
|- CPU #29 300.00 ( 3) 1900.02 ( 19) 2800.03 ( 28) 4500.05 ( 45)
|- CPU #30 300.00 ( 3) 1900.00 ( 19) 2800.00 ( 28) 3800.00 ( 38)
|- CPU #31 300.00 ( 3) 1900.02 ( 19) 2800.03 ( 28) 4400.05 ( 44)

Power, Current & Thermal
|- Temperature Offset:Junction TjMax [ 49: 0 C]
|- CPPC Energy Preference EPP < 0>
|- Digital Thermal Sensor DTS [Capable]
|- Power Limit Notification PLN [Missing]
|- Package Thermal Management PTM [Missing]
|- Thermal Monitor 1 TTP [ Enable]
|- Thermal Monitor 2 HTC [ Enable]
|- Thermal Design Power TDP [Missing]
|- Minimum Power Min [Missing]
|- Maximum Power Max [Missing]
|- Thermal Design Power Package [Disable]
|- Power Limit PL1 [ 0 W]
|- Time Window TW1 [ 0 ns]
|- Power Limit PL2 [ 0 W]
|- Time Window TW2 [ 0 ns]
|- Thermal Design Power Core [Disable]
|- Power Limit PL1 [ 0 W]
|- Time Window TW1 [ 0 ns]
|- Thermal Design Power Uncore [Disable]
|- Power Limit PL1 [ 0 W]
|- Time Window TW1 [ 0 ns]
|- Thermal Design Power DRAM [Disable]
|- Power Limit PL1 [ 0 W]
|- Time Window TW1 [ 0 ns]
|- Thermal Design Power Platform [Disable]
|- Power Limit PL1 [ 0 W]
|- Time Window TW1 [ 0 ns]
|- Power Limit PL2 [ 0 W]
|- Time Window TW2 [ 0 ns]
|- Package Power Tracking PPT [Missing]
|- Electrical Design Current EDC [Missing]
|- Thermal Design Current TDC [Missing]
|- Core Thermal Point
|- Package Thermal Point
|- Thermal Monitor Trip Limit [ 115 C]
|- HTC Temperature Limit Limit [ 127 C]
|- HTC Temperature Hysteresis Threshold [ 2 C]
|- Units
|- Power watt [ Missing]
|- Energy joule [ 0.000015259]
|- Window second [ 0.000976562]

KeithMyers commented 1 year ago

Doesn't seem to have fixed it.

Dec 22 20:00:57 Pipsqueek kernel: [25924.481858] CoreFreq: AMD_SMN_Read(0, b50104) TryLock
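To see which addresses are affected across a whole saved kernel log, a rough sketch like the following could be used. The pattern is modelled only on the two log lines quoted in this thread, so other CoreFreq message variants would not match, and the function name is made up.

```python
import re

# Pattern modelled on the lines quoted in this thread; the first argument
# (0 in both samples) is matched but not interpreted here.
TRYLOCK = re.compile(r"CoreFreq: AMD_SMN_Read\((\d+), ([0-9a-fA-F]+)\) TryLock")

def trylock_addresses(log_text):
    """Return the addresses, as integers, of every TryLock failure found."""
    return [int(match.group(2), 16) for match in TRYLOCK.finditer(log_text)]

sample = (
    "Dec 22 12:53:47 Pipsqueek kernel: [ 294.427042] CoreFreq: AMD_SMN_Read(0, b50d6c) TryLock\n"
    "Dec 22 20:00:57 Pipsqueek kernel: [25924.481858] CoreFreq: AMD_SMN_Read(0, b50104) TryLock\n"
)
print([hex(address) for address in trylock_addresses(sample)])
```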

cyring commented 1 year ago

Within JSON, I noticed amd-pstate which is also doing SMN calls. Surely a SMU conflict between drivers. Can you unload, blacklist it, also k10temp; and post UMC and log errors?

EDIT: if it makes things easier, I can provide an ISO image with prerequisites to run CoreFreq ?

cyring commented 1 year ago

@Jon0

 DIMM Geometry for channel #0                                              
      Slot Bank Rank     Rows   Columns    Memory Size (MB)                
       #0                                                                  
       #1    16    1     32768      1024           4096  CMK32GX5M2B6000C40
 DIMM Geometry for channel #1                                              
      Slot Bank Rank     Rows   Columns    Memory Size (MB)                
       #0                                                                  
       #1    16    1     32768      1024           4096  CMK32GX5M2B6000C40

I'm trying to solve the DIMM geometry from above.
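For readers following along: the 4096 MB in the table is what you get by multiplying the geometry fields together, if one assumes 8 bytes per column access. That width is an assumption of this sketch, not something taken from CoreFreq's source, and the function name is made up. The part number appears to denote a 32 GB kit of two modules, which would explain why 4096 MB per slot looks too small.

```python
def dimm_size_mb(banks, ranks, rows, columns, bytes_per_column=8):
    """Size in MB implied by a DIMM geometry (bytes_per_column is an assumption)."""
    total_bytes = banks * ranks * rows * columns * bytes_per_column
    return total_bytes // (1024 * 1024)

# Values from slot #1 of the table above.
print(dimm_size_mb(banks=16, ranks=1, rows=32768, columns=1024))
```

If that reading is right, one or more of the decoded fields (banks, ranks, rows or columns) must come out too small on this platform.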

Can you please read the following registers ?

## Zen1, Zen2, Zen3

./zencli smu 0x50030

./zencli smu 0x50034

./zencli smu 0x50038

./zencli smu 0x5003c

./zencli smu 0x50080

./zencli smu 0x50084

## Zen3+ [Rembrandt]

./zencli smu 0x50040

./zencli smu 0x50044

./zencli smu 0x50048

./zencli smu 0x5004c

./zencli smu 0x50090

./zencli smu 0x50094

Jon0 commented 1 year ago

Hi, I'll be back home and able to test anything in about a week.

cyring commented 1 year ago

@KeithMyers

I have prepared the archlinux-corefreq-dev.iso (870M) that you can boot from USB.

It automatically builds and starts CoreFreq on the latest Arch Linux kernel.

Please let me know whether you can get the Memory Controller from it.

KeithMyers commented 1 year ago

Doesn't seem to have fixed it. Dec 22 20:00:57 Pipsqueek kernel: [25924.481858] CoreFreq: AMD_SMN_Read(0, b50104) TryLock

Within JSON, I noticed amd-pstate which is also doing SMN calls. Surely a SMU conflict between drivers. Can you unload, blacklist it, also k10temp; and post UMC and log errors?

EDIT: if it makes things easier, I can provide an ISO image with prerequisites to run CoreFreq ?

Not sure what you mean by ISO image. I have tried every way I could find of disabling amd-pstate but am unable to return to acpi_freq scaling governor. It seems to be locked on this mobo. There is no CPPC enable/disable in this BIOS like there is on my C7H mobos.

cyring commented 1 year ago

ISO is made without amd_pstate

You can also blacklist it with:

initcall_blacklist=amd_pstate_init
amd_pstate.shared_mem=0
initcall_blacklist=acpi_cpufreq_init
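After rebooting, it is worth confirming that the parameters actually reached the kernel. A minimal sketch, assuming you feed it the text of /proc/cmdline; the helper names are made up, and it splits on whitespace only, so quoted values containing spaces are not handled.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {key: [values]}, keeping repeated keys."""
    params = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        params.setdefault(key, []).append(value)
    return params

def missing_params(cmdline, wanted):
    """Return the entries of `wanted` that do not appear verbatim in `cmdline`."""
    params = parse_cmdline(cmdline)
    missing = []
    for token in wanted:
        key, _, value = token.partition("=")
        if value not in params.get(key, []):
            missing.append(token)
    return missing

# The three parameters listed above.
WANTED = [
    "initcall_blacklist=amd_pstate_init",
    "amd_pstate.shared_mem=0",
    "initcall_blacklist=acpi_cpufreq_init",
]
```

For example, `missing_params(open("/proc/cmdline").read(), WANTED)` returns an empty list when all three are present, and a misspelled key shows up as missing.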

As far as I'm concerned, running without any cpufreq driver, neither amd_pstate nor acpi_cpufreq, gives the best performance.

KeithMyers commented 1 year ago

OK, I get it. A complete Arch boot image. I am downloading on my slow connection now. Will give that a try to help you troubleshoot my hardware.

KeithMyers commented 1 year ago

I tried the initcall blacklist=amd_pstate_init in the grub command line like I found online. Did not remove amd_pstate scaling governor.

I did not have the amd_pstate.shared_mem=0 statement though.

cyring commented 1 year ago

What's your Linux distribution ? Can't you build a kernel without the AMD driver in menuconfig ?

KeithMyers commented 1 year ago

Ubuntu 22.04.1 LTS is the distro.

I've never attempted to build a kernel before. Would not know how unless I found some guide somewhere of exactly what is needed.

cyring commented 1 year ago

OK, I get it. A complete Arch boot image. I am downloading on my slow connection now. Will give that a try to help you troubleshoot my hardware.

Hello,

Did you have a chance to run the ISO ? Usage is documented in the Wiki > LiveCD

If you are getting the UMC, a screenshot is welcome.

KeithMyers commented 1 year ago

Yes, your Arch ISO image shows the memory controller. I couldn't figure out a way to run corefreq-cli standalone to get a copy of the memory controller output.

It boots directly into corefreq-cli. Once I exited the program all I could do was look at the bin dir which had the scripts in it. I couldn't find the actual corefreq-cli program to launch it. I got no desktop only command terminal. I saw no Wiki > LiveCD

How do you get a screenshot from the program?

cyring commented 1 year ago

Great news to read that UMC is showing up.

By screenshot, I meant using a Camera or Smartphone to take a picture of the LCD

Wiki is here, in the Github: https://github.com/cyring/CoreFreq/wiki/Live-CD

Finally, it's interesting to learn that Ubuntu LTS does not play well with CoreFreq.

KeithMyers commented 1 year ago

I took a picture with my phone of the memory controller.

corefreq memory controller

I wonder if it is because of the kernel version. You need a 6.0 kernel to get core-freq to correctly show the memory controller?

cyring commented 1 year ago

I took a picture with my phone of the memory controller. ...

Thank you very much, I'm relieved to see it working as expected.

The computed DIMM size is still an issue on Zen4.

I wonder if it is because of the kernel version. You need a 6.0 kernel to get core-freq to correctly show the memory controller?

I'd bet that is the case with Ubuntu, but not with Arch Linux, where bare-metal kernels from v5 down to v3 should allow CoreFreq to drive the hardware directly.

I'm now debugging the code using your trace and have found a few possible reasons for the locking failure, where the call flow could have ended in this last function: https://github.com/cyring/CoreFreq/blob/0f26342b00b8c11c27e3e6fd0e7266b0c0c56167/corefreqk.h#L700

  1. Does your kernel log show the below message ?

    Unsupported AMD DF/PCI configuration found
  2. Are you running CoreFreq in Experimental mode ?

EDIT

  3. With Ubuntu LTS, how did you get amd_pstate? I can't find it in the Jammy repository.

KeithMyers commented 1 year ago

No I don't have that message in the kernel log.

I do always get these messages. I don't know their importance.

Dec 23 18:45:49 Pipsqueek kernel: [    0.291364] ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PCI0.GPP7.UP00.DP40.UP00.DP68], AE_NOT_FOUND (20210730/dswload2-162)
Dec 23 18:45:49 Pipsqueek kernel: [    0.291368] ACPI Error: AE_NOT_FOUND, During name lookup/catalog (20210730/psobject-220)

I don't know what you mean by Experimental mode?

KeithMyers commented 1 year ago

I don't know how I got the amd_pstate.

I was always running acpi_freq before on the 5950X and C7H board.

I assume I picked it up when I changed CPUs and mobos.

Unless it got picked up by my one-time experiment with the 6.09 kernel through the mainline ppa app.

Which I reverted when I saw it made no difference in sensors appearing.

cyring commented 1 year ago

No I don't have that message in the kernel log.

I do always get these messages. I don't know their importance.

Dec 23 18:45:49 Pipsqueek kernel: [    0.291364] ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PCI0.GPP7.UP00.DP40.UP00.DP68], AE_NOT_FOUND (20210730/dswload2-162)
Dec 23 18:45:49 Pipsqueek kernel: [    0.291368] ACPI Error: AE_NOT_FOUND, During name lookup/catalog (20210730/psobject-220)

There are many possible causes; most of the time it is the BIOS ACPI implementation: https://elixir.bootlin.com/linux/latest/ident/AE_NOT_FOUND

I don't know what you mean by Experimental mode?

Go to the Settings menu (shortcut s) to see whether Experimental is green or not.

2022-12-24-043157_283x406_scrot

cyring commented 1 year ago

@KeithMyers Hello,

It is to solve the geometry to compute the DIMM size.

Using zencli can you please post the following registers ?

## Zen1, Zen2, Zen3
### [DRAM Address Configuration]

./zencli smu 0x50030

./zencli smu 0x50034

./zencli smu 0x50038

./zencli smu 0x5003c

### [DIMM Configuration]

./zencli smu 0x50080

./zencli smu 0x50084

## Zen3+ [Rembrandt] / Zen4 ?
### [DRAM Address Configuration]

./zencli smu 0x50040

./zencli smu 0x50044

./zencli smu 0x50048

./zencli smu 0x5004c

### [DIMM Configuration]

./zencli smu 0x50090

./zencli smu 0x50094

KeithMyers commented 1 year ago

OK after dinner.

KeithMyers commented 1 year ago

./zencli smu 0x50030
[0x00050030] READ(smu) = 0x00000000 (0)
   60   56   52   48   44   40   36   32   28   24   20   16   12   08   04   00
 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000

./zencli smu 0x50034
[0x00050034] READ(smu) = 0x00000000 (0)
   60   56   52   48   44   40   36   32   28   24   20   16   12   08   04   00
 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000

./zencli smu 0x50038
[0x00050038] READ(smu) = 0x00000000 (0)
   60   56   52   48   44   40   36   32   28   24   20   16   12   08   04   00
 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000

./zencli smu 0x5003c
[0x0005003c] READ(smu) = 0x00000000 (0)
   60   56   52   48   44   40   36   32   28   24   20   16   12   08   04   00
 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000

./zencli smu 0x50080
[0x00050080] READ(smu) = 0x00000000 (0)
   60   56   52   48   44   40   36   32   28   24   20   16   12   08   04   00
 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000

./zencli smu 0x50084
[0x00050084] READ(smu) = 0x00000000 (0)
   60   56   52   48   44   40   36   32   28   24   20   16   12   08   04   00
 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000

./zencli smu 0x50040
[0x00050040] READ(smu) = 0x00150508 (1377544)
   60   56   52   48   44   40   36   32   28   24   20   16   12   08   04   00
 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 0101 0000 0101 0000 1000

./zencli smu 0x50044
[0x00050044] READ(smu) = 0x00150508 (1377544)
   60   56   52   48   44   40   36   32   28   24   20   16   12   08   04   00
 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 0101 0000 0101 0000 1000

./zencli smu 0x50048
[0x00050048] READ(smu) = 0x0025060c (2426380)
   60   56   52   48   44   40   36   32   28   24   20   16   12   08   04   00
 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 0101 0000 0110 0000 1100

./zencli smu 0x5004c
[0x0005004c] READ(smu) = 0x00150508 (1377544)
   60   56   52   48   44   40   36   32   28   24   20   16   12   08   04   00
 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 0101 0000 0101 0000 1000

./zencli smu 0x50090
[0x00050090] READ(smu) = 0x00000000 (0)
   60   56   52   48   44   40   36   32   28   24   20   16   12   08   04   00
 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000

./zencli smu 0x50094
[0x00050094] READ(smu) = 0x00000000 (0)
   60   56   52   48   44   40   36   32   28   24   20   16   12   08   04   00
 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
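When comparing these dumps by hand, note that the header row (60, 56, ... 04, 00) gives the position of the lowest bit in each group of four. The sketch below reproduces that layout and pulls out an arbitrary bit range; both helper names are made up, and no register field names are implied.

```python
def bit_dump(value, width=64):
    """Render `value` as space-separated groups of four bits, highest first."""
    bits = f"{value:0{width}b}"
    return " ".join(bits[i:i + 4] for i in range(0, width, 4))

def bit_field(value, high, low):
    """Extract bits high..low (inclusive) of `value` as an integer."""
    return (value >> low) & ((1 << (high - low + 1)) - 1)

# 0x50048 is the only one of the four 0x5004x reads that differs.
print(bit_dump(0x0025060C))
print(hex(bit_field(0x0025060C, 23, 16)), hex(bit_field(0x00150508, 23, 16)))
```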

cyring commented 1 year ago

./zencli smu 0x50048
[0x00050048] READ(smu) = 0x0025060c (2426380)
   60   56   52   48   44   40   36   32   28   24   20   16   12   08   04   00
 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 0101 0000 0110 0000 1100

This register could make the difference on Raphael when decoding the DIMM geometry, and thus the computed DIMM size.

For your testing, the code change is available in the develop branch.

KeithMyers commented 1 year ago

So how can I test this? The memory controller is disabled in my distro. Have you built another Arch distro for me to test so I can boot that?

cyring commented 1 year ago

Yes, I have refreshed the development ISO for your testing.

However, does simply removing the driver work on your setup?

modprobe -r amd_pstate

KeithMyers commented 1 year ago

No, because amd_pstate is built into the kernel. It is not a module.

sudo modprobe -r amd_pstate
[sudo] password for keith:
modprobe: FATAL: Module amd_pstate is builtin.

So is the previous link to the Arch ISO image the same for getting the updated memory controller code?
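A quick way to tell in advance whether a driver is built in (and therefore cannot be removed with modprobe -r) is to look it up in the kernel's modules.builtin listing, normally found under /lib/modules/<release>/. The sketch below assumes the usual one-path-per-line format ending in .ko; the function name and the sample listing are illustrative only.

```python
def is_builtin(module, builtin_listing):
    """Return True if `module` appears in the text of a modules.builtin file."""
    wanted = module.replace("-", "_")        # file names may use '-' for '_'
    for line in builtin_listing.splitlines():
        name = line.strip().rsplit("/", 1)[-1]
        if name.endswith(".ko"):
            name = name[:-3]
        if name.replace("-", "_") == wanted:
            return True
    return False

# Illustrative listing, not copied from a real system.
listing = "kernel/drivers/cpufreq/amd-pstate.ko\nkernel/fs/ext4/ext4.ko\n"
print(is_builtin("amd_pstate", listing))
```

When this returns True, the driver can only be kept out with a boot parameter or a rebuilt kernel, which matches the FATAL message above.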

cyring commented 1 year ago

Yes same link. (I keep overwriting to the same ISO.)

KeithMyers commented 1 year ago

Your latest ISO image with the code changes still isn't showing the correct memory DIMM size.

corefreq_mc

cyring commented 1 year ago

No because amd_pstate is built into the kernel. It is not a module

It would make things easier if you could prevent amd_pstate from loading.

Here's my boot command line:

initrd=\EFI\Linux\amd-ucode.img initrd=\EFI\Linux\initramfs-linux.img root=/dev/disk/by-label/root rw quiet break=n add_efi_memmap nmi_watchdog=0 selinux=0 loglevel=3 rd.systemd.show_status=auto rd.udev.log-priority=3 consoleblank=0 vt.color=0x03 modprobe.blacklist=pcspkr,k10temp,sp5100_tco,acpi_cpufreq,eeepc_wmi,mxm_wmi,wmi_bmof,asus_wmi,wmi amd_pstate.shared_mem=0 idle=halt cpu0_hotplug audit=0 nowatchdog mitigations=off nokaslr sysrq_always_enabled msr.allow_writes=on amdgpu.ppfeaturemask=0xffffffff retbleed=off spectre_v2=off

As a result, when CoreFreq starts, all drivers are free to use, as shown by Missing in the Kernel window [k]

2022-12-25-142452_644x550_scrot

Optionally, you can register to them one after another, in bottom-to-top order, from Clock Source to CPU-IDLE, in the Settings window [s]

2022-12-25-142750_644x550_scrot

Finally, open or switch [TAB] to the Kernel window and select corefreq_tsc as the system Clock Source.

2022-12-25-143040_644x550_scrot

Check that the drivers and governor are now managed by CoreFreq.

If it succeeds, you can pull the develop branch, where I pushed a new change to the DIMM address today.

KeithMyers commented 1 year ago

I've been curious about the green bolded values in Settings. It looks like they may be a toggle of some sort but I had no luck in effecting any change in any of the values. So I must not know the secret handshake or something.

cyring commented 1 year ago

Can I see one screenshot with these 2 windows: Settings and Kernel ?

KeithMyers commented 1 year ago

I don't know how to get both Settings and Kernel data in the same terminal. So here are two screenshots instead.

image

image

Still no memory controller working.

cyring commented 1 year ago

Received this screenshot, which confirms the DIMM geometry at address offset 0x44, rather than 0x40 (pre-Zen4).

Available since commit 4127fa5e638d8292418487e06527b3e2de510d05

CoreFreq 7950X UMC

cyring commented 1 year ago

I don't know how to get both Settings and Kernel data in the same terminal. So here is two screenshots instead.

Just resize the terminal; CoreFreq adapts to the new dimensions.

Still no memory controller working.

I have refreshed the development ISO for testing. I hope you'll get a result similar to the one in this post. What I'm missing are Zen4 processors other than the 7950X to confirm the decoding.

KeithMyers commented 1 year ago

Where do I get the development version of the ISO? I realized too late that the link I used last time was the stock ISO image as it didn't have the development naming scheme. [Edit] OK, I found the dev version of the link at the bottom of the page. Downloading the latest commit.

cyring commented 1 year ago

Links to both versions are available at the bottom of the page at www.cyring.fr

KeithMyers commented 1 year ago

Well I finally figured out how to disable amd_pstate from the kernel commandline. I am now running acpi_cpufreq scaling governor. But it still does not enable the memory controller using your latest dev commit using the 0x40 memory address register.

image

KeithMyers commented 1 year ago

Congrats on figuring out the correct memory address for showing the proper DIMM size. Here is a screenshot from the dev ISO.

corefreq_mc_2