Closed cyring closed 3 years ago
removed
Thank you for testing. Please post the CoreFreq driver output from the kernel log.
This crashed my system immediately upon attempting to insert the kernel module:
sudo insmod corefreqk.ko
I tried three times, with and without Experimental=1, and once again after redownloading the source code in case it had gotten corrupted.
Thanks for trying. Sorry for this crash.
My code is based on the amd64_edac.c driver, in which I found some registers that, to my knowledge, are undocumented by AMD.
The SMU is probably overwhelmed if k10temp and/or lm_sensors are running at the same time as CoreFreq. In that case, you will have to build this way:
make HWM_CHIPSET=COMPATIBLE clean all
The COMPATIBLE mode is confirmed during the build by this pragma message: Building with Kernel amd_smn_read()
CoreFreq_develop.tar.gz, above, is updated with a minor assembly code improvement.
No change. I tried make HWM_CHIPSET=COMPATIBLE clean all, and I also tried the same after removing k10temp and asus_wmi_sensors with rmmod.
Btw, by crash, I mean freeze. The display freezes, mouse or keyboard inputs don't work, and the system fails to respond to pings.
Found one bug in the asm code compiled by gas.
Now I'm struggling to make an SMU locking mechanism...
Locking added to the alpha version 2020-07-16 17:35
Thank you for testing.
Unfortunately, the latest version froze the system too. I used make HWM_CHIPSET=COMPATIBLE clean all again.
Thanks for this test.
Something I don't understand in the amd64_edac.c source code is that this driver makes no distinction among the Family 17h (Zen) generations. This driver manages ECC and will stop loading on your 2700X, as well as on my 3950X, because the ECC feature is missing; however, it unveils registers which appear common to the whole architecture.
Those are the registers I'm using to compute the DIMM size(s), using I/O accesses to the FCH, which was recently documented in the PPR for Family 17h Models 60h Rev. A1 processors. These models are APUs, but it works with my Zen2!
It's hard to tell why it freezes on your processor. I would have to run step by step, register after register, to find what is allowed and what is not. The bug could also be in my decoding code, although I'm programming this asm part with common ISA instructions only. I'm thinking of providing a simple CLI to query the FCH from user space...
Hello
Here is zencli.c
cc -g zencli.c -o zencli
sudo ./zencli smu 0x50030
0x150508 (1377544)
sudo ./zencli smu 0x50100
0x80000200 (2147484160)
sudo ./zencli smu 0x50104
0xb040808b (2957017227)
sudo ./zencli smu 0x50df0
0x10030 (65584)
Thank you
zencli was a bit more successful:
sudo ./zencli smu 0x50030
0x150508 (1377544)
sudo ./zencli smu 0x50100
0x80000200 (2147484160)
sudo ./zencli smu 0x50104
0xb0408082 (2957017218)
sudo ./zencli smu 0x50df0
0x1fe2c (130604)
Great. Let's go further. Please download and build the latest zencli.c, then read the UMC as below:
sudo ./zencli umc 0x0
Welcome to the Data Fabric: UMC has 2 x Channel(s)
CHA[0] CHIP_BAR[0][0]=0x00050000 CHIP_BAR[0][1]=0x00050020
CHIP_BAR[1][0]=0x00050010 CHIP_BAR[1][1]=0x00050028
CHA[0] CHIP[0:0] @ 0x00050000[0x00000000] Disable
CHA[0] MASK[0:0] @ 0x00050020[0x00000000]
CHA[0] CHIP[0:1] @ 0x00050010[0x00000000] Disable
CHA[0] MASK[0:1] @ 0x00050028[0x00000000]
CHA[0] CHIP[1:0] @ 0x00050004[0x00000000] Disable
CHA[0] MASK[1:0] @ 0x00050020[0x00000000]
CHA[0] CHIP[1:1] @ 0x00050014[0x00000000] Disable
CHA[0] MASK[1:1] @ 0x00050028[0x00000000]
CHA[0] CHIP[2:0] @ 0x00050008[0x00000001] Enable
CHA[0] MASK[2:0] @ 0x00050024[0x03fffdfe] ChipSize[8388608]
CHA[0] CHIP[2:1] @ 0x00050018[0x00000000] Disable
CHA[0] MASK[2:1] @ 0x0005002c[0x00000000]
CHA[0] CHIP[3:0] @ 0x0005000c[0x00000201] Enable
CHA[0] MASK[3:0] @ 0x00050024[0x03fffdfe] ChipSize[8388608]
CHA[0] CHIP[3:1] @ 0x0005001c[0x00000000] Disable
CHA[0] MASK[3:1] @ 0x0005002c[0x00000000]
Memory Size[16777216 KB] [16384 MB]
CHA[1] CHIP_BAR[0][0]=0x00150000 CHIP_BAR[0][1]=0x00150020
CHIP_BAR[1][0]=0x00150010 CHIP_BAR[1][1]=0x00150028
CHA[1] CHIP[0:0] @ 0x00150000[0x00000000] Disable
CHA[1] MASK[0:0] @ 0x00150020[0x00000000]
CHA[1] CHIP[0:1] @ 0x00150010[0x00000000] Disable
CHA[1] MASK[0:1] @ 0x00150028[0x00000000]
CHA[1] CHIP[1:0] @ 0x00150004[0x00000000] Disable
CHA[1] MASK[1:0] @ 0x00150020[0x00000000]
CHA[1] CHIP[1:1] @ 0x00150014[0x00000000] Disable
CHA[1] MASK[1:1] @ 0x00150028[0x00000000]
CHA[1] CHIP[2:0] @ 0x00150008[0x00000001] Enable
CHA[1] MASK[2:0] @ 0x00150024[0x03fffdfe] ChipSize[8388608]
CHA[1] CHIP[2:1] @ 0x00150018[0x00000000] Disable
CHA[1] MASK[2:1] @ 0x0015002c[0x00000000]
CHA[1] CHIP[3:0] @ 0x0015000c[0x00000201] Enable
CHA[1] MASK[3:0] @ 0x00150024[0x03fffdfe] ChipSize[8388608]
CHA[1] CHIP[3:1] @ 0x0015001c[0x00000000] Disable
CHA[1] MASK[3:1] @ 0x0015002c[0x00000000]
Memory Size[16777216 KB] [16384 MB]
sudo ./zencli umc 0x0
Welcome to the Data Fabric: UMC has 2 x Channel(s)
CHA[0] CHIP_BAR[0][0]=0x00050000 CHIP_BAR[0][1]=0x00050020
CHIP_BAR[1][0]=0x00050010 CHIP_BAR[1][1]=0x00050028
CHA[0] CHIP[0:0] @ 0x00050000[0x00000000] Disable
CHA[0] MASK[0:0] @ 0x00050020[0x00000000]
CHA[0] CHIP[0:1] @ 0x00050010[0x00000000] Disable
CHA[0] MASK[0:1] @ 0x00050028[0x00000000]
CHA[0] CHIP[1:0] @ 0x00050004[0x00000000] Disable
CHA[0] MASK[1:0] @ 0x00050020[0x00000000]
CHA[0] CHIP[1:1] @ 0x00050014[0x00000000] Disable
CHA[0] MASK[1:1] @ 0x00050028[0x00000000]
CHA[0] CHIP[2:0] @ 0x00050008[0x00000001] Enable
CHA[0] MASK[2:0] @ 0x00050024[0x03fffdfe] ChipSize[8388608]
CHA[0] CHIP[2:1] @ 0x00050018[0x00000000] Disable
CHA[0] MASK[2:1] @ 0x0005002c[0x00000000]
CHA[0] CHIP[3:0] @ 0x0005000c[0x00000201] Enable
CHA[0] MASK[3:0] @ 0x00050024[0x03fffdfe] ChipSize[8388608]
CHA[0] CHIP[3:1] @ 0x0005001c[0x00000000] Disable
CHA[0] MASK[3:1] @ 0x0005002c[0x00000000]
Memory Size[16777216 KB] [16384 MB]
CHA[1] CHIP_BAR[0][0]=0x00150000 CHIP_BAR[0][1]=0x00150020
CHIP_BAR[1][0]=0x00150010 CHIP_BAR[1][1]=0x00150028
CHA[1] CHIP[0:0] @ 0x00150000[0x00000000] Disable
CHA[1] MASK[0:0] @ 0x00150020[0x00000000]
CHA[1] CHIP[0:1] @ 0x00150010[0x00000000] Disable
CHA[1] MASK[0:1] @ 0x00150028[0x00000000]
CHA[1] CHIP[1:0] @ 0x00150004[0x00000000] Disable
CHA[1] MASK[1:0] @ 0x00150020[0x00000000]
CHA[1] CHIP[1:1] @ 0x00150014[0x00000000] Disable
CHA[1] MASK[1:1] @ 0x00150028[0x00000000]
CHA[1] CHIP[2:0] @ 0x00150008[0x00000001] Enable
CHA[1] MASK[2:0] @ 0x00150024[0x03fffdfe] ChipSize[8388608]
CHA[1] CHIP[2:1] @ 0x00150018[0x00000000] Disable
CHA[1] MASK[2:1] @ 0x0015002c[0x00000000]
CHA[1] CHIP[3:0] @ 0x0015000c[0x00000201] Enable
CHA[1] MASK[3:0] @ 0x00150024[0x03fffdfe] ChipSize[8388608]
CHA[1] CHIP[3:1] @ 0x0015001c[0x00000000] Disable
CHA[1] MASK[3:1] @ 0x0015002c[0x00000000]
Memory Size[16777216 KB] [16384 MB]
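For reference, the ChipSize values in the dumps above can be reproduced from the MASK registers alone. The sketch below is an assumption about the decoding, modelled on how the Linux amd64_edac driver appears to treat Family 17h address masks (register bits [31:1] map to address bits [39:9], and zero bits inside the mask are squeezed out as interleave bits). CoreFreq's actual code may differ, and the function names are made up for illustration.

```c
#include <stdint.h>

/* Number of set bits in a 32-bit word. */
static unsigned int popcount32(uint32_t v)
{
    unsigned int n = 0;

    for (; v != 0; v &= v - 1)
        n++;
    return n;
}

/* Index of the highest set bit; the caller guarantees v != 0. */
static unsigned int msb32(uint32_t v)
{
    unsigned int i = 0;

    while (v >>= 1)
        i++;
    return i;
}

/*
 * Chip-select size in KB from a UMC address mask register.
 * Assumption: bit 0 is never part of the mask, register bits [31:1]
 * map to address bits [39:9], and zero bits inside the mask are
 * interleave bits that must be removed from the top of the mask.
 */
static uint64_t umc_mask_to_kb(uint32_t mask)
{
    unsigned int msb, weight, top;
    uint64_t deinterleaved;

    mask &= ~1u;                 /* bit 0 carries no address information */
    if (mask == 0)
        return 0;

    msb = msb32(mask);
    weight = popcount32(mask);   /* weight <= msb because bit 0 is clear */
    top = msb - (msb - weight);  /* drop the zero bits from the top */

    /* Contiguous mask covering bits [top:1]. */
    deinterleaved = ((1ULL << (top + 1)) - 1) & ~1ULL;

    /* Register bit 1 is address bit 9, so shifting by 2 yields KB. */
    return (deinterleaved >> 2) + 1;
}
```

With the mask 0x03fffdfe this gives 8388608 KB per chip select, and the two enabled chip selects of a channel add up to the 16777216 KB (16384 MB) reported in the dump.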
So I'm lost! It is nearly the same UMC code as the driver's.
In this version of CoreFreq the whole UMC code is commented out, just to check whether the issue comes from somewhere else.
Be prepared for a crash.
removed
Still crashing.
Yes, I didn't do a diff, but my UMC code output looks identical!
Maybe you already had a crash with develop commit 790ce5f1dd423c1e0cc2d363dd762bd84b8fc678, where I added 2 Mitigation MSRs.
Their availability might depend on the firmware.
To avoid a crash, please test those registers as root:
modprobe msr
rdmsr -aX 0x00000048
0
0
...
0
rdmsr -aX 0x00000049
rdmsr: CPU 0 cannot read MSR 0x00000049
modprobe msr
rdmsr -aX 0x00000048
rdmsr: CPU 0 cannot read MSR 0x00000048
rdmsr -aX 0x00000049
rdmsr: CPU 0 cannot read MSR 0x00000049
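The same probe can be done from a small C program instead of rdmsr. This is only a sketch: it assumes the Linux msr driver is loaded (modprobe msr) and that the device node is /dev/cpu/0/msr, where the file offset selects the register. The helper name read_msr is made up for illustration.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read one 64-bit MSR through the Linux msr character device,
 * for example path = "/dev/cpu/0/msr" (root is required).
 * Returns 0 on success, -1 if the device cannot be opened or the
 * register cannot be read (the case rdmsr reports as "cannot read MSR").
 */
static int read_msr(const char *path, uint32_t msr, uint64_t *value)
{
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;

    /* The msr device uses the file offset as the register index. */
    n = pread(fd, value, sizeof(*value), (off_t) msr);
    close(fd);

    return (n == (ssize_t) sizeof(*value)) ? 0 : -1;
}
```

Calling read_msr("/dev/cpu/0/msr", 0x48, &v) and checking for -1 before the driver touches the register is the user-space equivalent of the rdmsr test above.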
Here we are! SPEC_CTRL (0x00000048) cannot be read.
Can you roll back to CoreFreq master, then dump the CPUID?
The output for CPU #0 will be enough:
corefreq-cli -u
Out of curiosity (before your reply), I confirmed that commit 790ce5f (1.79-33-g790ce5f, develop branch) caused the crash too.
With 1.79-23-gbefddf5 (master branch):
$ corefreq-cli -u
CPU #0 function EAX EBX ECX EDX
|- 00000000:00000000 0000000d 68747541 444d4163 69746e65
|- Largest Standard Function=0000000d
|- 80000000:00000000 8000001f 68747541 444d4163 69746e65
|- Largest Extended Function=8000001f
|- 00000001:00000000 00800f82 00100800 7ed8320b 178bfbff
|- 00000002:00000000 00000000 00000000 00000000 00000000
|- 00000003:00000000 00000000 00000000 00000000 00000000
|- 00000004:00000000 00000000 00000000 00000000 00000000
|- 00000004:00000001 00000000 00000000 00000000 00000000
|- 00000004:00000002 00000000 00000000 00000000 00000000
|- 00000004:00000003 00000000 00000000 00000000 00000000
|- 00000005:00000000 00000040 00000040 00000003 00000011
|- 00000006:00000000 00000004 00000000 00000001 00000000
|- 00000007:00000000 00000000 209c01a9 00000000 00000000
|- 00000007:00000001 00000000 00000000 00000000 00000000
|- 00000009:00000000 00000000 00000000 00000000 00000000
|- 0000000a:00000000 00000000 00000000 00000000 00000000
|- 0000000b:00000000 00000000 00000000 00000000 00000000
|- 0000000d:00000000 00000007 00000340 00000340 00000000
|- 0000000d:00000001 0000000f 00000340 00000000 00000000
|- 0000000d:00000002 00000100 00000240 00000000 00000000
|- 0000000d:00000003 00000000 00000000 00000000 00000000
|- 0000000d:00000004 00000000 00000000 00000000 00000000
|- 0000000d:0000003e 00000000 00000000 00000000 00000000
|- 0000000f:00000000 00000000 00000000 00000000 00000000
|- 0000000f:00000001 00000000 00000000 00000000 00000000
|- 00000010:00000000 00000000 00000000 00000000 00000000
|- 00000010:00000001 00000000 00000000 00000000 00000000
|- 00000010:00000002 00000000 00000000 00000000 00000000
|- 00000010:00000003 00000000 00000000 00000000 00000000
|- 00000012:00000000 00000000 00000000 00000000 00000000
|- 00000012:00000001 00000000 00000000 00000000 00000000
|- 00000012:00000002 00000000 00000000 00000000 00000000
|- 00000014:00000000 00000000 00000000 00000000 00000000
|- 00000014:00000001 00000000 00000000 00000000 00000000
|- 00000015:00000000 00000000 00000000 00000000 00000000
|- 00000016:00000000 00000000 00000000 00000000 00000000
|- 00000017:00000000 00000000 00000000 00000000 00000000
|- 00000017:00000001 00000000 00000000 00000000 00000000
|- 00000017:00000002 00000000 00000000 00000000 00000000
|- 00000017:00000003 00000000 00000000 00000000 00000000
|- 00000018:00000000 00000000 00000000 00000000 00000000
|- 00000018:00000001 00000000 00000000 00000000 00000000
|- 0000001a:00000000 00000000 00000000 00000000 00000000
|- 0000001b:00000000 00000000 00000000 00000000 00000000
|- 0000001f:00000000 00000000 00000000 00000000 00000000
|- 80000001:00000000 00800f82 20000000 35c233ff 2fd3fbff
|- 80000002:00000000 20444d41 657a7952 2037206e 30303732
|- 80000003:00000000 69452058 2d746867 65726f43 6f725020
|- 80000004:00000000 73736563 2020726f 20202020 00202020
|- 80000005:00000000 ff40ff40 ff40ff40 20080140 40040140
|- 80000006:00000000 26006400 66006400 02006140 00808140
|- 80000007:00000000 00000000 0000001b 00000000 00006799
|- 80000008:00000000 00003030 00001007 0000400f 00000000
|- 8000000a:00000000 00000001 00008000 00000000 0001bcff
|- 80000019:00000000 f040f040 00000000 00000000 00000000
|- 8000001a:00000000 00000003 00000000 00000000 00000000
|- 8000001b:00000000 000003ff 00000000 00000000 00000000
|- 8000001c:00000000 00000000 00000000 00000000 00000000
|- 8000001d:00000000 00004121 01c0003f 0000003f 00000000
|- 8000001d:00000001 00004122 00c0003f 000000ff 00000000
|- 8000001d:00000002 00004143 01c0003f 000003ff 00000002
|- 8000001d:00000003 0001c163 03c0003f 00001fff 00000001
|- 8000001e:00000000 00000000 00000100 00000000 00000000
|- 40000000:00000000 00000000 00000000 00000000 00000000
|- 40000001:00000000 00000000 00000000 00000000 00000000
|- 40000002:00000000 00000000 00000000 00000000 00000000
|- 40000003:00000000 00000000 00000000 00000000 00000000
|- 40000004:00000000 00000000 00000000 00000000 00000000
|- 40000005:00000000 00000000 00000000 00000000 00000000
|- 40000006:00000000 00000000 00000000 00000000 00000000
CPUID_Fn8000000A_EDX [SVM Revision and Feature Identification]
Bits | Description |
---|---|
20 | GuestSpecCtrl. Read-only. Reset: Fixed,1. 1=Indicates support for Guest SPEC_CTRL. |
CPU #0 function EAX EBX ECX EDX
|- 8000000a:00000000 00000001 00008000 00000000 0001bcff
0x1BCFF
= 0b000011011110011111111
CPU #0 function EAX EBX ECX EDX
|- 8000000a:00000000 00000001 00008000 00000000 0013bcff
0x13BCFF
= 0b100111011110011111111
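The difference between the two EDX values quoted above comes down to bit 20 (GuestSpecCtrl). A minimal check, with a made-up helper name, might look like this:

```c
#include <stdint.h>

/* Bit 20 of CPUID Fn8000_000A EDX, per the table quoted above. */
#define SVM_GUEST_SPEC_CTRL_BIT 20

/* Returns 1 if the GuestSpecCtrl capability bit is set in EDX. */
static int has_guest_spec_ctrl(uint32_t edx)
{
    return (int) ((edx >> SVM_GUEST_SPEC_CTRL_BIT) & 1u);
}
```

The bit is clear in 0x0001bcff and set in 0x0013bcff.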
Mitigation mechanisms
output
It worked :)
Not sure if necessary but I used make HWM_CHIPSET=COMPATIBLE clean all.
In case you still want the driver output from the system log:
CoreFreq(4:12): Processor [ 8F_08] Architecture [Zen+ Pinnacle Ridge] SMT [16/16]
Welcome to the Data Fabric UMC(0) @ 0x00050000:
0x030[0x00150508] 0x080[0x00000000] 0x100[0x80000200]
0x104[0xb0408082] 0x14c[0x00000000]
0xdf0[0x0001fe2c] 0xdf4[0x00000000]
Welcome to the Data Fabric UMC(1) @ 0x00150000:
0x030[0x00150508] 0x080[0x00000000] 0x100[0x80000200]
0x104[0xb0408082] 0x14c[0x00000000]
0xdf0[0x0001fe2c] 0xdf4[0x00000000]
CHA[0] CHIP_BAR[0][0]=0x00050000 CHIP_BAR[0][1]=0x00050020
CHIP_BAR[1][0]=0x00050010 CHIP_BAR[1][1]=0x00050028
CHA[0] CHIP[0:0] @ 0x00050000[0x00000000] Disable
CHA[0] MASK[0:0] @ 0x00050020[0x00000000]
CHA[0] CHIP[0:1] @ 0x00050010[0x00000000] Disable
CHA[0] MASK[0:1] @ 0x00050028[0x00000000]
CHA[0] CHIP[1:0] @ 0x00050004[0x00000000] Disable
CHA[0] MASK[1:0] @ 0x00050020[0x00000000]
CHA[0] CHIP[1:1] @ 0x00050014[0x00000000] Disable
CHA[0] MASK[1:1] @ 0x00050028[0x00000000]
CHA[0] CHIP[2:0] @ 0x00050008[0x00000001] Enable
CHA[0] MASK[2:0] @ 0x00050024[0x03fffdfe] ChipSize[8388608]
CHA[0] CHIP[2:1] @ 0x00050018[0x00000000] Disable
CHA[0] MASK[2:1] @ 0x0005002c[0x00000000]
CHA[0] CHIP[3:0] @ 0x0005000c[0x00000201] Enable
CHA[0] MASK[3:0] @ 0x00050024[0x03fffdfe] ChipSize[8388608]
CHA[0] CHIP[3:1] @ 0x0005001c[0x00000000] Disable
CHA[0] MASK[3:1] @ 0x0005002c[0x00000000]
Memory Size[16777216 KB] [16384 MB]
CHA[1] CHIP_BAR[0][0]=0x00150000 CHIP_BAR[0][1]=0x00150020
CHIP_BAR[1][0]=0x00150010 CHIP_BAR[1][1]=0x00150028
CHA[1] CHIP[0:0] @ 0x00150000[0x00000000] Disable
CHA[1] MASK[0:0] @ 0x00150020[0x00000000]
CHA[1] CHIP[0:1] @ 0x00150010[0x00000000] Disable
CHA[1] MASK[0:1] @ 0x00150028[0x00000000]
CHA[1] CHIP[1:0] @ 0x00150004[0x00000000] Disable
CHA[1] MASK[1:0] @ 0x00150020[0x00000000]
CHA[1] CHIP[1:1] @ 0x00150014[0x00000000] Disable
CHA[1] MASK[1:1] @ 0x00150028[0x00000000]
CHA[1] CHIP[2:0] @ 0x00150008[0x00000001] Enable
CHA[1] MASK[2:0] @ 0x00150024[0x03fffdfe] ChipSize[8388608]
CHA[1] CHIP[2:1] @ 0x00150018[0x00000000] Disable
CHA[1] MASK[2:1] @ 0x0015002c[0x00000000]
CHA[1] CHIP[3:0] @ 0x0015000c[0x00000201] Enable
CHA[1] MASK[3:0] @ 0x00150024[0x03fffdfe] ChipSize[8388608]
CHA[1] CHIP[3:1] @ 0x0015001c[0x00000000] Disable
CHA[1] MASK[3:1] @ 0x0015002c[0x00000000]
Memory Size[16777216 KB] [16384 MB]
It worked :)
But I'm using the wrong capability bits. The right ones belong to CPUID leaf 0x80000008:EBX. Can you test this version?
removed
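As an illustration of what such a check could look like, here is a sketch that decodes the EBX register of CPUID leaf 0x80000008. The bit positions (IBPB = 12, IBRS = 14, STIBP = 15, SSBD = 24) are taken from AMD's CPUID documentation as far as I know; treat them as assumptions to be verified against the PPR. The function names are made up, and this is not necessarily how CoreFreq does it.

```c
#include <stdint.h>

/* Assumed bit positions in CPUID Fn8000_0008 EBX. */
enum {
    AMD_EBX_IBPB  = 12,
    AMD_EBX_IBRS  = 14,
    AMD_EBX_STIBP = 15,
    AMD_EBX_SSBD  = 24
};

static int ebx_has(uint32_t ebx, unsigned int bit)
{
    return (int) ((ebx >> bit) & 1u);
}

/* SPEC_CTRL (MSR 0x48) is only expected when one of its controls exists. */
static int spec_ctrl_expected(uint32_t ebx)
{
    return ebx_has(ebx, AMD_EBX_IBRS)
        || ebx_has(ebx, AMD_EBX_STIBP)
        || ebx_has(ebx, AMD_EBX_SSBD);
}

/* PRED_CMD (MSR 0x49) is only expected when IBPB is advertised. */
static int pred_cmd_expected(uint32_t ebx)
{
    return ebx_has(ebx, AMD_EBX_IBPB);
}
```

With the EBX value 0x00001007 from the CPUID dump above, only IBPB is advertised, which would be consistent with SPEC_CTRL being unreadable on that processor. PRED_CMD is, to my knowledge, a write-only register, so a failed rdmsr on 0x49 does not by itself mean it is missing.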
Not sure if necessary but I used make HWM_CHIPSET=COMPATIBLE clean all.
You don't have to if no other drivers (such as k10temp) are running. Without it, CoreFreq queries the sensors directly, which improves latency.
In case you still want the driver output from the system log:
Yes, I'm trying to find a match between the UMC output we're getting and the DIMM locations. Can you tell me how your DIMMs are populated on the motherboard?
My 2 x 16 GB DIMM are slotted like this:
DIMM B1[ ] DIMM B2[X] DIMM A1[ ] DIMM A2[X]
I do have k10temp and asus-wmi-sensors running all the time, so I'll continue using make HWM_CHIPSET=COMPATIBLE clean all for now.
CoreFreq Daemon 1.80.1
My 2 x 16 GB DIMM are slotted like this:
DIMM B1[ ] DIMM B2[X] DIMM A1[ ] DIMM A2[X]
Exactly the same here.
Pushing this fix to the develop branch.
Thank you for your help.
This is where I am now.
Basically, the DIMM size and the channel mode.
Available in develop for your testing.
So far so good:
Nice. Thank you. More to come...
Btw, I used simple make -j and then insmod corefreqk.ko and it was fine.
When working with develop, I recommend fully rebuilding and reloading CoreFreq, because I might have changed the API without updating the version. This could lead to a crash.
Thus, always build this way:
make clean all
rmmod corefreqk
insmod corefreqk.ko
Besides those AMD functionalities, the Experimental mode is not required:
I never keep corefreqk running after testing, and the daemon is also run in the foreground, not the background. After each test my routine is to Ctrl-C the running corefreqd and then rmmod corefreqk.
I also make clean before doing git pull and make. Is make clean all equivalent to make clean then make?
Yes, the same in one command.
New develop version with the UMC timings and speed decoded.
Thank you for your test. Do all the timings and DDR speed match the BIOS settings ?
Yes, although the BIOS mentions CHA and CHB, which correspond to Cha#1 and Cha#0 from CoreFreq, respectively, based on the dissimilar values for RdWr and WrRd.
Thanks a lot for your confirmation.
It will be difficult to match every BIOS terminology. I believe timings are named differently among manufacturers. Except for the channel id, I'm doing the same as the ASUS board, but cells are only 5 characters wide, space included, to name each item.
Perhaps, sticking to the DRAM terminology would be better...
My comment is not about the terminology, but the correspondence between CH A/B in BIOS and Cha 1/0 in CoreFreq.
Mine is 0 for channel A and 1 for B, so there's an issue in the topology. I will need more tests from other brands to understand the register encoding.
Based on JEDEC, renaming the Timings.
I went into BIOS to change timings per channel and thus identify the good row. But such selection is not possible. I will have to physically remove one DIMM ...
Thanks for the screenshot.
About the different RdWr and WrRd timing values per channel, can you confirm whether it is a CoreFreq bug or it just reflects what is set in BIOS?
They are reflected in BIOS. I was surprised actually. Did not know they can be different, and still don't know if they should. I just have DOCP (XMP) set in BIOS and no manual memory timings.
Here are the additional timings for Zen, and some for Intel.
Some timings don't show up with my ASUS board, but I believe the Gigabyte BIOS M.I.T. should provide them (according to some Internet screenshots).
Bus and DRAM speeds are now scaled to the MEMCLK and the Base Clock.
5 characters, including a space separator, are quite short to fully name the timings. Acronyms are partly documented in the header file, starting at: https://github.com/cyring/CoreFreq/blob/2724fba01de756c0d241cd30e99f1586409a1f6c/amdmsr.h#L841
For example: dlr = Different Logical Ranks
I'm adding to the roadmap a function to show an explanation in the window's bottom border line when hovering over cells...
Thanks for your tests
The "ECC" reading might be wrong. I use ECC RAM here on a Ryzen 7 2700X and it says "0" in the ECC column. I looked at the latest commit in the "master" branch, not "develop".
$ corefreq-cli -M
Zen [1463]
Controller #0 Dual Channel
Bus Rate 1566 MHz Bus Speed 1566 MHz DRAM Speed 3133 MHz
Cha CL RCD_R RCD_W RP RAS RC RRD_S RRD_L FAW WTR_S WTR_L WR clRR
#0 14 17 14 15 31 50 4 4 16 4 10 10 4
#1 14 17 14 15 31 50 4 4 16 4 10 10 4
clWW CWL RTP RdWr WrRd scWW sdWW ddWW scRR sdRR ddRR ECC Rate
#0 2 14 6 6 3 1 6 5 1 4 3 0 1N
#1 2 14 6 6 3 1 6 5 1 4 3 0 1N
DIMM Geometry for channel #0
Slot Bank Rank Rows Columns Memory Size (MB)
#0
#1 2 16 65536 1024 16384
DIMM Geometry for channel #1
Slot Bank Rank Rows Columns Memory Size (MB)
#0
#1 2 16 65536 1024 16384
ECC is enabled and working. The kernel log has this:
[ 5.098744] EDAC amd64: Node 0: DRAM ECC enabled.
And here is an actual error from a few months ago:
$ ras-mc-ctl --errors
...
32 2020-02-10 15:54:39 +0100 error: Corrected error, no action required., CPU 2, bank Unified Memory Controller (bank=16), mcg mcgstatus=0, mci CECC, memory_channel=1,csrow=3, mcgcap=0x00000117, status=0x9c2040000000011b, addr=0x215321700, misc=0xd01a000101000000, walltime=0x5e416eb0, cpuid=0x00800f82, bank=0x00000010
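As a side note on the DIMM Geometry table in the report above, the reported size can be cross-checked from the geometry itself. The sketch below assumes the row reads as 2 ranks, 16 banks, 65536 rows and 1024 columns (the flattened header makes the column assignment ambiguous) and a 64-bit data bus, i.e. 8 bytes per column access. The function name is made up for illustration.

```c
#include <stdint.h>

/*
 * DIMM capacity in MB from its geometry.
 * bus_bytes is the data width per column access (assumed 8 for DDR4).
 */
static uint64_t dimm_size_mb(unsigned int ranks, unsigned int banks,
                             uint64_t rows, uint64_t cols,
                             unsigned int bus_bytes)
{
    uint64_t bytes = (uint64_t) ranks * banks * rows * cols * bus_bytes;

    return bytes >> 20;   /* bytes to MB */
}
```

Under those assumptions, dimm_size_mb(2, 16, 65536, 1024, 8) gives 16384, matching the Memory Size (MB) column.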
Can you replace this bit layout in file https://github.com/cyring/CoreFreq/blob/acf1a732565454e01623a8e5b717b868366d0638/amdmsr.h#L866 with this code:
typedef union
{   /* SMU: address = 0x50100 */
    unsigned int value;
    struct
    {
        unsigned int
        ReservedBits1   : 30-0,
        ECC_DIMM_Enable : 31-30,
        ReservedBits2   : 32-31;
    };
} AMD_17_UMC_CFG_ECC;
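To make the effect of this test layout concrete, the sketch below reads the candidate bit both through the union and through a plain mask. It only illustrates the layout proposed above for testing; it does not claim that bit 30 is the right ECC indicator. The bit-field route relies on the compiler allocating bit-fields from the least significant bit, which GCC does on x86-64; the mask route does not depend on that. The two helper names are made up.

```c
#include <stdint.h>

typedef union
{   /* SMU: address = 0x50100 (layout under test, from the comment above) */
    unsigned int value;
    struct
    {
        unsigned int
        ReservedBits1   : 30-0,
        ECC_DIMM_Enable : 31-30,
        ReservedBits2   : 32-31;
    };
} AMD_17_UMC_CFG_ECC;

/* Read the candidate bit through the bit-field (compiler dependent). */
static int ecc_bit_via_union(uint32_t raw)
{
    AMD_17_UMC_CFG_ECC reg;

    reg.value = raw;
    return (int) reg.ECC_DIMM_Enable;
}

/* Read the same bit with a shift and mask (portable). */
static int ecc_bit_via_mask(uint32_t raw)
{
    return (int) ((raw >> 30) & 1u);
}
```

For the value 0x80000200 read earlier at 0x50100 on the non-ECC system, both helpers return 0.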
EDIT: The testing change is available in the current develop branch.
You don't have to edit the code as requested above.
I get this with the 'develop' branch from right now, the ECC column is still zero:
$ corefreq-cli -M
Zen UMC [1463]
Controller #0 Dual Channel
Bus Rate 1566 MT/s Bus Speed 1566 MHz DRAM Speed 3133 MHz
Cha CL RCD_R RCD_W RP RAS RC RRD_S RRD_L FAW WTR_S WTR_L WR clRR clWW
#0 14 17 14 15 31 50 4 4 16 4 10 10 4 2
#1 14 17 14 15 31 50 4 4 16 4 10 10 4 2
CWL RTP RdWr WrRd scWW sdWW ddWW scRR sdRR ddRR dlr[RR WW WR RRD]
#0 14 6 6 3 1 6 5 1 4 3 0 0 0 0
#1 14 6 6 3 1 6 5 1 4 3 0 0 0 0
REFI RFC RFC2 RFC4 RCPB RPPB sFAW dFAW Ban RCPage CKE CMD GDM ECC
#0 12226 400 400 400 0 0 0 0 R1W1 0 8 1T ON 0
#1 12226 400 400 400 0 0 0 0 R1W1 0 8 1T ON 0
DIMM Geometry for channel #0
Slot Bank Rank Rows Columns Memory Size (MB)
#0
#1 2 16 65536 1024 16384
DIMM Geometry for channel #1
Slot Bank Rank Rows Columns Memory Size (MB)
#0
#1 2 16 65536 1024 16384
Can you try again with the latest commit in develop?
Remark: the driver API has changed, so be sure to rebuild and reload everything.
Thank you
It works correctly, I think. Using commit 9f68310, I get this output here, it shows ECC = 1:
Zen UMC [1463]
Controller #0 Dual Channel
Bus Rate 1566 MT/s Bus Speed 1566 MHz DRAM Speed 3133 MHz
Cha CL RCD_R RCD_W RP RAS RC RRD_S RRD_L FAW WTR_S WTR_L WR clRR clWW
#0 14 17 14 15 31 50 4 4 16 4 10 10 4 2
#1 14 17 14 15 31 50 4 4 16 4 10 10 4 2
CWL RTP RdWr WrRd scWW sdWW ddWW scRR sdRR ddRR dlr[RR WW WR RRD]
#0 14 6 6 3 1 6 5 1 4 3 0 0 0 0
#1 14 6 6 3 1 6 5 1 4 3 0 0 0 0
REFI RFC RFC2 RFC4 RCPB RPPB sFAW dFAW Ban RCPage CKE CMD GDM ECC
#0 12226 400 400 400 0 0 0 0 R1W1 0 8 1T ON 1
#1 12226 400 400 400 0 0 0 0 R1W1 0 8 1T ON 1
DIMM Geometry for channel #0
Slot Bank Rank Rows Columns Memory Size (MB)
#0
#1 2 16 65536 1024 16384
DIMM Geometry for channel #1
Slot Bank Rank Rows Columns Memory Size (MB)
#0
#1 2 16 65536 1024 16384
I'll now try rebooting and disabling ECC in the BIOS menus and see what happens.
EDIT: I managed to find the option to disable ECC in the BIOS menus, the output of dmesg has this:
$ dmesg | grep -i '\becc\b'
[ 4.382182] EDAC amd64: Node 0: DRAM ECC disabled.
[ 4.382889] EDAC amd64: Node 0: DRAM ECC disabled.
And I can now see a "0" in corefreq-cli:
Zen UMC [1463]
Controller #0 Dual Channel
Bus Rate 1566 MT/s Bus Speed 1566 MHz DRAM Speed 3133 MHz
Cha CL RCD_R RCD_W RP RAS RC RRD_S RRD_L FAW WTR_S WTR_L WR clRR clWW
#0 14 17 14 15 31 50 4 4 16 4 10 10 4 2
#1 14 17 14 15 31 50 4 4 16 4 10 10 4 2
CWL RTP RdWr WrRd scWW sdWW ddWW scRR sdRR ddRR dlr[RR WW WR RRD]
#0 14 6 6 3 1 6 5 1 4 3 0 0 0 0
#1 14 6 6 3 1 6 5 1 4 3 0 0 0 0
REFI RFC RFC2 RFC4 RCPB RPPB sFAW dFAW Ban RCPage CKE CMD GDM ECC
#0 12226 400 400 400 0 0 0 0 R1W1 0 8 1T ON 0
#1 12226 400 400 400 0 0 0 0 R1W1 0 8 1T ON 0
DIMM Geometry for channel #0
Slot Bank Rank Rows Columns Memory Size (MB)
#0
#1 2 16 65536 1024 16384
DIMM Geometry for channel #1
Slot Bank Rank Rows Columns Memory Size (MB)
#0
#1 2 16 65536 1024 16384
EDIT 2: Everything works great here. I enabled ECC again in the BIOS, and the output in corefreq changed back to "1".
Thanks for these various tests.
All credits to the Linux kernel for unveiling those UMC registers. See amd64_edac.h
Development notes
2020-07-15
2020-07-14
2020-07-13
UMC Config: thus bits 9 and 31 enabled
SDP: bit 31 (SdpInit) in both UMC, we have two channels
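The channel count in the last note can be expressed as a small check on the SDP control values read at offset 0x104 of each UMC. This is a sketch only: the macro and function names are chosen for illustration, and treating bit 31 (SdpInit) as "this UMC is active" is the assumption stated in the note.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit 31 of the SDP control register (SdpInit), per the note above. */
#define UMC_SDP_INIT (1u << 31)

/* Count the UMC instances whose SDP control value has SdpInit set. */
static unsigned int count_active_umc(const uint32_t *sdp_ctrl, size_t n)
{
    unsigned int active = 0;
    size_t i;

    for (i = 0; i < n; i++)
        if (sdp_ctrl[i] & UMC_SDP_INIT)
            active++;

    return active;
}
```

With 0xb0408082 read from both UMC(0) and UMC(1), as in the driver log above, the count is 2.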