terencode closed this issue 4 years ago
It's hard to tell whether this is a HW or SW interaction, but can you boot the ISO image linked in the Wiki? It's built with the mainstream Arch Linux kernel: no MuQSS flavor, no perf, no software CoreFreq may conflict with.
The problem is that it's hard to reproduce; it only happened after some hours.
I'm not familiar with the MuQSS scheduler, but I believe it is real-time based, and CoreFreq uses assembly bus locks (to sync threads). In user-space especially, the Daemon locks the bus to aggregate the per-cpu data. I believe that bus locking may disturb a real-time scheduler.
In the Client, menu Settings, you can enable the NMI counters. Next, choose the view System Interrupts to monitor NMI.
Then, I wonder whether the mainstream scheduler (i.e. the ISO image, or one of your other boot options without MuQSS) will show as many NMI counts?
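As an independent cross-check outside CoreFreq, the kernel's own NMI counters can be compared between the two schedulers. This is a minimal sketch, assuming the usual /proc/interrupts layout where the NMI: row carries one count per CPU; the helper name nmi_total is hypothetical.

```shell
#!/bin/sh
# Sum the per-CPU NMI counters found in a /proc/interrupts-style file.
# Usage: nmi_total [file]    (defaults to /proc/interrupts)
nmi_total() {
    awk '$1 == "NMI:" {
        total = 0
        # Per-CPU counts follow the label; stop at the first non-numeric word.
        for (i = 2; (i <= NF) && ($i ~ /^[0-9]+$/); i++) { total += $i }
        print total
    }' "${1:-/proc/interrupts}"
}
```

Calling nmi_total twice, some minutes apart, once under CFS and once under MuQSS, gives two deltas that can be set beside the counts shown in the System Interrupts view.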
I'll do that and report with MuQSS vs CFS which is the default.
Here you go: I hope it doesn't matter how many apps are started. CFS: MuQSS:
As soon as I start perf, such as perf top, CoreFreq counts several Local NMI.
Are you running perf at the same time as CoreFreq? They both conflict on the PMC registers.
When I took the screenshots, no. I ran it after I noticed the freeze to try to diagnose what was going on.
I see: AMD processor, no PMC involved; my comment above is relevant for Intel CPUs only.
I believe that without starting the CoreFreq driver, you don't encounter such NMIs?
I don't indeed.
Do you have lm-sensors running with the k10temp driver?
I'm using https://github.com/ocerman/zenpower
So there ...
https://github.com/ocerman/zenpower/blob/d577d3b9b445e46ffc7fa5f49c38f3e4c1ddaf0e/zenpower.c#L290
and here ...
https://github.com/cyring/CoreFreq/blob/e5f3ba5c356c9e2eae631dc876791e39928d1d6c/corefreqk.c#L6345
... an SMU register usage conflict may happen: both periodically write the offset 0x00059800 to read the Tctl temperature sensor.
Apparently, for the same reason, Zenpower asks you to unload k10temp. You will have to unload the Zenpower module prior to starting CoreFreq.
Does it run better?
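The unload-then-load order can be sketched as a small shell helper. This is only an illustration: the function names, the DRY_RUN switch and the MODULES_FILE override are hypothetical, and the real commands need root and must be run from the directory holding corefreqk.ko.

```shell
#!/bin/sh
# Sketch: unload conflicting sensor modules, then load the CoreFreq driver.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 as root to execute.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: load_corefreq_alone module...
# MODULES_FILE is a hypothetical override for testing; it defaults to /proc/modules.
load_corefreq_alone() {
    for mod in "$@"; do
        # Only unload modules that are currently loaded.
        if grep -q "^${mod} " "${MODULES_FILE:-/proc/modules}" 2>/dev/null; then
            run modprobe -r "$mod"
        fi
    done
    run insmod corefreqk.ko
}
```

For example, load_corefreq_alone zenpower prints the two commands to review before running them for real with DRY_RUN=0.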
Running modprobe -r zenpower before probing the module:
Is there a performance cost while this module is running? It's also a bummer you can't monitor your cpu temperature...
Running modprobe -r zenpower before probing the module:
So you confirm the system is stable with CoreFreq only?
Is there a performance cost while this module is running?
Minimizing the CPU overhead is my top priority.
I'm monitoring through perf top the usage of the "Cycle Functions", which are bound to high-resolution timers, and corefreqk appears at the bottom of the list with a default interval of one second.
The user-space parts give me a bit more work.
Although the Daemon corefreqd has a CPU overhead 2 to 5 times lower than top or htop, the Client corefreq-cli can request 0.5% of CPU.
It's also a bummer you can't monitor your cpu temperature...
For Ryzen, I have so far implemented the only sensor, which according to the specs is a socket-scope register. I presume that a 2-socket setup may offer 2 sensors, but I don't have any Zen processor to test with yet.
So you confirm the system is stable with CoreFreq only ?
Haven't tested long enough but it seemed fine.
For Ryzen, I have so far implemented the only sensor, which according to the specs is a socket-scope register. I presume that a 2-socket setup may offer 2 sensors, but I don't have any Zen processor to test with yet.
Is this only obtainable when running the cli or will it register it with sensors-detect like zenpower or k10temp?
It's a different approach: CoreFreq does not rely on other libs.
Talking about sensors-detect means lm_sensors, which CoreFreq does not use at all.
That's why Processor register usage conflicts may happen: corefreqk.ko competes with any other driver claiming exclusive access to Processor resources: msr, pci, and some control registers.
The drawback is that every bit of my program is written from scratch.
Ok so the problem means I can't have temperature monitoring while I'm using it then.
But you have the temperature; in your screenshot, it is written in the footer: T[53]
You can also read it in the startup view Frequency. It is written in column TMP with its min and its max.
My purpose was to say that the temperature is given per Processor (and not for each Core). That's the way the sensor is specified for the Zen architecture, and only one software can monitor it.
I hope it will help.
Ah thanks for the explanation, I understand better now. What I mean is I can't use my regular monitoring software.
I will improve my driver's compatibility with the kernel by using the function amd_smn_read, which serializes the SMU access through a mutex. It should let CoreFreq and lm_sensors run simultaneously.
I'll send or post the code changes for you to apply to the sources, then build and test.
Sounds good, thanks.
Hello,
Attached the version 1.67 for your tests.
k10temp
driver. lm_sensors
"CoreFreq: Failed to read TctlSensor" is traced or not in kernel log
Thanks, however, it'd be easier for me if you would just push it to a new branch please.
I can't install it: insmod corefreqd.ko made a hard hang and make install says:
- SSL error:02001002:system library:fopen:No such file or directory: crypto/bio/bss_file.c:69
- SSL error:2006D080:BIO routines:BIO_new_file:no such file: crypto/bio/bss_file.c:76
sign-file: certs/signing_key.pem: No such file or directory
Can you post the output of command:
lspci -nn
make install is not ready for use (fyi it was a PR).
make clean all
insmod corefreqk.ko
./corefreqd
./corefreq-cli
No problem ^^ Here you go:
00:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Root Complex [1022:1480]
00:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse IOMMU [1022:1481]
00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
00:01.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
00:05.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
00:07.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
00:07.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1022:1484]
00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1022:1484]
00:08.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1022:1484]
00:08.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1022:1484]
00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 61)
00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 0 [1022:1440]
00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 1 [1022:1441]
00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 2 [1022:1442]
00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 3 [1022:1443]
00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 4 [1022:1444]
00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 5 [1022:1445]
00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 6 [1022:1446]
00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 7 [1022:1447]
01:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] X370 Series Chipset USB 3.1 xHCI Controller [1022:43b9] (rev 02)
01:00.1 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] X370 Series Chipset SATA Controller [1022:43b5] (rev 02)
01:00.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] X370 Series Chipset PCIe Upstream Port [1022:43b0] (rev 02)
02:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port [1022:43b4] (rev 02)
02:02.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port [1022:43b4] (rev 02)
02:03.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port [1022:43b4] (rev 02)
02:04.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port [1022:43b4] (rev 02)
02:05.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port [1022:43b4] (rev 02)
02:06.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port [1022:43b4] (rev 02)
02:07.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port [1022:43b4] (rev 02)
03:00.0 USB controller [0c03]: ASMedia Technology Inc. ASM1143 USB 3.1 Host Controller [1b21:1343]
04:00.0 Ethernet controller [0200]: Intel Corporation I211 Gigabit Network Connection [8086:1539] (rev 03)
0a:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:1470] (rev c1)
0b:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:1471]
0c:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XL/XT [Radeon RX Vega 56/64] [1002:687f] (rev c1)
0c:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 HDMI Audio [Radeon Vega 56/64] [1002:aaf8]
0d:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Function [1022:148a]
0e:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP [1022:1485]
0e:00.1 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Cryptographic Coprocessor PSPCPP [1022:1486]
0e:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller [1022:149c]
0e:00.4 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse HD Audio Controller [1022:1487]
0f:00.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 51)
10:00.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 51)
Hello,
In issue #54 we tested the use of the kernel SMU API. It appears to work.
To build with the kernel API:
make FEAT_DBG=2 clean all
You will read messages during the build which confirm the API usage.
Next, you should be able to read the temperature at the same time as lm_sensors.
Hey,
I tried installing it but I get this when using make module-install:
make -C /lib/modules/5.4.1-3-tkg-bmq/build M=/mnt/WDC/Documents/Git/CoreFreq modules_install
make[1]: Entering directory '/usr/lib/modules/5.4.1-3-tkg-bmq/build'
INSTALL /home/terence/Documents/Git/CoreFreq/corefreqk.ko
At main.c:160:
- SSL error:02001002:system library:fopen:No such file or directory: crypto/bio/bss_file.c:69
- SSL error:2006D080:BIO routines:BIO_new_file:no such file: crypto/bio/bss_file.c:76
sign-file: certs/signing_key.pem: No such file or directory
DEPMOD 5.4.1-3-tkg-bmq
make[1]: Leaving directory '/usr/lib/modules/5.4.1-3-tkg-bmq/build'
I think your kernel only loads signed modules. Thus you'll have to sign the CoreFreq driver:
scripts/sign-file
on corefreqk.ko
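For reference, the kernel's sign-file tool takes a hash algorithm, a private key, an X.509 certificate and the module, in that order. The sketch below only prints such a command line. The helper name, the key and certificate file names and the sha256 default are assumptions; the kernel build path follows the one shown in the make output. If signature enforcement is off, an unsigned module still loads and this step may be unnecessary.

```shell
#!/bin/sh
# Sketch: build the sign-file command line for an out-of-tree module.
# KBUILD defaults to the build tree of the running kernel.
# Usage: sign_module_cmd module.ko private_key.pem certificate.x509 [hash]
sign_module_cmd() {
    kbuild="${KBUILD:-/usr/lib/modules/$(uname -r)/build}"
    # The hash should match the kernel's CONFIG_MODULE_SIG_HASH; sha256 is an assumed default.
    echo "$kbuild/scripts/sign-file" "${4:-sha256}" "$2" "$3" "$1"
}
```

Running sign_module_cmd corefreqk.ko my_key.pem my_cert.x509 shows the command to review before executing it as root; my_key.pem and my_cert.x509 stand for a key pair you generated yourself.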
I didn't enforce signed modules only. I can manually insmod the driver fine.
So you confirm it's now working ?
Btw, did you try:
make FEAT_DBG=2 clean all
This will let you use CoreFreq in parallel with k10temp.
It seems to work correctly now. However I had to do some changes to your AUR package. Here is the diff:
diff --git a/PKGBUILD b/PKGBUILD
index 0f2a5bd..ebda51c 100644
--- a/PKGBUILD
+++ b/PKGBUILD
@@ -1,9 +1,7 @@
# Maintainer: CyrIng <labs[at]cyring[dot]fr>
# Contributor: CyrIng <labs[at]cyring[dot]fr>
-_gitname=CoreFreq
pkgname=corefreq-git
-realname=corefreq
-pkgver=1.69
+pkgver=r829.b62b3f0
pkgrel=1
pkgdesc="CoreFreq, Processor monitoring software with BIOS like functionalities"
arch=('x86_64')
@@ -11,21 +9,38 @@ url='https://github.com/cyring/CoreFreq'
license=('GPL2')
depends=('dkms')
makedepends=('git')
-source=(git+${url}.git)
-md5sums=('SKIP')
-install=${realname}.install
+source=($pkgname::git+${url}.git
+ 'dkms.conf')
+md5sums=('SKIP'
+ '1be42c3d47c2efda9b49d8a7f3d12582')
+install=corefreq.install
+
+pkgver() {
+ cd "$pkgname"
+ printf "r%s.%s" "$(git rev-list --count HEAD)" "$(git rev-parse --short HEAD)"
+}
+
+
+prepare() {
+ cd ${srcdir}/${pkgname}
+ make FEAT_DBG=2 clean all
+}
package() {
- cd ${srcdir}/${_gitname}
- BINDIR=${pkgdir}/bin
- SRCTREE=${pkgdir}/usr/src
- DRVTREE=${SRCTREE}/corefreqk-${pkgver}
- # dkms setup
- install -Dm 0644 Makefile ${DRVTREE}/Makefile
- install -Dm 0644 dkms.conf ${DRVTREE}/dkms.conf
- install -Dm 0755 scripter.sh ${DRVTREE}/scripter.sh
- install -m 0644 *.c *.h ${DRVTREE}/
- # systemd setup
- install -Dm 0644 corefreqd.service \
- ${pkgdir}/usr/lib/systemd/system/corefreqd.service
+ cd ${srcdir}/${pkgname}
+
+ BINDIR=${pkgdir}/bin
+ SRCTREE=${pkgdir}/usr/src
+ DRVTREE=${SRCTREE}/corefreqk-${pkgver}
+ # dkms setup
+ install -Dm 0644 ../dkms.conf ${DRVTREE}/dkms.conf
+ sed -e "s/@PKGVER@/${pkgver}/" \
+ -i "${DRVTREE}/dkms.conf"
+
+ install -Dm 0644 Makefile ${DRVTREE}/Makefile
+ install -Dm 0755 scripter.sh ${DRVTREE}/scripter.sh
+ install -m 0644 *.c *.h ${DRVTREE}/
+ # systemd setup
+ install -Dm 0644 corefreqd.service \
+ ${pkgdir}/usr/lib/systemd/system/corefreqd.service
}
diff --git a/dkms.conf b/dkms.conf
new file mode 100644
index 0000000..8b21081
--- /dev/null
+++ b/dkms.conf
@@ -0,0 +1,27 @@
+# CoreFreq
+# Copyright (C) 2015-2019 CYRIL INGENIERIE
+# Licenses: GPL2
+#
+AUTOINSTALL="yes"
+REMAKE_INITRD="no"
+DRV_PATH=/kernel/drivers/misc
+DRV_VERSION=@PKGVER@
+PACKAGE_NAME="corefreqk"
+PACKAGE_VERSION="$DRV_VERSION"
+BUILT_MODULE_NAME[0]="corefreqk"
+DEST_MODULE_LOCATION[0]="$DRV_PATH"
+CLEAN="make -C $source_tree/$PACKAGE_NAME-$PACKAGE_VERSION clean"
+MAKE[0]="make -C $source_tree/$PACKAGE_NAME-$PACKAGE_VERSION"
+#
+DAEMON="\$source_tree/\$PACKAGE_NAME-\$PACKAGE_VERSION/corefreqd"
+CLIENT="\$source_tree/\$PACKAGE_NAME-\$PACKAGE_VERSION/corefreq-cli"
+SCRIPT="scripter.sh"
+COMMAND="install -Dm 0755 -s -t /bin"
+OBJECTS="\$source_tree/\$PACKAGE_NAME-\$PACKAGE_VERSION/*.o"
+BINARIES="/bin/corefreqd /bin/corefreq-cli"
+CLEANUP="rm -f"
+#
+POST_BUILD="$SCRIPT $COMMAND -- $DAEMON $CLIENT"
+POST_INSTALL="$SCRIPT $CLEANUP -- $OBJECTS"
+POST_REMOVE="$SCRIPT $CLEANUP -- $BINARIES"
+#
Very interesting. I need to process these Package changes, thank you.
Would you mind showing me the CoreFreq temperature of the Cores beside those from lm_sensors? I want to check whether FEAT_DBG=2 makes things accurate.
Sure. What I did was to auto-generate the version, as it's a git package, and change it dynamically inside dkms.conf. Also, I added a prepare() for FEAT_DBG=2.
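The placeholder substitution in the diff can be tried in isolation. This is a minimal sketch: the function name render_dkms_conf is hypothetical, and it writes to stdout instead of editing in place as the PKGBUILD does with sed -i.

```shell
#!/bin/sh
# Sketch: replace the @PKGVER@ placeholder of a dkms.conf template.
# Usage: render_dkms_conf version template_path    (result goes to stdout)
render_dkms_conf() {
    # The version must not contain "/" since it is used inside a sed s/// expression.
    sed -e "s/@PKGVER@/$1/" "$2"
}
```

With a version such as r829.b62b3f0, the line DRV_VERSION=@PKGVER@ becomes DRV_VERSION=r829.b62b3f0, so the DKMS source directory name matches the generated package version.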
As such?
There are some differences, but I can't tell if they are due to each software's sampling time.
CoreFreq shows a processor temperature of 46°C (read from CPU number 1) whereas k10temp shows 43°C.
Are you aware whether a temperature offset has to be applied on the Ryzen 5 3600?
(it was the case with first gen, but I put none with Matisse)
CoreFreq shows a Core voltage of 1.10V. This is computed from the current P-state. lm_sensors provides several voltages from different IC(s) and/or the SMU, but no P-State. Thus there is nothing to compare with, besides the BIOS, where P-States can be defined (and read later by CoreFreq itself).
The view Power & Voltage will also provide you the current power or energy consumed. Those are computed from the Processor RAPL counters. I'm wondering how they differ from lm_sensors when the processor is stressed or idle?
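As a rough illustration of how a power figure can be derived from RAPL counters, the sketch below averages two readings of an energy counter over an interval. It assumes the common RAPL convention of a 32-bit counter where one count equals 2^-ESU joules. The function name and parameters are hypothetical, and this is not taken from the CoreFreq sources.

```shell
#!/bin/sh
# Sketch: average power from two readings of a 32-bit RAPL energy counter.
# Usage: rapl_watts first_reading second_reading esu seconds
# esu is the energy status unit exponent: one count equals 2^-esu joules (assumed convention).
rapl_watts() {
    awk -v e1="$1" -v e2="$2" -v esu="$3" -v dt="$4" 'BEGIN {
        delta = e2 - e1
        if (delta < 0) delta += 4294967296   # counter wrapped around 2^32
        printf "%.3f\n", delta / (2 ^ esu) / dt
    }'
}
```

Because the result is an average over the sampling interval, two tools polling at different moments can legitimately report different values for the same load.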
Hello,
The latest version, 1.69.7, now lets you run CoreFreq in parallel with k10temp (lm_sensors).
I'm not aware of the need for a temperature offset. Here is a hopefully more accurate and complete screenshot (notice I'm using https://github.com/electrified/asus-wmi-sensors) :
Thanks, the CoreFreq measurement is identical to the zenpower and Asus Core temperatures.
But I see an issue with the initialization of the minimum temperature, which did not happen with the previous Zen generation, where the offset is the minimum value.
Here the screenshot shows zero, which is a wrong value.
I need to fix this...
Version 1.69.8 provides a fix so that the minimum temperature is no longer zero. Feel free to test, thank you.
Looks like it's working, great job :)
Time to close this now right?
Thanks. Yes you can close the issue.
I installed the latest version from master and modprobed the module and after some time my machine became completely unresponsive. I could however ssh into it and retrieve the following:
Running perf top showed 90% of CPU time was used in collect_percpu_times
ArchLinux with MuQSS kernel 5.2.16