aristocratos / btop

A monitor of resources
Apache License 2.0

[BUG] Weird total CPU temp reading on FreeBSD and Ryzen 9 #488

Open MikeJakubik opened 1 year ago

MikeJakubik commented 1 year ago

I get an odd total CPU temp reading on FreeBSD with a Ryzen 9: it's always -273C. The individual cores appear to have sane values, but they all display the same temp. Attached is a screenshot.

[mike@fbsd /usr/home/mike]$ uname -a
FreeBSD fbsd.localdomain 14.0-CURRENT FreeBSD 14.0-CURRENT #0 main-701b36961c: Thu Dec 29 19:28:32 EST 2022     mike@fbsd.localdomain:/usr/obj/usr/src/amd64.amd64/sys/GENERIC-NODEBUG amd64

[mike@fbsd /usr/home/mike]$ sysctl -a|grep temp
amdtemp0: <AMD CPU On-Die Thermal Sensors> on hostb0
vm.pfault_oom_attempts: 3
net.inet6.ip6.use_tempaddr: 0
net.inet6.ip6.temppltime: 86400
net.inet6.ip6.tempvltime: 604800
net.inet6.ip6.prefer_tempaddr: 0
        value:  /boot/kernel/amdtemp.ko
hw.usb.template: -1
kstat.zfs.misc.arcstats.arc_tempreserve: 0
dev.amdtemp.0.ccd1: 42.1C
dev.amdtemp.0.ccd0: 38.1C
dev.amdtemp.0.core0.sensor0: 39.2C
dev.amdtemp.0.sensor_offset: 0
dev.amdtemp.0.%parent: hostb0
dev.amdtemp.0.%pnpinfo: 
dev.amdtemp.0.%location: 
dev.amdtemp.0.%driver: amdtemp
dev.amdtemp.0.%desc: AMD CPU On-Die Thermal Sensors
dev.amdtemp.%parent: 
dev.cpu.31.temperature: 39.2C
dev.cpu.30.temperature: 39.2C
dev.cpu.29.temperature: 39.2C
dev.cpu.28.temperature: 39.2C
dev.cpu.27.temperature: 39.2C
dev.cpu.26.temperature: 39.2C
dev.cpu.25.temperature: 39.2C
dev.cpu.24.temperature: 39.2C
dev.cpu.23.temperature: 39.2C
dev.cpu.22.temperature: 39.2C
dev.cpu.21.temperature: 39.2C
dev.cpu.20.temperature: 39.2C
dev.cpu.19.temperature: 39.2C
dev.cpu.18.temperature: 39.2C
dev.cpu.17.temperature: 39.2C
dev.cpu.16.temperature: 39.2C
dev.cpu.15.temperature: 39.2C
dev.cpu.14.temperature: 39.2C
dev.cpu.13.temperature: 39.2C
dev.cpu.12.temperature: 39.2C
dev.cpu.11.temperature: 39.2C
dev.cpu.10.temperature: 39.2C
dev.cpu.9.temperature: 39.2C
dev.cpu.8.temperature: 39.2C
dev.cpu.7.temperature: 39.2C
dev.cpu.6.temperature: 39.2C
dev.cpu.5.temperature: 39.2C
dev.cpu.4.temperature: 39.2C
dev.cpu.3.temperature: 39.2C
dev.cpu.2.temperature: 39.2C
dev.cpu.1.temperature: 39.2C
dev.cpu.0.temperature: 39.2C

Screenshot_20221230_231537

MikeJakubik commented 1 year ago

[mike@fbsd /usr/home/mike/Programs/btop]$ gmake CXX=g++12 STRIP=true ADDFLAGS="-march=native" info

 ██████╗ ████████╗ ██████╗ ██████╗
 ██╔══██╗╚══██╔══╝██╔═══██╗██╔══██╗   ██╗    ██╗
 ██████╔╝   ██║   ██║   ██║██████╔╝ ██████╗██████╗
 ██╔══██╗   ██║   ██║   ██║██╔═══╝  ╚═██╔═╝╚═██╔═╝
 ██████╔╝   ██║   ╚██████╔╝██║        ╚═╝    ╚═╝
 ╚═════╝    ╚═╝    ╚═════╝ ╚═╝      Makefile v1.4
PLATFORM   ?| FreeBSD
ARCH       ?| x86_64
CXX        ?| g++12 (12.2.0)
THREADS    :| 32
REQFLAGS   !| -std=c++20
WARNFLAGS  :| -Wall -Wextra -pedantic
OPTFLAGS   :| -O2 -ftree-loop-vectorize -flto=32
LDCXXFLAGS :| -pthread -D_FORTIFY_SOURCE=2 -D_GLIBCXX_ASSERTIONS -fexceptions -fstack-clash-protection -fcf-protection -fstack-protector -march=native -s -lstdc++ -lm -lkvm -ldevstat -Wl,-rpath=/usr/local/lib/gcc12
CXXFLAGS   +| $(REQFLAGS) $(LDCXXFLAGS) $(OPTFLAGS) $(WARNFLAGS)
LDFLAGS    +| $(LDCXXFLAGS) $(OPTFLAGS) $(WARNFLAGS)

I wish this would just compile with llvm instead of requiring gcc specifically (13 is in base fbsd, 14 in the master branch). Under some load, here is how this looks (if it matters, this is not a thing in bpytop, which usually works flawlessly):

Screenshot_20221231_173953

imwints commented 1 year ago

I'm working on Clang support, I've only got some compile flags to add and std::views::split to look at which isn't implemented at all in libcxx. You cannot compile libstdc++'s <ranges> with Clang at all.

Edit: I've read the other issues and PRs regarding llvm, and it seems that @aristocratos currently has no interest in premature llvm support, but if that has changed by now I'm willing to open a feature request.

MikeJakubik commented 1 year ago

That's great to hear, it might make things easier on Apple computers too, since they come with llvm by default. Any idea how/where that total CPU temp value is derived? Perhaps I can point to the right resource in FreeBSD itself (I also have access to some Intel and AMD Epyc servers with FreeBSD).

aristocratos commented 1 year ago

@stwnt

I've read the other issues and PRs regarding llvm and it seems that there is currently no interest to support premature llvm [...]

I've never really voiced any opinion on llvm since clang hasn't had support for std::ranges before version 15 that was recently released. I believe you are referring to the discussions about cmake?

However supporting compilation with Clang 15 shouldn't be an issue with the current build system.

Testing of supported compiler flags is already done in the Makefile (some of the currently used flags are hardware specific), see: https://github.com/aristocratos/btop/blob/c4ee41ebc0f7fabbf4717450c889292464dbeb6d/Makefile#L42-L43 https://github.com/aristocratos/btop/blob/c4ee41ebc0f7fabbf4717450c889292464dbeb6d/Makefile#L127-L128

Theoretically the only change needed in the Makefile is to add a check of $(CXX) --version, grep for clang, and check that $(CXX) -dumpversion is greater than or equal to 15.0.0. Then add/switch any needed flags.
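
As a rough illustration of the version comparison described above (in the Makefile it would be done with shell and make functions, not C++), the check itself could look like the sketch below. `parse_version` and `version_at_least` are hypothetical names, and the input is assumed to be a dotted version string of the kind `$(CXX) -dumpversion` prints.

```cpp
#include <array>
#include <cstdio>
#include <string>

// Hypothetical helper: parse a dotted version such as "15.0.7" into
// {major, minor, patch}. Missing components stay 0, so "16" -> {16, 0, 0}.
std::array<int, 3> parse_version(const std::string& text) {
    std::array<int, 3> v{0, 0, 0};
    std::sscanf(text.c_str(), "%d.%d.%d", &v[0], &v[1], &v[2]);
    return v;
}

// Hypothetical helper: true when `found` is greater than or equal to
// `minimum`. std::array compares lexicographically, which matches
// major/minor/patch ordering.
bool version_at_least(const std::string& found, const std::string& minimum) {
    return parse_version(found) >= parse_version(minimum);
}
```

With this, `version_at_least("15.0.7", "15.0.0")` holds while `version_at_least("14.0.6", "15.0.0")` does not.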

The Tools::ssplit() function in btop_tools.cpp was also an issue when compiling with msvc in btop4win, so the rewritten version of that function can be copied over from btop4win and used instead: https://github.com/aristocratos/btop4win/blob/c2ab1e50e2fdcc294a6c16eeb878b36600d18eec/src/btop_tools.cpp#L370-L381
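
For illustration only, a split that avoids std::views::split entirely can be written with std::string::find. This is not the btop4win code linked above: `split_skip_empty` is a hypothetical name, and dropping empty tokens is an assumption about the desired behaviour.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a ranges-free split: cut `str` at every
// occurrence of `delim` and keep only the non-empty pieces.
std::vector<std::string> split_skip_empty(const std::string& str, char delim = ' ') {
    std::vector<std::string> out;
    std::size_t start = 0;
    while (start <= str.size()) {
        std::size_t end = str.find(delim, start);
        if (end == std::string::npos) end = str.size();  // last piece
        if (end > start) out.emplace_back(str, start, end - start);
        start = end + 1;  // step past the delimiter
    }
    return out;
}
```

Because it only uses std::string and std::vector, a function like this should compile the same way with libstdc++, libc++ and MSVC's standard library.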

I can take a look at it when I've got some time if you are unfamiliar with (the sometimes a bit abstract) Makefile logic :)

@MikeJakubik Regarding your issue, I'm not sure why you get temperatures for each core with sysctl. As far as I know Ryzen only has sensors for the CCDs, so the only actual real temps in your output would be:

dev.amdtemp.0.ccd1: 42.1C
dev.amdtemp.0.ccd0: 38.1C

So the CPU "package" temp should be the average of 42.1 and 38.1, and the CPU cores should have the same temperature as the CCD they belong to.
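
A minimal sketch of that calculation, using the two CCD readings from the sysctl output above. `package_temp` and `core_temp` are hypothetical helpers, and the idea that logical CPUs are numbered contiguously per CCD is an assumption that may not hold on every system.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Package temperature as the plain average of all CCD temperatures.
double package_temp(const std::vector<double>& ccd_temps) {
    assert(!ccd_temps.empty());
    return std::accumulate(ccd_temps.begin(), ccd_temps.end(), 0.0)
           / static_cast<double>(ccd_temps.size());
}

// Temperature for one logical CPU: the reading of the CCD it belongs to.
// ASSUMPTION: CPUs 0..n-1 are split into equal contiguous blocks per CCD.
double core_temp(const std::vector<double>& ccd_temps,
                 std::size_t cpu, std::size_t cpu_count) {
    assert(!ccd_temps.empty() && cpu_count >= ccd_temps.size());
    const std::size_t per_ccd = cpu_count / ccd_temps.size();
    const std::size_t ccd = std::min(cpu / per_ccd, ccd_temps.size() - 1);
    return ccd_temps[ccd];
}
```

With the readings 38.1 (ccd0) and 42.1 (ccd1) and 32 logical CPUs, the package value comes out at about 40.1, CPUs 0 to 15 map to ccd0 and CPUs 16 to 31 map to ccd1.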

The reason you are getting wrong values for the CPU is probably because AMD changed the name for the sensors again, so when btop was written (before Ryzen 9 was released) these sensor names weren't included. Will take a look at it when I've got some time.

MikeJakubik commented 1 year ago

The GCC vs LLVM thing isn't a major issue for me, I just thought it'd be nice. I'm not a dev, just an admin, and I assumed C++ standards and features would be the same, but I guess not. The main issue is the display of -270C temperature (how is this number calculated?). This works fine in bpytop and I'm pretty sure it used to in the C version too, so I'm not sure what changed either, but if I knew how this value was produced it should be simple to tell what the issue is.
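
One way a reading of about -273C can arise, offered as an assumption and not as a diagnosis of btop's code: FreeBSD exposes its temperature sysctls in tenths of a kelvin, so a raw value of 0 (for example, from a sensor that could not be read) converts to absolute zero. The helper names below are hypothetical.

```cpp
#include <optional>

// Convert a raw reading in tenths of a kelvin to degrees Celsius.
double decikelvin_to_celsius(long raw) {
    return static_cast<double>(raw) / 10.0 - 273.15;
}

// Same conversion, but reject readings at or below 0 K, which cannot
// come from a working sensor and usually mean the read failed.
std::optional<double> checked_celsius(long raw) {
    if (raw <= 0) return std::nullopt;
    return decikelvin_to_celsius(raw);
}
```

Here `decikelvin_to_celsius(0)` gives -273.15, while a raw value of 3123 gives roughly 39.15, close to the per-core values in the sysctl output. A guard like `checked_celsius` would let a monitor show "no reading" instead of an impossible temperature.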

MikeJakubik commented 1 year ago

Also just FYI, the Ryzen 9 and Epyc (and most Zen3+ I've seen) CPUs do report temps on each individual core (even on L3 caches!), not just the CCDs. This is probably why we see each individual temp entry in FreeBSD's dev.cpu sysctls (though they don't seem to report correctly in this case). Attached is a screenshot of the exact same system running Windows 11 with HWiNFO.

hwinfo

MikeJakubik commented 1 year ago

Update.

I switched to the main branch from FreeBSD, recompiled, and it shows the total temp correctly now. However, IO does not, it just reads 0% usage at all times. Going back to bpytop, as it works perfectly.

danjenson commented 1 year ago

I am seeing something similar on Void Linux (image attached): the CPU temp shown in the upper right is always higher than the measurement for any core.

MikeJakubik commented 1 year ago

I tried compiling the latest master but got a few issues now. It complains about -flto being invalid (40) and can't find something called fmt/core.h. I tried commenting out -flto and installing a port named libfmt, but it still can't find it. Tried both gcc12 and clang15, and I had the same issue. It seems like a ./configure script would be handy to detect these.

imwints commented 1 year ago

@MikeJakubik The PR for Clang isn't merged yet, but Clang 16 is required anyway.

Seems like you didn't pull the submodule properly

git pull
git submodule init
git pull --recurse-submodules

MikeJakubik commented 1 year ago

Ahh yes, I had not done git clone --recursive. Doing that took care of libfmt, but the rpath is still statically defined in the Makefile. After changing that to reflect gcc12 I got it to compile; however, it still shows a bogus overall CPU temp of 8C.

Screenshot 2023-05-29 051210