armbian / build

Armbian Linux build framework generates custom Debian or Ubuntu image for x86, aarch64, riscv64 & armhf
https://www.armbian.com
GNU General Public License v2.0
3.95k stars 2.22k forks source link

Marvell SoC CPU temp in motd #1135

Closed g-provost closed 5 years ago

g-provost commented 5 years ago

Can someone explain why you apply this -20 on the SoC temperature ?

https://github.com/armbian/build/blob/77d59c69d83902cbe7a0b4a5436464d0a2c1641e/packages/bsp/common/etc/update-motd.d/30-armbian-sysinfo#L50

aprayoga commented 5 years ago

We also found out that /etc/armbianmonitor/datasources/soctemp on helios4 (most likely also on other board that used SolidRun's a38x microsom such as clearfog) link to incorrect sensor.

root@helios4:~# ls -l /etc/armbianmonitor/datasources/soctemp
lrwxrwxrwx 1 root root 35 Oct  9 11:57 /etc/armbianmonitor/datasources/soctemp -> /sys/class/hwmon/hwmon0/temp1_input

root@helios4:~# ls -l /sys/class/hwmon/
total 0
lrwxrwxrwx 1 root root 0 Jan  1  1970 hwmon0 -> ../../devices/platform/soc/soc:internal-regs/f1072004.mdio/mdio_bus/f1072004.mdio-mii/f1072004.mdio-mii:00/hwmon/hwmon0
lrwxrwxrwx 1 root root 0 Jan  1  1970 hwmon1 -> ../../devices/virtual/hwmon/hwmon1
lrwxrwxrwx 1 root root 0 Oct 22 09:49 hwmon2 -> ../../devices/platform/j10-pwm/hwmon/hwmon2
lrwxrwxrwx 1 root root 0 Oct 22 09:49 hwmon3 -> ../../devices/platform/j17-pwm/hwmon/hwmon3
lrwxrwxrwx 1 root root 0 Oct 22 09:49 hwmon4 -> ../../devices/platform/soc/soc:internal-regs/f1011000.i2c/i2c-0/0-004c/hwmon/hwmon4

On linux-marvell 4.4, the Marvell PHY driver does not exposed the temp sensor therefore hwmon0 is the correct sensor. But on upstream linux-stable, it isn't the case. https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/net/phy/marvell.c?h=linux-4.14.y#n1957

it's better to link to /sys/devices/virtual/thermal/thermal_zone0 instead.

root@helios4:~# cat /sys/devices/virtual/thermal/thermal_zone0/type
armada_thermal
ThomasKaiser commented 5 years ago

Can someone explain why you apply this -20 on the SoC temperature ?

You need to ask Igor. When I fixed the annoyingly long execution times in 30-armbian-sysinfo resulting in login delays I just copied this behavior.

g-provost commented 5 years ago

@ThomasKaiser Btw quick side question, is the script/app build/packages/bsp/armbianmonitor/armbianmonitor-daemon is still use by something or some user ? I cannot find anything pointing to it.

igorpecovnik commented 5 years ago

This temp adjustment is one of remaining old quick dirty workarounds. I think we just need to make it right with a mainline kernel according to @aprayoga tips (Thanks!) and forget about Marvell 4.4.y which is more or less obsolete in case of Clearfog/Helios4.

ThomasKaiser commented 5 years ago

is the script/app build/packages/bsp/armbianmonitor/armbianmonitor-daemon is still use by something or some user

Not that I know of.

g-provost commented 5 years ago

@igorpecovnik I'm fixing this issue, but need to ask the following. Does linking to /sys/devices/virtual/thermal/thermal_zone0 should be the default for all board family, beside exceptions already present in the script.

Currently if my assumption is correct, it will default for most boards to /sys/class/hwmon/hwmon0/temp1_input

https://github.com/armbian/build/blob/78d2d8fc2480864ad0bbd83fe36cb9d8e8477179/packages/bsp/common/usr/lib/armbian/armbian-hardware-monitor#L43

So not sure if I should make it an exception for mvebu family, or we can make all board familly link /sys/devices/virtual/thermal/thermal_zone0

igorpecovnik commented 5 years ago

Not 100% sure, so better use exception.

ThomasKaiser commented 5 years ago

The reason the check for /sys/class/hwmon/hwmon0/temp1_input exists was a problem on a board I'm not interested in at all and a remark by @superna9999 in this thread: https://forum.armbian.com/topic/5254-le-potato-up-and-running/?do=findComment&comment=53806

Before we had two exceptions: A20 legacy and the totally uninteresting Actions Semi S500. Default for everything else was /sys/devices/virtual/thermal/thermal_zone0/temp.

So please either set the exception right (Amlogic S905X -- no idea why Armbian started to support this stuff in the first place) or refactor this thermal mess another time and replace the horrid hack in armbian-hardware-monitor with a template system.

ThomasKaiser commented 5 years ago

This temp adjustment is one of remaining old quick dirty workarounds

Why not 'fixing' your motd stuff then finally? Subtracting 20°C doesn't make sense since years.

superna9999 commented 5 years ago

Hmm, /sys/devices/virtual/thermal/thermal_zone0/temp is for thermal_zone, when there is a thermal-zone on the board, using /sys/class/hwmon/hwmon0/temp1_input uses the first hwmon, when there isn hwmon. And this is pretty standard, you can't expect all board to have a thermal_zone or hwmon.

The script should be smarter and actually list & gather all the hwmon and thermal_zones instead of picking a random fixed location. This is not scalable at all.

g-provost commented 5 years ago

Ok I will use an exception for mvebu family therefore. I understand it's not elegant, and that we should refactor the whole block on temp sensor selection. Will put that on our TODO list.

Why not 'fixing' your motd stuff then finally? Subtracting 20°C doesn't make sense since years.

Yes it's part of the commit.

g-provost commented 5 years ago

Let me know if you are ok with the way I did it. I also added Helios4 ambient temperature in motd since there is already a display field for it.