electrified / asus-wmi-sensors

Linux HWMON (lmsensors) sensors driver for various ASUS Ryzen and Threadripper motherboards
GNU General Public License v2.0

Asus Crosshair VIII Impact support? #62

Open philwo opened 4 years ago

philwo commented 4 years ago

Hi,

I understand that X570 boards supposedly no longer have a WMI interface and thus loading the kernel module fails as expected:

$ sudo modprobe asus-wmi-sensors
modprobe: ERROR: could not insert 'asus_wmi_sensors': No such device

$ dmesg
[ 1596.035029] asuswmisensors: Vendor: ASUSTeK COMPUTER INC. Board: ROG CROSSHAIR VIII IMPACT BIOS version: 1302 WMI version: 0

However, with the nct6775 driver I'm missing the VRM and PCH temperatures and fan speeds. HWiNFO64 on Windows can read them and shows them in an "ASUS EC" category, although it initially warned that reading them might slow down the system:

[Image: IMG_2602]

Any idea how to get these sensor values on Linux?

For reference, here's the dmidecode output:

$ sudo dmidecode -t baseboard
# dmidecode 3.2
Getting SMBIOS data from sysfs.
SMBIOS 3.2.0 present.

Handle 0x0002, DMI type 2, 15 bytes
Base Board Information
    Manufacturer: ASUSTeK COMPUTER INC.
    Product Name: ROG CROSSHAIR VIII IMPACT
    Version: Rev X.0x
    Serial Number: *redacted 15 digit number*
    Asset Tag: Default string
    Features:
        Board is a hosting board
        Board is replaceable
    Location In Chassis: Default string
    Chassis Handle: 0x0003
    Type: Motherboard
    Contained Object Handles: 0

Handle 0x0032, DMI type 10, 6 bytes
On Board Device Information
    Type: Video
    Status: Enabled
    Description:    To Be Filled By O.E.M.

Handle 0x0038, DMI type 41, 11 bytes
Onboard Device
    Reference Designation:  Onboard IGD
    Type: Video
    Status: Enabled
    Type Instance: 1
    Bus Address: 0000:00:02.0

KeithMyers commented 4 years ago

You would have to take that up with Guenter Roeck who is the developer of the nct6775 driver.

electrified commented 4 years ago

Those sensors aren't wired up to the Super IO sensor inputs, so the nct6775 driver can't read them.

For example, the VRM temperature doesn't come from a standard thermistor: the VRMs have their own internal temperature sensors, which the EC reads over I2C.

Reading from the EC is undoubtedly possible... it's just figuring out how to do it :D

There is some info on the notebook fan control wiki about figuring out which EC memory locations may be fans and temperatures: https://github.com/hirschmann/nbfc/wiki/How-to-create-a-NBFC-config

I'd go down this route, decompiling the DSDT and / or investigating the registers using RW Everything in Windows.

The embedded controller memory space can be exposed in Linux using the ec_sys module:

bungo ~ # modprobe ec_sys write_support=1

bungo ~ # hexdump /sys/kernel/debug/ec/ec0/io 
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
0000030 0000 0000 0000 0000 0000 5135 d826 003e
0000040 007c 0c00 0f0b ff0f 0000 991e ff46 ff46
0000050 0000 0000 ff00 00ff 0000 0000 0000 0000
0000060 0000 1500 001a 0000 1c00 0000 0000 0000
0000070 0000 0000 0000 0000 0000 0000 0000 0000
0000080 0000 0000 0000 0000 0000 0000 0002 0000
0000090 5601 0000 0000 ce01 0000 ff03 0000 0000
00000a0 0000 4305 2e03 ba04 0000 0000 0000 0000
00000b0 0000 0000 0000 0000 0000 0000 6205 0000
00000c0 0000 0000 0000 0000 0000 0000 0000 0000
00000d0 0040 4110 0400 0000 2834 0001 0040 0108
00000e0 0000 0000 0003 0000 0000 0000 0000 0000
00000f0 0000 000a 0055 0000 000f 00bf 0000 0015
0000100

The EC may need writes to certain addresses before it returns the sensor values.
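A minimal Python sketch of reading the same dump programmatically, assuming ec_sys is loaded and debugfs is mounted at /sys/kernel/debug as in the hexdump command above. The helper names `read_ec` and `format_dump` are made up for illustration and are not part of any existing tool.

```python
from pathlib import Path

# Path taken from the hexdump command above; needs ec_sys loaded and root access.
EC_IO_PATH = Path("/sys/kernel/debug/ec/ec0/io")


def read_ec(path: Path = EC_IO_PATH) -> bytes:
    """Return the raw EC memory space (256 bytes in the dump above)."""
    return path.read_bytes()


def format_dump(data: bytes, width: int = 16) -> str:
    """Format bytes one per column in memory order, prefixed by the row offset."""
    lines = []
    for offset in range(0, len(data), width):
        row = data[offset:offset + width]
        lines.append(f"{offset:02x}: " + " ".join(f"{b:02x}" for b in row))
    return "\n".join(lines)
```

Running `print(format_dump(read_ec()))` as root should list the bytes in memory order, which makes it easier to diff two snapshots while changing a fan speed than the 16-bit word view of plain hexdump.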

Once the fields are identified, it would be a case of wrapping them in a HWMON sensors driver. The access method is significantly different from WMI, so it wouldn't make sense to put them in this driver.

philwo commented 4 years ago

Thanks for the detailed explanation! I'll play around with the EC over the next few days. :)

FWIW, I realized today that the acpi_enforce_resources=lax kernel flag, which was needed to get at least some sensor data from the nct6775 chip, often causes the board to fail to reboot. This can then only be fixed by removing the power cable for a few seconds or by pressing the Reset CMOS button; otherwise it boot loops continuously. :open_mouth:

KeithMyers commented 4 years ago

Yes, they always post the disclaimer that using that kernel command flag can cause issues. I never had any issues using it on my ASUS X99-E-10G WS workstation mobo with the nct6775 driver thankfully.

But with any AMD board that has a tendency to fail to boot through memory training, you are testing fate already.

I know that there have been a lot of changes upstream in the kernel for the nct6775 driver that haven't made it downstream yet in the older kernels.

berniyh commented 4 years ago

I've got an X570-E Gaming and was facing the same problem, since I've connected a temperature sensor and that seems to be only available via the EC.

However, with the link above (thanks!) I knew where to dig, did some initial investigation, and it does indeed work:

[Image: ASUS-EC-Fanspeed]

The marked 2 bytes give the fan speed of the CPU optional fan. Here it's CF03, which should be read as 03 CF, i.e. 0x03CF = 975 RPM. That corresponds to the speed of the fan I connected to that output at pwm = 255 (full speed). I know this because I have the same fan installed on another port that I can monitor through the Nuvoton chip. I also checked pwm = 0, which results in 00 00 (the fan does indeed stop), and pwm = 180, which results in CE02, or 718 RPM (cross-checked that as well).
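The byte swap described here can be sketched in Python. This assumes the dump came from plain hexdump (no -C) on a little-endian host such as x86, where each 16-bit word is printed with the byte at the higher address first. The function name is hypothetical.

```python
def value_from_hexdump_word(word: str) -> int:
    """Convert a 16-bit word as printed by plain hexdump into the EC value.

    Plain hexdump shows "cf03" for the memory bytes 03 cf, so reading the
    displayed digits as little-endian recovers the value the EC stored.
    """
    return int.from_bytes(bytes.fromhex(word), "little")


# Values quoted in the comment above
full_speed = value_from_hexdump_word("cf03")  # 0x03CF
reduced = value_from_hexdump_word("ce02")     # 0x02CE
stopped = value_from_hexdump_word("0000")
```

With `hexdump -C` the bytes appear in memory order instead, so no swap is needed and the two bytes can be read directly as a big-endian number.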

That's only a proof of concept so far. I need to do some digging to find which of the other values correspond to temperature, power, etc., but that's only a matter of time. However, I'd suggest you check whether the same bytes give the fan speed on your board, just to see if it varies from board to board or is the same across all ASUS (X570?) boards.

electrified commented 4 years ago

Great investigation work @berniyh.

If you make any further findings, please share them.

I have the same problem on my Crosshair VII - fans connected to the fan extension board aren't available via WMI (and they aren't connected to the IT87 chip either). I figure they are probably exposed by the EC but haven't looked into it.

berniyh commented 4 years ago

I've already found some more. I think that the group 2f48 2b2d in the fourth line is a set of 4 temperatures and that 2b here is the external temperature sensor (43 °C). I need to verify that, though. The bytes in the second-to-last line might be related to the CPU power draw (either power or current), but I'm not yet certain about that.

berniyh commented 4 years ago

OK, so here are my findings.

(Adding a new screenshot, since it's easier to talk about the output of hexdump -C)

[Image: Screenshot_20200421_201208]

Bytes are counted from the left, starting with 0, ending with F.

0x3A: Chipset Temperature, here 0x3C = 60 (°C)
0x3B: CPU Temperature, here 0x21 = 33 (°C), more or less the same as the NCT6798 output
0x3C: Mainboard Temperature, here 0x26 = 38 (°C), same as the NCT6798 output for the Chipset
0x3D: T_Sensor Temperature, here 0x1e = 30 (°C), in my case the water temp
0x57+0x58 and/or 0xB4+0xB5: Chipset Fan, here 0x0964 = 2404 (RPM) and 0x0969 = 2409 (RPM)
0xB0+0xB1: CPU OPT Fan, here 0x02B6 = 694 (RPM)
0xF4: CPU Current, here 0x01 = 1 (A)
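The offsets listed above can be turned into a small decoding table. This is only a sketch for the X570-E Gaming findings in this comment: the names `Register`, `X570E_GAMING_REGISTERS`, and `decode` are invented for illustration, and the two-byte values are assumed to be big-endian, as the fan readings suggest.

```python
from typing import NamedTuple


class Register(NamedTuple):
    offset: int  # byte offset into the 256-byte EC dump
    size: int    # 1 for temperatures and current, 2 for fan speeds
    unit: str


# Offsets as reported above for the X570-E Gaming; other boards may differ.
X570E_GAMING_REGISTERS = {
    "chipset_temp": Register(0x3A, 1, "°C"),
    "cpu_temp": Register(0x3B, 1, "°C"),
    "mainboard_temp": Register(0x3C, 1, "°C"),
    "t_sensor_temp": Register(0x3D, 1, "°C"),
    "chipset_fan": Register(0x57, 2, "RPM"),
    "chipset_fan_alt": Register(0xB4, 2, "RPM"),  # second, near-identical reading
    "cpu_opt_fan": Register(0xB0, 2, "RPM"),
    "cpu_current": Register(0xF4, 1, "A"),
}


def decode(ec: bytes, registers=X570E_GAMING_REGISTERS) -> dict:
    """Decode every known register from a raw EC dump (big-endian values)."""
    return {
        name: int.from_bytes(ec[reg.offset:reg.offset + reg.size], "big")
        for name, reg in registers.items()
    }
```

Feeding `decode` the bytes read from /sys/kernel/debug/ec/ec0/io should give one integer per sensor in the unit noted in the table.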

HWiNFO somehow gets a decimal reading for the chipset temperature (monitored in 0x3A), but I don't know where it's coming from. I don't think they get it via the EC: while monitoring, I didn't see any EC value change when the decimal changed. I also don't know what the difference between the two chipset temperatures is. I'm pretty certain that the 60°C here corresponds to the actual X570 chipset since that is what the fan reacts to. And yes, the temperatures are that bad on that board, which causes the chipset fan to constantly run at 2000-3000 RPM. It's due to a really bad cooler design by ASUS; there are a couple of threads in some forums about that topic. But that other temp (38°C here), no idea what it corresponds to. But it's that temperature that actually the ASUS monitoring software reports as chipset temperature. Really weird stuff …

I also don't know why there are two readings for the chipset fan. There is only one fan on the board. The two readings don't match exactly, but they are very similar. Could also be that the difference is just due to a slight delay in getting the reading. In the end I doubt it matters, since they always give roughly the same value, from 0 (stopped fan) up to 2700 rpm (highest I observed).

One last note: HWiNFO also gives a reading for the CPU power, but I'm almost certain that they just take the VCore reading from the NCT6798 and multiply it by the current reading from 0xF4. The values I get from that suggest that this is the case.
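If that guess is right, the power figure is just P = V × I. A tiny sketch, where the 1.25 V VCore value is a made-up example and not a reading from this thread:

```python
def estimate_cpu_power(vcore_volts: float, current_amps: float) -> float:
    """Estimate CPU power in watts as VCore times the EC current reading."""
    return vcore_volts * current_amps


# Hypothetical VCore of 1.25 V with the 1 A current reading from 0xF4
idle_power = estimate_cpu_power(1.25, 1)
```

Since the EC current reading is a whole number of amps, an estimate like this can only move in steps of roughly one VCore.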

There surely is more information in the output of the EC. At least there are some other fields that change from time to time. But it's very hard to relate them to anything.

philwo commented 4 years ago

@berniyh Thanks for your great research 👍

> But that other temp (38°C here), no idea what it corresponds to. But it's that temperature that actually the ASUS monitoring software reports as chipset temperature. Really weird stuff …

Maybe that's the VRM temperature? 38°C sounds about right for an idle machine (I'm at 42°C now) and HWiNFO64 shows "VRM temperature" as part of the "ASUS EC" sensor.

> I also don't know why there are two readings for the chipset fan. There is only one fan on the board.

The board does have two fans, doesn't it? One for the VRMs and one for the chipset. HWiNFO64 shows them as "COV/VRM HS Fan" (spinning at 1420 RPM for me) and "Chipset Fan" (spinning at 4100 RPM for me).

Here's a picture showing the fans: https://www.asus.com/websites/global/products/wodanwpswfp0wiug/img/heatsink/heatsink-03.png

berniyh commented 4 years ago

> Maybe that's the VRM temperature? 38°C sounds about right for an idle machine (I'm at 42°C now) and HWiNFO64 shows "VRM temperature" as part of the "ASUS EC" sensor.

Remember, I've got an X570-E Gaming, and for that board it doesn't show any VRM temperature. A quick look around the internet revealed that for the VIII Impact there is also a VRM temperature shown in the UEFI, which is not the case for the board that I have. Possibly this is due to the boards using different VRMs. It's IR3555 for the X570-E Gaming, while the Crosshair VIII Impact uses TDA21472 VRMs. Maybe the latter come with an integrated sensor while the IR3555 don't.

Also, I'm pretty certain that the monitored temperature is not for the VRMs, because it doesn't really change much under load. When the system has been running for a while it'll be around 38°C, maybe 40°C under load. VRMs will likely reach around 60°C under load or, depending on the CPU, even higher. In any case there should be a change between idle and load, which for this sensor there isn't.

> The board does have two fans, doesn't it? One for the VRMs and one for the chipset. HWiNFO64 shows them as "COV/VRM HS Fan" (spinning at 1420 RPM for me) and "Chipset Fan" (spinning at 4100 RPM for me).

I didn't dismantle the VRM cooler, but I doubt there is a fan hidden beneath there. For the chipset cooler there is definitely only one fan there (saw that because you have to dismantle it to install the M.2 SSDs). Also there is only one such fan listed in the UEFI. And even if there was a second one somewhere else on the board, why should it always spin with the same speed and why should that speed be coupled to the chipset temperature? Doesn't really make sense in my opinion. No, for some reason the PCH fan is doubled here. Maybe one is the input (set speed) and one is the output (actual speed).

However, your comments suggest that someone with a VIII Impact board should really check the EC values, because now I'd expect the outcome to be different, and an hwmon driver reading the EC would need to differentiate between boards using a lookup table or something like that. So go check it out, it only takes about 15 min to get the info when using EC probe. ;)
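Such a per-board lookup table might look like the sketch below. The Crosshair VIII Impact board string comes from the dmidecode output earlier in this thread; the X570-E Gaming string "ROG STRIX X570-E GAMING" is an assumption and should be checked against /sys/class/dmi/id/board_name on real hardware. `EC_MAPS`, `detect_board`, and `registers_for` are hypothetical names.

```python
from pathlib import Path

# Standard Linux sysfs location for the DMI baseboard product name.
BOARD_NAME_PATH = Path("/sys/class/dmi/id/board_name")

# Board name -> {sensor name: EC offset}.
EC_MAPS = {
    # Board string assumed; offsets from the X570-E Gaming findings above.
    "ROG STRIX X570-E GAMING": {
        "chipset_temp": 0x3A,
        "cpu_temp": 0x3B,
        "t_sensor_temp": 0x3D,
    },
    # Offsets for this board have not been identified in this thread yet.
    "ROG CROSSHAIR VIII IMPACT": {},
}


def detect_board(path: Path = BOARD_NAME_PATH) -> str:
    """Read the baseboard product name as reported by the firmware."""
    return path.read_text().strip()


def registers_for(board_name: str) -> dict:
    """Return the EC register map for a board, or raise if it is unknown."""
    try:
        return EC_MAPS[board_name.strip()]
    except KeyError:
        raise LookupError(f"no EC register map for board {board_name!r}") from None
```

Refusing to guess for unknown boards seems like the safer default here, since the same offset could mean something different on another board.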

philwo commented 4 years ago

@berniyh Oh lol, sorry - I completely missed that you're using a different board than I do.

I'll get the data from mine similar to how you did, then we can compare! 😄

kantlivelong commented 3 years ago

Found this thread last night while banging my head against the wall trying to get T_SENSOR data from my Dark Hero. Ended up writing this script which can be run as a service to fetch sensor data from a Dark Hero. I've only added the relevant sensors that I use but welcome PRs. Not much in terms of error handling or anything but does the job for now.

https://github.com/kantlivelong/Sensors-ASUS-EC

Thanks to @berniyh for the initial data findings.