chu11 / freeipmi-mirror

Mirror of GNU FreeIPMI Git Repo - http://savannah.gnu.org/projects/freeipmi/. I maintain the upstream of FreeIPMI and can accept Github pull requests.
GNU General Public License v3.0

Incorrect temperature sensor reading with ipmi-sensor #42

Closed casual-lemon closed 3 years ago

casual-lemon commented 3 years ago

ipmi-sensors reads the wrong sensor on a Lenovo BMC (the problem may occur on other BMCs as well).

More information here: https://bugs.launchpad.net/ubuntu/+source/freeipmi/+bug/1926299

ubuntu@ubuntu:~$ ipmi-sensors --version
ipmi-sensors - 1.4.11
Ubuntu release - Bionic 18.04

ipmi-sensors -u USERID -p XXXXXXX -D LAN_2_0 -l USER -h x.x.x.x | grep DIMM | grep Temp

146 | DIMM 13 Temp | Temperature | 27.00 | C | 'OK'
149 | DIMM 14 Temp | Temperature | 26.00 | C | 'OK'
152 | DIMM 15 Temp | Temperature | 24.00 | C | 'OK'
155 | DIMM 16 Temp | Temperature | 24.00 | C | 'OK'
158 | DIMM 17 Temp | Temperature | 218.00 | C | 'OK'
161 | DIMM 18 Temp | Temperature | 212.00 | C | 'OK'
165 | DIMM 19 Temp | Temperature | 212.00 | C | 'OK'
Device LUNs
[3] - 1b = LUN 3 has sensors
[2] - 1b = LUN 2 has sensors
[1] - 1b = LUN 1 has sensors
[0] - 1b = LUN 0 has sensors

The issue occurs because an IPMI sensor with multi-LUN (LUN larger than zero) is not handled properly in VMware ESXi. This only impacts AMD 2P based servers and ThinkSystem V2 based servers. 
[              10h] = sensor_owner_id[ 7b]
[               0h] = sensor_owner_lun[ 2b]
[               0h] = sensor_owner_lun.reserved[ 2b]

Here is the ipmi-sensors output again, to check whether the two sensor objects with sensor ID 0x61 have different LUN numbers:
DIMM 16
100.x.x.x: [              10h] = sensor_owner_id[ 7b]
100.x.x.x: [               1h] = sensor_owner_lun[ 2b]
100.x.x.x: [               0h] = sensor_owner_lun.reserved[ 2b]

DIMM 17
100.x.x.x: [              10h] = sensor_owner_id[ 7b]
100.x.x.x: [               0h] = sensor_owner_lun[ 2b]
100.x.x.x: [               0h] = sensor_owner_lun.reserved[ 2b]
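To illustrate why the two records above can be confused, here is a minimal Python sketch. It assumes both records carry sensor number 0x61, as described above, and it is not taken from the FreeIPMI source. The `SensorKey` type and the dictionary names are hypothetical. It only shows that indexing by sensor number alone collapses the two records, while indexing by owner ID, owner LUN, and sensor number keeps them apart.

```python
from typing import NamedTuple


class SensorKey(NamedTuple):
    """Hypothetical lookup key built from the SDR fields shown in the dumps."""
    owner_id: int   # 7-bit sensor_owner_id field
    owner_lun: int  # 2-bit sensor_owner_lun field
    number: int     # sensor number (assumed 0x61 for both records)


# Field values copied from the two dumps; the record names are illustrative.
records = [
    ("DIMM 16 Temp", SensorKey(owner_id=0x10, owner_lun=1, number=0x61)),
    ("DIMM 17 Temp", SensorKey(owner_id=0x10, owner_lun=0, number=0x61)),
]

by_number = {}    # keyed by sensor number only
by_full_key = {}  # keyed by (owner ID, owner LUN, sensor number)
for name, key in records:
    by_number[key.number] = name  # the second record overwrites the first
    by_full_key[key] = name

collapsed = len(by_number)
distinct = len(by_full_key)
print(f"by number: {collapsed} entry, by full key: {distinct} entries")
```

With the number-only index, a lookup for 0x61 can return only one of the two sensors, which matches the symptom of a reading being attributed to the wrong DIMM.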

When the LUN number is not the default (00b), the sensor reports back an incorrect temperature reading. Using a LUN other than the default is allowed, but unless otherwise specified, commands listed as mandatory must be accessible via LUN 00b. Note that ipmitool reports the sensor temperature correctly because it handles the alternative LUN number.
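As a rough sketch of where the LUN enters a request: in an IPMI message the 6-bit network function and the 2-bit LUN share one byte, so a Get Sensor Reading request (Sensor/Event netFn 04h, command 2Dh) for a sensor on LUN 1 differs from the LUN 0 request only in those two low bits. The helper names below are hypothetical, and the three-byte layout is a simplification that omits addresses, sequence numbers, and checksums. It is not how FreeIPMI or ipmitool build their packets.

```python
NETFN_SENSOR_EVENT = 0x04      # Sensor/Event request network function
CMD_GET_SENSOR_READING = 0x2D  # Get Sensor Reading command


def netfn_lun_byte(netfn: int, lun: int) -> int:
    """Pack a 6-bit netFn and a 2-bit LUN into one byte (netFn in the upper bits)."""
    if not 0 <= netfn <= 0x3F:
        raise ValueError("netFn must fit in 6 bits")
    if not 0 <= lun <= 3:
        raise ValueError("LUN must fit in 2 bits")
    return (netfn << 2) | lun


def get_sensor_reading_request(sensor_number: int, owner_lun: int = 0) -> bytes:
    """Simplified request: netFn/LUN byte, command byte, sensor number."""
    return bytes([
        netfn_lun_byte(NETFN_SENSOR_EVENT, owner_lun),
        CMD_GET_SENSOR_READING,
        sensor_number,
    ])


# The same sensor number addressed on LUN 0 and on LUN 1.
print(get_sensor_reading_request(0x61, owner_lun=0).hex())
print(get_sensor_reading_request(0x61, owner_lun=1).hex())
```

A client that always sends the LUN 0 form would query the LUN 0 sensor even when the SDR record it is displaying belongs to LUN 1.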