bb-Ricardo / check_redfish

A monitoring/inventory plugin to check components and health status of systems which support Redfish. It will also create an inventory of all components of a system.
MIT License

Power supply check does not work properly #84

Closed s-blottk closed 2 years ago

s-blottk commented 2 years ago

Hello,

I receive the following for an HPE ProLiant XL420 Gen9 node: [image]

But the smart storage battery is actually in a degraded state: [image]

This plugin (https://exchange.nagios.org/directory/Plugins/Hardware/Server-Hardware/HP-(Compaq)/check_ilo2_health/details) is reporting an issue with the smart storage battery correctly: ILO2_HEALTH CRITICAL - BATTERY Degraded,

Why might the check_redfish plugin not recognize this issue?

iLO version: 4
iLO Firmware Version: 2.78 Apr 28 2021

The following services are enabled:

    apply Service for (service => request_command in {
      "HW Storage" = "storage",
      "HW Processor" = "proc",
      "HW DIMM" = "memory",
      "HW Power" = "power",
      "HW Temperature" = "temp",
      "HW Fan" = "fan",
      "HW NICs" = "nic",
      "HW BMC" = "bmc",
      "HW System Summary" = "info",
      "HW Firmware" = "firmware",
      "HW Management Log" = "mel",
      "HW Inventory" = "inventory"
    }) {

bb-Ricardo commented 2 years ago

Hi,

What does the storage check return with --detailed turned on?

s-blottk commented 2 years ago

Hi,

This is the detailed storage check return:

[OK]: All HP SmartArray controller (3), logical drives (7), physical drives (56) and enclosures (4) are in good condition.
[OK]: Smart Array P840ar Controller (FW: 7.00) status is: OK
[OK]: Physical Drive (1I:1:25) 1800GB status: OK
[OK]: Physical Drive (1I:1:26) 1800GB status: OK
[OK]: Physical Drive (1I:1:27) 1800GB status: OK
[OK]: Physical Drive (1I:1:28) 1800GB status: OK
[OK]: Physical Drive (1I:1:29) 1800GB status: OK
[OK]: Physical Drive (1I:1:30) 1800GB status: OK
[OK]: Physical Drive (1I:1:31) 1800GB status: OK
[OK]: Physical Drive (1I:1:32) 1800GB status: OK
[OK]: Physical Drive (1I:1:33) 1800GB status: OK
[OK]: Physical Drive (1I:1:34) 1800GB status: OK
[OK]: Physical Drive (1I:1:35) 1800GB status: OK
[OK]: Physical Drive (1I:1:36) 1800GB status: OK
[OK]: Physical Drive (1I:1:37) 1800GB status: OK
[OK]: Physical Drive (1I:1:38) 1800GB status: OK
[OK]: Physical Drive (1I:1:39) 1800GB status: OK
[OK]: Physical Drive (1I:1:40) 1800GB status: OK
[OK]: Physical Drive (1I:1:41) 1800GB status: OK
[OK]: Physical Drive (1I:1:42) 1800GB status: OK
[OK]: Physical Drive (1I:1:43) 1800GB status: OK
[OK]: Physical Drive (1I:1:44) 1800GB status: OK
[OK]: Physical Drive (1I:1:45) 1800GB status: OK
[OK]: Physical Drive (1I:1:46) 1800GB status: OK
[OK]: Physical Drive (1I:1:47) 1800GB status: OK
[OK]: Physical Drive (1I:1:48) 1800GB status: OK
[OK]: Physical Drive (2I:1:1) 1800GB status: OK
[OK]: Physical Drive (2I:1:2) 1800GB status: OK
[OK]: Physical Drive (2I:1:3) 1800GB status: OK
[OK]: Physical Drive (2I:1:4) 1800GB status: OK
[OK]: Physical Drive (2I:1:5) 1800GB status: OK
[OK]: Physical Drive (2I:1:6) 1800GB status: OK
[OK]: Physical Drive (2I:1:7) 1800GB status: OK
[OK]: Physical Drive (2I:1:8) 1800GB status: OK
[OK]: Physical Drive (2I:1:9) 1800GB status: OK
[OK]: Physical Drive (2I:1:10) 1800GB status: OK
[OK]: Physical Drive (2I:1:11) 1800GB status: OK
[OK]: Physical Drive (2I:1:12) 1800GB status: OK
[OK]: Physical Drive (2I:1:13) 1800GB status: OK
[OK]: Physical Drive (2I:1:14) 1800GB status: OK
[OK]: Physical Drive (2I:1:15) 1800GB status: OK
[OK]: Physical Drive (2I:1:16) 1800GB status: OK
[OK]: Physical Drive (2I:1:17) 1800GB status: OK
[OK]: Physical Drive (2I:1:18) 1800GB status: OK
[OK]: Physical Drive (2I:1:19) 1800GB status: OK
[OK]: Physical Drive (2I:1:20) 1800GB status: OK
[OK]: Physical Drive (2I:1:21) 1800GB status: OK
[OK]: Physical Drive (2I:1:22) 1800GB status: OK
[OK]: Physical Drive (2I:1:23) 1800GB status: OK
[OK]: Physical Drive (2I:1:24) 1800GB status: OK
[OK]: no logical drives found for this Controller
[OK]: StorageEnclosure (1I:1) status: OK
[OK]: StorageEnclosure (2I:1) status: OK
[OK]: Smart Array P440 Controller (FW: 7.00) status is: OK
[OK]: Physical Drive (1I:1:49) 3840GB status: OK
[OK]: Physical Drive (1I:1:50) 3840GB status: OK
[OK]: Physical Drive (1I:1:51) 3840GB status: OK
[OK]: Physical Drive (1I:1:52) 3840GB status: OK
[OK]: Physical Drive (1I:1:53) 3840GB status: OK
[OK]: Physical Drive (1I:1:54) 3840GB status: OK
[OK]: Logical Drive (1:1) 3840.7GB (RAID 0) status: OK
[OK]: Logical Drive (1:2) 3840.7GB (RAID 0) status: OK
[OK]: Logical Drive (1:3) 3840.7GB (RAID 0) status: OK
[OK]: Logical Drive (1:4) 3840.7GB (RAID 0) status: OK
[OK]: Logical Drive (1:5) 3840.7GB (RAID 0) status: OK
[OK]: Logical Drive (1:6) 3840.7GB (RAID 0) status: OK
[OK]: StorageEnclosure (1I:1) status: OK
[OK]: StorageEnclosure (1I:1) status: OK
[OK]: Dynamic Smart Array B140i Controller (FW: 6.00) status is: OK
[OK]: Physical Drive (3I:0:9) 150GB status: OK
[OK]: Physical Drive (4I:0:10) 150GB status: OK
[OK]: Logical Drive (31:1) 150.0GB (RAID 1) status: OK
[OK]: no storage enclosures found for this Controller

This is the detailed power check return:

[OK]: All power supplies (2) are in good condition
[OK]: Power supply 1 (720620-B21) status is: OK
[OK]: Power supply 2 (720620-B21) status is: OK

bb-Ricardo commented 2 years ago

Interesting. The storage battery is not exposed within the power data in Redfish; each controller reports its battery status itself. iLO is more difficult, as the battery is not exposed via Redfish at all. Only if the controller shows a WARNING with no other issues can it be assumed that the controller battery pack is the cause.

Would you mind creating a mockup of this server and sending it to me? You could use this project: https://github.com/DMTF/Redfish-Mockup-Creator. I will handle your data confidentially.

Then I could check whether something changed in later iLO4 versions.
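
The heuristic described above (a controller in WARNING state while everything attached to it is OK hints at the battery pack) could be sketched roughly as follows. This is only an illustration: the function name `battery_suspected` and its inputs are hypothetical and not part of check_redfish.

```python
from typing import Iterable


def battery_suspected(controller_status: str, component_statuses: Iterable[str]) -> bool:
    """Heuristic only: guess that the controller battery pack is the culprit.

    Returns True when the controller itself reports WARNING while every
    attached component (drives, enclosures) reports OK.
    Hypothetical helper for illustration, not check_redfish code.
    """
    controller_warns = controller_status.strip().upper() == "WARNING"
    others_ok = all(s.strip().upper() == "OK" for s in component_statuses)
    return controller_warns and others_ok


if __name__ == "__main__":
    # Controller degraded, all drives fine: battery is a plausible cause.
    print(battery_suspected("Warning", ["OK", "OK", "OK"]))
    # Controller degraded and a drive failed: no conclusion about the battery.
    print(battery_suspected("Warning", ["OK", "Critical"]))
```

In the output posted above, all three controllers report OK, so a heuristic of this kind would not flag anything, which matches the observed behaviour.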

s-blottk commented 2 years ago

I would if I were allowed to... Is there any other way I can help?

bb-Ricardo commented 2 years ago

Then don't worry about the mockup. I will have to check our server with the newest iLO4 release and see where this battery information has moved to.

s-blottk commented 2 years ago

Thank you!

It also seems that the ProLiant BL460c Gen9 and ProLiant WS460c Gen9 are not supported by the power check? Both are on iLO 4 as well.

Request error: No power supply data returned for API URL '/redfish/v1/Chassis/1//Power', No power supply data returned for API URL '/redfish/v1/Chassis/enclosurechassis//Power'

bb-Ricardo commented 2 years ago

This is correct. Blade servers have no power supplies; they get their power from the chassis. You would need to monitor the chassis power supplies. For this case, an '--ignore_missing_ps' option was added: https://github.com/bb-Ricardo/check_redfish/commit/95b27e9f0297227e436dc0fdd8828c7dcbf60c05

It still has to be released; it is just difficult to find the time to actually do it. If you check out the 'next-release' branch, this should work. Let me know if this helps.
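
To illustrate what an option like '--ignore_missing_ps' changes, here is a minimal sketch of evaluating a Redfish Power resource. It is not the plugin's actual code: the function `evaluate_power`, the returned messages, and the choice of UNKNOWN for missing data are assumptions. Only the `PowerSupplies` array and the `Status.Health` values come from the Redfish schema.

```python
from typing import Tuple


def evaluate_power(power_data: dict, ignore_missing_ps: bool = False) -> Tuple[str, str]:
    """Map a Redfish Power resource (parsed JSON) to a (status, message) pair.

    Hypothetical helper for illustration, not check_redfish code.
    """
    supplies = power_data.get("PowerSupplies") or []

    if not supplies:
        # Blade servers draw power from the enclosure, so the list is empty.
        if ignore_missing_ps:
            return "OK", "No power supplies present (ignored)"
        # Assumption: missing data is treated as UNKNOWN.
        return "UNKNOWN", "No power supply data returned"

    healths = [(ps.get("Status") or {}).get("Health") for ps in supplies]
    if "Critical" in healths:
        return "CRITICAL", "At least one power supply is in critical state"
    if any(health != "OK" for health in healths):
        return "WARNING", "At least one power supply is not OK"
    return "OK", f"All power supplies ({len(supplies)}) are in good condition"


if __name__ == "__main__":
    blade = {"PowerSupplies": []}  # what a blade might return
    print(evaluate_power(blade))
    print(evaluate_power(blade, ignore_missing_ps=True))
```

With the flag set, an empty power supply list no longer produces an error for the blade itself; the enclosure's power supplies still need their own check.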

bb-Ricardo commented 2 years ago

Hi,

I looked into it, and it turns out that storage controller battery information is present on iLO4 systems, but only on Gen9 servers. It is similar to the information on iLO5 systems, just in a different location 🤡

I pushed a new commit to the 'next-release' branch that exposes this information.

Can you please test and see if it works? Thank you
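
Since the battery data sits in different places depending on the iLO generation, one generic way to locate it in a Redfish response (or in a mockup) is to walk the parsed JSON and collect every key whose name contains "battery". The sketch below assumes nothing about the real iLO layout: the function `find_battery_entries` is hypothetical, and the sample payload is made up for illustration and is not real iLO output.

```python
def find_battery_entries(node, path=""):
    """Yield (path, value) for every dict key whose name contains 'battery'.

    Matching values are yielded as-is and not searched any further.
    Hypothetical helper for illustration, not check_redfish code.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            child_path = f"{path}/{key}"
            if "battery" in key.lower():
                yield child_path, value
            else:
                yield from find_battery_entries(value, child_path)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_battery_entries(item, f"{path}/{index}")


if __name__ == "__main__":
    # Made-up payload; the key names are NOT taken from real iLO output.
    sample = {
        "Name": "example",
        "Oem": {"Vendor": {"StorageBattery": [{"Status": {"Health": "Warning"}}]}},
    }
    for location, value in find_battery_entries(sample):
        print(location, value)
```

Running something like this against the resources of an iLO4 and an iLO5 system would show whether, and where, each one exposes battery status.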

s-blottk commented 2 years ago

I will test this. Thank you so far! :-)