NETWAYS / check_hp_firmware

Icinga / Nagios check plugin to verify HPE controllers and SSD disks are not affected by certain vulnerabilities
GNU General Public License v2.0

[Question]: Error message "no data retrieved in walk for table" #99

Open torstenbunde opened 5 months ago

torstenbunde commented 5 months ago

Ask a question

Hello,

we installed the check_hp_firmware plugin on Friday, May 31st, using the just-released version 1.4.0.

Using this check plugin on a Hewlett Packard Enterprise (HPE) system, we get the following error:

- no data retrieved in walk for table: .1.3.6.1.4.1.232.3.2.2.1 (*errors.errorString)


The system is an

Another system with iLO firmware version 3.03 (Mar 22 2024) has the same problem. Other systems with firmware versions below 3.0 work correctly.

Trying your example snmpwalk (snmpwalk -c public -v2c -On HOST 1.3.6.1.4.1.232) on the affected systems works without any problems.

With OID .1.3.6.1.4.1.232.3.2.2.1 it just returns the following:

~# snmpwalk -c <COMMUNITY> -v2c -On <HOST> .1.3.6.1.4.1.232.3.2.2.1
.1.3.6.1.4.1.232.3.2.2.1 = No Such Object available on this agent at this OID
~#
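
For anyone who wants to script this check across several hosts, here is a minimal sketch (not part of the plugin) that decides whether saved `snmpwalk` output contains real values or only a "no data" response. The function name `walk_returned_data` is hypothetical, and the marker strings are the Net-SNMP messages I am aware of; treat the list as an assumption and extend it as needed.

```python
# Messages Net-SNMP prints instead of a value when nothing is available.
# Assumption: this list covers the common cases, it may not be exhaustive.
NO_DATA_MARKERS = (
    "No Such Object available on this agent at this OID",
    "No Such Instance currently exists at this OID",
    "No more variables left in this MIB View",
)


def walk_returned_data(snmpwalk_output: str) -> bool:
    """Return True if at least one output line carries a real value."""
    for line in snmpwalk_output.splitlines():
        line = line.strip()
        if " = " not in line:
            continue  # skip shell prompts and blank lines
        _, value = line.split(" = ", 1)
        if not any(marker in value for marker in NO_DATA_MARKERS):
            return True
    return False


empty_walk = (
    ".1.3.6.1.4.1.232.3.2.2.1 = "
    "No Such Object available on this agent at this OID"
)
print(walk_returned_data(empty_walk))
```

Feeding it the output above yields `False`, which matches the plugin's "no data retrieved in walk for table" message for the same OID.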

With OID .1.3.6.1.4.1.232.3.2.2 it seems to be ok:

~# snmpwalk -c <COMMUNITY> -v2c -On <HOST> .1.3.6.1.4.1.232.3.2.2
.1.3.6.1.4.1.232.3.2.2.2.1.1.0 = INTEGER: 0
.1.3.6.1.4.1.232.3.2.2.2.1.2.0 = INTEGER: 2
.1.3.6.1.4.1.232.3.2.2.2.1.3.0 = INTEGER: 0
.1.3.6.1.4.1.232.3.2.2.2.1.4.0 = INTEGER: 2
.1.3.6.1.4.1.232.3.2.2.2.1.5.0 = INTEGER: 2
.1.3.6.1.4.1.232.3.2.2.2.1.6.0 = INTEGER: 2
.1.3.6.1.4.1.232.3.2.2.2.1.7.0 = Counter32: 0
.1.3.6.1.4.1.232.3.2.2.2.1.8.0 = Counter32: 0
.1.3.6.1.4.1.232.3.2.2.2.1.9.0 = INTEGER: 1
.1.3.6.1.4.1.232.3.2.2.2.1.10.0 = INTEGER: 0
.1.3.6.1.4.1.232.3.2.2.2.1.11.0 = STRING: "               "
.1.3.6.1.4.1.232.3.2.2.2.1.12.0 = INTEGER: 2097152
.1.3.6.1.4.1.232.3.2.2.2.1.13.0 = Gauge32: 0
.1.3.6.1.4.1.232.3.2.2.2.1.14.0 = Gauge32: 0
.1.3.6.1.4.1.232.3.2.2.2.1.15.0 = ""
.1.3.6.1.4.1.232.3.2.2.2.1.16.0 = INTEGER: 1
.1.3.6.1.4.1.232.3.2.2.2.1.17.0 = INTEGER: -1
.1.3.6.1.4.1.232.3.2.2.2.1.18.0 = INTEGER: -1
.1.3.6.1.4.1.232.3.2.2.2.1.19.0 = INTEGER: -1
.1.3.6.1.4.1.232.3.2.2.2.1.20.0 = INTEGER: -1
~#
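
To make it easier to see which sub-branches actually answer below a given OID, here is a small sketch that groups `snmpwalk -On` output by the first child under the base OID. The helper name `child_branches` is hypothetical, and the sample input is just the first three lines of the output above.

```python
from collections import Counter


def child_branches(base_oid: str, snmpwalk_output: str) -> Counter:
    """Count returned OIDs per direct child branch of base_oid."""
    base = base_oid.strip(".").split(".")
    counts = Counter()
    for line in snmpwalk_output.splitlines():
        if " = " not in line:
            continue  # skip shell prompts and blank lines
        oid = line.split(" = ", 1)[0].strip().strip(".").split(".")
        # Keep only OIDs that sit strictly below the base OID.
        if len(oid) > len(base) and oid[: len(base)] == base:
            counts["." + ".".join(oid[: len(base) + 1])] += 1
    return counts


BASE = ".1.3.6.1.4.1.232.3.2.2"
SAMPLE = """\
.1.3.6.1.4.1.232.3.2.2.2.1.1.0 = INTEGER: 0
.1.3.6.1.4.1.232.3.2.2.2.1.2.0 = INTEGER: 2
.1.3.6.1.4.1.232.3.2.2.2.1.3.0 = INTEGER: 0
"""

branches = child_branches(BASE, SAMPLE)
print(branches)
print(".1.3.6.1.4.1.232.3.2.2.1" in branches)
```

With this sample, every entry falls under `.1.3.6.1.4.1.232.3.2.2.2` and nothing under `.1.3.6.1.4.1.232.3.2.2.1`, the table the plugin tries to walk.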

So it looks like there are no more OIDs around .1.3.6.1.4.1.232.3.2.2.1?!

HPE says that they fixed something around OIDs and snmpwalk in iLO firmware version 3.04 (https://support.hpe.com/connect/s/softwaredetails?language=de&collectionId=MTX-2dc80c4ae4b943fa&tab=Fixes), but for me the question remains: is this a problem with the check script, or has HPE maybe moved (or removed?) some OIDs?

martialblog commented 5 months ago

Very curious indeed. Not sure yet what is going on here. I don't think HPE would make such a major change in a minor version. The changelog says they fixed some values, not that they changed the OIDs.

torstenbunde commented 5 months ago

I did some more testing over the last few days.

If I switch the iLO firmware back to version 3.01, I get the same error as above (no data retrieved in walk for table: .1.3.6.1.4.1.232.3.2.2.1 (*errors.errorString)).

If I switch the iLO firmware back to version 2.x, the check works as expected and I get the following error (screenshot: 20240607_080916).

So it might be more a problem with the firmware than the check script.

martialblog commented 5 months ago

Thanks for the further investigation. I don't have access to an iLO at the moment to test this myself.

We can keep this issue open for further feedback.

torstenbunde commented 2 months ago

I just tried iLO firmware version 3.07 (published on August 14th) and the problem still exists, showing the same error.
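
As a rough way to flag hosts that may hit this, the versions reported so far in this thread can be turned into a simple check. This is only a sketch based on the observations here (2.x works; 3.01, 3.03 and 3.07 fail); the 3.0 threshold and the function names are assumptions, not anything confirmed by HPE, and later firmware may behave differently.

```python
def parse_version(text: str) -> tuple:
    """Parse an iLO firmware version string such as '3.07' into (3, 7)."""
    major, minor = text.strip().split(".", 1)
    return int(major), int(minor)


def possibly_affected(version: str) -> bool:
    """Assumption from this thread only: firmware 3.0 and later may lack the table."""
    return parse_version(version) >= (3, 0)


# Versions reported as failing in this thread.
for reported in ("3.01", "3.03", "3.07"):
    print(reported, possibly_affected(reported))
```

A positive result only means the host is worth checking by hand with snmpwalk; it does not prove the table is missing.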

RincewindsHat commented 1 month ago

Additional information: I just tried this myself, and more than one table is missing. When run with the --ignore-controllers option, the check also fails on the drives. This can in turn be circumvented with --ignore-drives, but what would be the point anymore?

RincewindsHat commented 1 month ago

This https://forum.checkmk.com/t/storage-disk-monitoring-of-hpe-gen-10-server-ilo-5-gone-missing/43562/2 seems related

martialblog commented 1 month ago

@RincewindsHat nice catch. I did read through several HPE changelogs and didn't see anything about removed OIDs... which I would assume someone would mention, then again the HPE websites are not the simplest to navigate and find things.

RincewindsHat commented 1 month ago

https://community.hpe.com/t5/proliant-servers-ml-dl-sl/firmware-bug-with-snmp-ilo-5-3-0-0-proliant-dl380-gen10-hpe/td-p/7204699

RincewindsHat commented 1 month ago

@martialblog same for me. I got farther by throwing "hp ilo missing oid 3.0.0" into Google, which is not the kind of communication (from HP) I was hoping for.

RincewindsHat commented 1 month ago

@torstenbunde could you, by any chance, upgrade to version 3.07 and/or 3.08? The changelog claims to fix stuff related to SNMP

RincewindsHat commented 1 month ago

My current position is this: @HewlettPackard broke things in the SNMP interface of some or all iLO versions. This monitoring plugin works correctly. Therefore I would close this issue unless somebody proves me wrong.

martialblog commented 1 month ago

@RincewindsHat Agreed, currently the most likely scenario.

Maybe a hint in the README to redirect people with similar issues.

torstenbunde commented 1 month ago

> @torstenbunde could you, by any chance, upgrade to version 3.07 and/or 3.08? The changelog claims to fix stuff related to SNMP

@RincewindsHat I'll try this tomorrow, but 3.07 doesn't work, as mentioned here: https://github.com/NETWAYS/check_hp_firmware/issues/99#issuecomment-2345421222

torstenbunde commented 1 month ago
RincewindsHat commented 1 month ago

:-(