bb-Ricardo / check_redfish

A monitoring/inventory plugin to check components and health status of systems which support Redfish. It will also create an inventory of all components of a system.
MIT License

TypeError: 'NoneType' object is not iterable #98

Closed xpros closed 1 year ago

xpros commented 1 year ago

https://github.com/bb-Ricardo/check_redfish/blob/e55dd688e614a6e93abc361de171ff309cef8dee/cr_module/classes/redfish.py#L611

Hello!

I would like to report an intermittent issue that I have observed with check_redfish.

I have observed the below error:

  File "/opt/monitoring/plugins/redfish/check_redfish/check_redfish.py", line 152, in <module>
    plugin.rf.discover_system_properties()
  File "/opt/monitoring/plugins/redfish/check_redfish-v1.3.2/cr_module/classes/redfish.py", line 588, in discover_system_properties
    for entity in rf_path.get("Members"):
TypeError: 'NoneType' object is not iterable

Of note, the error above was observed with tag v1.3.2. The linked snippet is from the most recent tagged version, v1.4.1. The code in discover_system_properties affected by this issue does not appear to have changed for some time, so I believe the issue still exists in the latest tag.

Apologies for any confusion.

And most importantly of all, THANK YOU! This check script is ah'mazing! If I can be of any assistance or if any additional information is required, please let me know.

bb-Ricardo commented 1 year ago

Hi,

Thank you very much. Indeed this is an issue and can be fixed quite easily. I'm more curious about which system this occurs on.

Would you mind sending me a mockup? Maybe there are more/different topics to fix.

xpros commented 1 year ago

An example system is an R640 w/iDRAC firmware version 5.10.50.00. I'm not sure what you mean by a mockup. What is it, and how can I get it to you?

I added the following bits to temporarily resolve the error:

            if rf_path.get("Members") is None:
                continue

After that was in place, I observed the following:

  File "/opt/monitoring/plugins/redfish/check_redfish/check_redfish.py", line 163, in <module>
    if any(x in args.requested_query for x in ['storage', 'all']): get_storage(plugin)
  File "/opt/monitoring/plugins/redfish/check_redfish-v1.3.2.1/cr_module/storage.py", line 29, in get_storage
    get_storage_generic(plugin_object, system)
  File "/opt/monitoring/plugins/redfish/check_redfish-v1.3.2.1/cr_module/storage.py", line 1051, in get_storage_generic
    get_enclosures(enclosure_link.get("@odata.id"))
  File "/opt/monitoring/plugins/redfish/check_redfish-v1.3.2.1/cr_module/storage.py", line 735, in get_enclosures
    if enclosure_link in plugin_object.rf.get_system_properties("chassis"):
TypeError: argument of type 'NoneType' is not iterable

Darn iDRACs are not always reliable. :/
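
Both tracebacks above come down to the same pattern: a lookup returns None where the code expects something iterable. As a minimal sketch of how such lookups could be made tolerant, assuming plain dicts and lists as stand-ins for the Redfish responses (iter_members and contains_link are hypothetical helper names, not functions from check_redfish, and this is not necessarily how the plugin handles it):

```python
from typing import Any, Dict, Iterable, List, Optional


def iter_members(resource: Optional[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return the "Members" list of a collection, or [] if it is missing."""
    if not resource:
        return []
    members = resource.get("Members")
    # A response may omit "Members" or carry null; treat both as empty.
    return members if isinstance(members, list) else []


def contains_link(links: Optional[Iterable[str]], link: str) -> bool:
    """Membership test that tolerates a missing (None) collection of links."""
    return link in (links or [])


# Hypothetical sample data, only to exercise the helpers.
broken = {"Name": "Collection without members"}
healthy = {"Members": [{"@odata.id": "/redfish/v1/Chassis/1"}]}

for entity in iter_members(broken):
    print("never reached:", entity)
for entity in iter_members(healthy):
    print(entity.get("@odata.id"))

print(contains_link(None, "/redfish/v1/Chassis/1"))
```

This only avoids the TypeError. It does not bring back the missing data, so the caller still has to report that a property was not found.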

bb-Ricardo commented 1 year ago

Hi, I just pushed a new commit to next-release. Can you check if this looks better now? The plugin should not fail anymore, but you probably won't see any data as there are some root objects missing.

xpros commented 1 year ago

The modification does help with resolving the TypeError. As you mentioned though, data is not shown because the objects are missing:

[UNKNOWN]: None : Request error: No 'chassis' property found in root path '/redfish/v1'
[UNKNOWN]: None : Request error: No 'chassis' property found in root path '/redfish/v1'
[UNKNOWN]: None : Request error: No 'chassis' property found in root path '/redfish/v1'
[OK]: All memory modules (Total 192GB) are in good condition
[OK]: All processors (2) are in good condition

I'm not positive how to move forward from here, but the above does not sit entirely too well with me. :)

Have others (or you) noticed that iDRAC/redfish is unreliable? We keep our iDRACs fairly current and experience similar issues across R630,40,50s.

The issue is intermittent too; most likely, the next run or so the data is returned.

I wonder if a retry could be added to attempt to gather the chassis (or any other missing) object again.
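
To make the retry idea concrete, here is a minimal sketch. It assumes the lookup can be wrapped in a callable that returns None when the object is missing. retry_until_present, its parameters, and the simulated responses are all hypothetical, and nothing here is part of check_redfish:

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_until_present(
    fetch: Callable[[], Optional[T]],
    attempts: int = 3,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call fetch() until it returns a non-None value or attempts run out."""
    for attempt in range(1, attempts + 1):
        result = fetch()
        if result is not None:
            return result
        if attempt < attempts:
            # Wait before asking again; skipped after the final attempt.
            sleep(delay_seconds)
    return None


# Simulated lookup that returns nothing on the first call only.
# The chassis path is made-up sample data.
responses = iter([None, ["/redfish/v1/Chassis/1"]])
chassis = retry_until_present(lambda: next(responses), delay_seconds=0)
print(chassis)
```

Retries lengthen the check's runtime in the failure case, so the attempt count and delay would have to fit within the monitoring system's plugin timeout.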

bb-Ricardo commented 1 year ago

Hi,

This is indeed very strange. We have several hundred DELL servers of different types but no issue like that.

I mean it is a good finding that the plugin should not bail out with a stack trace if a main "object/tree" is missing. But on the other hand, I have never seen this before and that's why this bug never popped up in our environment.

Could it have another root cause? Network timeouts?

You could run the plugin with the -v option and then you should see all HTTP queries and responses.

Eldiabolo21 commented 1 year ago

@xpros have you tried resetting the BMC, maybe even removing power? Sometimes it's just the BMC itself that gets stuck at certain points and behaves weirdly.

xpros commented 1 year ago

I'll have to give the more verbose option a try and see what responses are returned.

RE: Resetting the BMC, the issue where attributes are not returned is intermittent for me. Resetting the iDRAC usually does resolve the issue, but so do subsequent runs of the redfish check. I blame the iDRAC for now, but perhaps it is something else in our environment that -v will help to uncover.

bb-Ricardo commented 1 year ago

Will close this issue for now. If it occurs again, feel free to reopen.