bb-Ricardo / check_redfish

A monitoring/inventory plugin to check components and health status of systems which support Redfish. It will also create a inventory of all components of a system.
MIT License
110 stars 30 forks source link

Traceback Error on plugin.do_exit() #119

Closed lgmu closed 9 months ago

lgmu commented 10 months ago

Hi, I recently updated to v1.6.1 and noticed some errors (but they actually happened before aswell). Not sure if this is a problem with the plugin or in the redfish module.

Usually it works again on the next retry.

Some system info: Dell Inc. PowerEdge R840 / Dell Inc. PowerEdge R740 iDRAC 9 (Firmware: 7.00.00.00)

[Traceback (most recent call last):
File "/omd/sites/mon/local/lib/monitoring-plugins/mon/check_redfish/check_redfish.py", line 178, in
plugin.do_exit()
File "/opt/omd/sites/mon/local/lib/monitoring-plugins/mon/check_redfish/cr_module/classes/plugin.py", line 379, in do_exit
self.rf.terminate_session()
File "/opt/omd/sites/mon/local/lib/monitoring-plugins/mon/check_redfish/cr_module/classes/redfish.py", line 359, in terminate_session
self.connection.logout()
File "/omd/sites/mon/local/python/venv/omd_plugin_venv_3.8_current/lib/python3.8/site-packages/redfish/rest/v1.py", line 993, in logout
raise BadRequestError("Invalid session resource: %s, "\
redfish.rest.v1.BadRequestError: Invalid session resource: /redfish/v1/SessionService/Sessions/29857, return code: 401]`

Edit: And I hate that ENTER actually submits the issue while I'm still typing...

bb-Ricardo commented 10 months ago

Mhh, do you use the --no-session option? Any particular reason?

It is not recommended to use this option as it unnecessary creates a new session on each plugin run.

lgmu commented 10 months ago

Yes, we do because either HPE or Dell have a limit of sessions, which we reached. Then nobody could manually login anymore.

In their big environments, we have 1 monitoring host + multiple workers that execute the checks, so the requests come from different IPs (and also different session files).

We don't have any issues with --no-session, just sometimes there is this Traceback exception when the logout fails (but we're speaking about single digit errors with tens of thousands of checks.

bb-Ricardo commented 10 months ago

Ahh ok,

This could be solved with an exception handler. But then what should be reported if the logout fails? Just OK? Can't prove that the session isn't still using a seat at the BMC. On thebother hand, the check would run into CRITICAL if the bmc juat reaches the session limit.

What do you think?

lgmu commented 10 months ago

Assuming it recieved all the data before the logout, I think it's okay to just ignore the logout error. It doesn't happen that often, so it's pretty unlikely to happen multiple times on the same host until the first session gets killed.

bb-Ricardo commented 9 months ago

Hi @lgmu,

I just pushed a fix to "next-release" branch to mitigate this issue. Can you please test it out?

Thank you.

lgmu commented 9 months ago

Hi, thanks! I updated and will test it.

lgmu commented 9 months ago

Looks good, thanks!

bb-Ricardo commented 9 months ago

Great that it solved the issue.

Will create another release for this.

giudig commented 5 months ago

Mhh, do you use the --no-session option? Any particular reason?

It is not recommended to use this option as it unnecessary creates a new session on each plugin run.

Hello @bb-Ricardo, first of all thank you for your contribution with this plugin, very useful and efficient. After a few months of use I would like to give you just a small suggestion to report what you say here because in the README it is not very clear the impact of --no-session option usage, indicating that it is not recommended to use this option as it unnecessary creates a new session on each plugin run. Thank you!

bb-Ricardo commented 5 months ago

I thought I added this to the readme, will review it again. Thank you