Closed chanchiwai-ray closed 8 months ago
Note that I edit my PR description to indicate a potential loop hole that we can't really fix. The workaround is to increase the redfish timeout so that the case I mentioned can be significantly reduced.
Do not agree with defining the context manager, but not using it, since it can fail. But I do not see this blocker issue for now.
We shouldn't simply remove it because we don't use it, since it will cause a backward compatibility issue (just in case other users are using this repo as a library).
Do not agree with defining the context manager, but not using it, since it can fail. But I do not see this blocker issue for now.
We shouldn't simply remove it because we don't use it, since it will cause a backward compatibility issue (just in case other users are using this repo as a library).
I never suggested that, but if anyone is using as lib, they will hit same issue and the fix will be same as you are doing "not using context manager".
After reading through the comments and thinking about this,
RedfishHelper
class. We also need to have a try/except block around the context manager itself to catch exceptions while running the various helper methods for redfish (get_chassis_data
, get_storage_drive_data
etc.) inside it.redfish_obj.login
which will cause __enter__
to raise it and __exit__
isn't called. Which is supposed to be fine...but somehow there's still a login session being created so not logging out is a problem. As mentioned in the existing comments already, there does seem to be an upstream bug lurking here.__enter__
similar to what @gabrielcocenza proposed in this comment. Or we could just directly call the login/logout methods in collector.py
so there's just one set of try/except/finally statements and we can ignore the context manager -- which is Ray's implementation in the PR now. I'm personally in favor of the second option.redfish_obj.login
doesn't work and the session_{key,location}
aren't set, the server still registers a login. If redfish_obj.logout
itself doesn't work for some reason to clean up the session, then the fix would certainly have to come from the upstream redfish library. We could potentially check for existing sessions and perform cleanup manually but then it would run the risk of removing legitimate sessions by actual users since we wouldn't have any way of identifying the sessions created by the charm.Please correct me if I'm wrong in any of the assumptions in my summary.
I don't quite understand how if the redfishobj.login doesn't work and the session{key,location} aren't set, the server still registers a login. If redfish_obj.logout itself doesn't work for some reason to clean up the session, then the fix would certainly have to come from the upstream redfish library. We could potentially check for existing sessions and perform cleanup manually but then it would run the risk of removing legitimate sessions by actual users since we wouldn't have any way of identifying the sessions created by the charm.
For example if you set a login timeout to 1s, and you think your server can handles the login in 1s, that's fine. However, if the server take too long to respond (say 5s), then redfish_client
raise time out exception, and can't handle the response which sets the self.__session_key
. In that case, even if we do redfish_client.logout
[1] it will do nothing.
[1] https://github.com/DMTF/python-redfish-library/blob/main/src/redfish/rest/v1.py#L1027
Rebased onto ba509b6
The redfish context manager could potentially failed during
__enter__
which might causes the__exit__
function never called. We should ensure the redfish client is logged out in any exception. This can be achieved by adding a finally block to the try-except block.Note that even if the
login()
crashes, the server might still recognize the login as successful, and a force logout might not work since we did not get the response from the initiallogin()
. For example, if a login is timed out, and the exporter service raise and catch the exception, but the response from thelogin()
will not set thesession_key
orsession_location
in the response object (since we never get back a response), so a forcelogout()
will not be able to perform aDELETE
(using thesession_location
) on the server to close the connection. This is an inherent issue on the redfish library that probably cannot be easily fixedCloses: https://github.com/canonical/hardware-observer-operator/issues/136