Open gabrielcocenza opened 4 months ago
I'm guessing the BMC you're using might not be properly closing sessions. Would it be possible to enable the tracing facilities in the library and reproduce the issue? The following code block should create a log file for your with all HTTP traces captured to the file "RedfishHTTPDebug.log".
from redfish import redfish_logger
# Config logger used by Restful library
LOGGERFILE = "RedfishHTTPDebug.log"
LOGGERFORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
LOGGER = redfish_logger(LOGGERFILE, LOGGERFORMAT, logging.DEBUG)
LOGGER.info("Redfish HTTP Tracing")
I'll try to add this capability in our product to try to put some light in this issue.
We identify that this is happening on Dell and HPE servers:
BIOS Version: 2.21.2 iDRAC Firmware Version: 7.00.00.171
BIOS Version: 2.19.1 iDRAC Firmware Version: 6.10.80.00
I also tried to stress one server by opening multiple sessions (20 sessions) and continuously making requests. It also failed with 401, but to create a new session and not to delete as the bug description.
2024-07-17 14:43:13,620 - redfish.rest.v1 - DEBUG - HTTP REQUEST (POST) for /redfish/v1/SessionService/Sessions:
Headers:
Accept: */*
OData-Version: 4.0
Content-Type: application/json
Body: {"UserName": "", "Password": "<REDACTED>"}
2024-07-17 14:43:13,620 - redfish.rest.v1 - INFO - Attempt 1 of /redfish/v1/SessionService/Sessions
2024-07-17 14:43:13,946 - redfish.rest.v1 - INFO - Response Time for POST to /redfish/v1/SessionService/Sessions: 0.3257996898610145 seconds.
2024-07-17 14:43:13,946 - redfish.rest.v1 - DEBUG - HTTP RESPONSE for /redfish/v1/SessionService/Sessions:
Code: 401 Unauthorized
Headers:
Date: Wed, 17 Jul 2024 19:43:03 GMT
Server: Apache
WWW-Authenticate: Basic realm="RedfishService"
OData-Version: 4.0
Access-Control-Allow-Origin: *
Cache-Control: no-cache
X-Frame-Options: DENY
Strict-Transport-Security: max-age=63072000; includeSubDomains; preload
Content-Type: application/json; charset=utf-8
Content-Length: 517
Keep-Alive: timeout=60, max=99
Connection: Keep-Alive
Body Response of /redfish/v1/SessionService/Sessions: b'{"error":{"@Message.ExtendedInfo":[{"Message":"Unable to complete the operation because an invalid username and/or password is entered, and therefore authentication failed.","MessageArgs":[],"MessageArgs@odata.count":0,"MessageId":"IDRAC.2.4.SYS415","RelatedProperties":[],"RelatedProperties@odata.count":0,"Resolution":"Enter valid user name and password and retry the operation.","Severity":"Warning"}],"code":"Base.1.7.GeneralError","message":"A general error has occurred. See ExtendedInfo for more information"}}'
2024-07-17 14:43:13,946 - redfish.rest.v1 - INFO - Login returned code 401: {"error":{"@Message.ExtendedInfo":[{"Message":"Unable to complete the operation because an invalid username and/or password is entered, and therefore authentication failed.","MessageArgs":[],"MessageArgs@odata.count":0,"MessageId":"IDRAC.2.4.SYS415","RelatedProperties":[],"RelatedProperties@odata.count":0,"Resolution":"Enter valid user name and password and retry the operation.","Severity":"Warning"}],"code":"Base.1.7.GeneralError","message":"A general error has occurred. See ExtendedInfo for more information"}}
If you're seeing it from different firmware on different vendors, then that makes it less likely a BMC issue. Perhaps the library is hitting a timeout or other networking issue when it issues a DELETE to the session, which would invoke a retry, and if the BMC already processed the DELETE, a 401 would return...
For that stress test you tried, it's possible you might be hitting a session limit on the BMC. At least the systems I have will return a 503 once you hit their session limit, so it might be something else entirely different.
I'll double check, but the logging traces I recommended adding should capture retry attempts as well. Update: it does show retry attempts.
I was not able to manually reproduce it, but sometimes it seems that the logout is failing with 401.
From the logs it's possible to create a session and fetch information, but if fails to logout. I'm not understanding how the authorization can break during this process.
originally found on canonical/prometheus-hardware-exporter#76