DMTF / python-redfish-library

Python3 library for interacting with devices that support a Redfish service
Other
165 stars 177 forks source link

Logout sometimes fails with 401 response #160

Open gabrielcocenza opened 1 month ago

gabrielcocenza commented 1 month ago

I was not able to manually reproduce it, but sometimes it seems that the logout is failing with 401.

Jul 16 13:59:30 hostname python3[3681]: 2024-07-16 13:59:30 INFO Login returned code 201: {"@Message.ExtendedInfo":[{"Message":"The resource has been created successfully.","MessageArgs":[],"MessageArgs@odata.count":0,"MessageId":"Base.1.12.Created","RelatedProperties":[],"RelatedProperties@odata.count":0,"Resolution":"None.","Severity":"OK"},{"Message":"A new resource is successfully created.","MessageArgs":[],"MessageArgs@odata.count":0,"MessageId":"IDRAC.2.8.SYS414","RelatedProperties":[],"RelatedProperties@odata.count":0,"Resolution":"No response action is required.","Severity":"Informational"}],"@odata.context":"/redfish/v1/$metadata#Session.Session","@odata.id":"/redfish/v1/SessionService/Sessions/132562","@odata.type":"#Session.v1_6_0.Session","ClientOriginIPAddress":"IP_FOR_BOND_M","CreatedTime":"2024-07-16T13:59:10-05:00","Description":"User Session","Id":"132562","Name":"User Session","Password":null,"SessionType":"Redfish","UserName":"root"}
Jul 16 13:59:30 hostname python3[3681]: 2024-07-16 13:59:30 INFO Getting redfish sensor info...
Jul 16 13:59:30 hostname python3[3681]: 2024-07-16 13:59:30 INFO Attempt 1 of /redfish/v1/
Jul 16 13:59:30 hostname python3[3681]: 2024-07-16 13:59:30 INFO Response Time for GET to /redfish/v1/Systems/System.Embedded.1/Memory/DIMM.Socket.A4: 0.1320378198288381 seconds.
Jul 16 13:59:30 hostname python3[3681]: 2024-07-16 13:59:30 INFO Getting smart storage health data...
Jul 16 13:59:30 hostname python3[3681]: 2024-07-16 13:59:30 INFO Attempt 1 of /redfish/v1/
Jul 16 13:59:30 hostname python3[3681]: 2024-07-16 13:59:30 INFO Response Time for GET to /redfish/v1/: 0.23425848316401243 seconds.
Jul 16 13:59:30 hostname python3[3681]: 2024-07-16 13:59:30 INFO Getting processor data...
Jul 16 13:59:30 hostname python3[3681]: 2024-07-16 13:59:30 INFO Attempt 1 of /redfish/v1/
Jul 16 13:59:30 hostname python3[3681]: 2024-07-16 13:59:30 INFO Response Time for GET to /redfish/v1/: 0.01082204794511199 seconds.
Jul 16 13:59:30 hostname python3[3681]: 2024-07-16 13:59:30 INFO Getting storage controller data...
Jul 16 13:59:30 hostname python3[3681]: 2024-07-16 13:59:30 INFO Attempt 1 of /redfish/v1/
Jul 16 13:59:30 hostname python3[3681]: 2024-07-16 13:59:30 INFO Response Time for GET to /redfish/v1/: 0.010600307956337929 seconds.
Jul 16 13:59:30 hostname python3[3681]: 2024-07-16 13:59:30 INFO Getting network adapter data...
Jul 16 13:59:30 hostname python3[3681]: 2024-07-16 13:59:30 INFO Attempt 1 of /redfish/v1/
Jul 16 13:59:30 hostname python3[3681]: 2024-07-16 13:59:30 INFO Response Time for GET to /redfish/v1/: 0.0182824837975204 seconds.
Jul 16 13:59:30 hostname python3[3681]: 2024-07-16 13:59:30 INFO Getting chassis data...
Jul 16 13:59:30 hostname python3[3681]: 2024-07-16 13:59:30 INFO Attempt 1 of /redfish/v1/
Jul 16 13:59:30 hostname python3[3681]: 2024-07-16 13:59:30 INFO Response Time for GET to /redfish/v1/: 0.4267375646159053 seconds.
Jul 16 13:59:30 hostname python3[3681]: 2024-07-16 13:59:30 INFO Getting storage drive data...
Jul 16 13:59:30 hostname python3[3681]: 2024-07-16 13:59:30 INFO Attempt 1 of /redfish/v1/
Jul 16 13:59:30 hostname python3[3681]: 2024-07-16 13:59:30 INFO Response Time for GET to /redfish/v1/: 0.03342375298961997 seconds.
Jul 16 13:59:30 hostname python3[3681]: 2024-07-16 13:59:30 INFO Getting memory dimm data...
Jul 16 13:59:30 hostname python3[3681]: 2024-07-16 13:59:30 INFO Attempt 1 of /redfish/v1/
Jul 16 13:59:30 hostname python3[3681]: 2024-07-16 13:59:30 INFO Response Time for GET to /redfish/v1/: 0.01948211621493101 seconds.
Jul 16 13:59:30 hostname python3[3681]: 2024-07-16 13:59:30 INFO Getting smart storage health data...
Jul 16 13:59:30 hostname python3[3681]: 2024-07-16 13:59:30 INFO Attempt 1 of /redfish/v1/
Jul 16 13:59:30 hostname python3[3681]: 2024-07-16 13:59:30 INFO Response Time for GET to /redfish/v1/: 0.09171967394649982 seconds.
Jul 16 13:59:30 hostname python3[3681]: 2024-07-16 13:59:30 INFO Attempt 1 of /redfish/v1/SessionService/Sessions/132562
Jul 16 13:59:30 hostname python3[3681]: 2024-07-16 13:59:30 INFO Response Time for DELETE to /redfish/v1/SessionService/Sessions/132562: 0.045607013162225485 seconds.
Jul 16 13:59:30 hostname python3[3681]: 2024-07-16 13:59:30 ERROR Invalid session resource: /redfish/v1/SessionService/Sessions/132562, return code: 401

From the logs it's possible to create a session and fetch information, but if fails to logout. I'm not understanding how the authorization can break during this process.

originally found on canonical/prometheus-hardware-exporter#76

mraineri commented 1 month ago

I'm guessing the BMC you're using might not be properly closing sessions. Would it be possible to enable the tracing facilities in the library and reproduce the issue? The following code block should create a log file for your with all HTTP traces captured to the file "RedfishHTTPDebug.log".

from redfish import redfish_logger

# Config logger used by Restful library
LOGGERFILE = "RedfishHTTPDebug.log"
LOGGERFORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
LOGGER = redfish_logger(LOGGERFILE, LOGGERFORMAT, logging.DEBUG)
LOGGER.info("Redfish HTTP Tracing")
gabrielcocenza commented 1 month ago

I'll try to add this capability in our product to try to put some light in this issue.

We identify that this is happening on Dell and HPE servers:

BIOS Version: 2.21.2 iDRAC Firmware Version: 7.00.00.171

BIOS Version: 2.19.1 iDRAC Firmware Version: 6.10.80.00

I also tried to stress one server by opening multiple sessions (20 sessions) and continuously making requests. It also failed with 401, but to create a new session and not to delete as the bug description.

2024-07-17 14:43:13,620 - redfish.rest.v1 - DEBUG - HTTP REQUEST (POST) for /redfish/v1/SessionService/Sessions:
Headers:
        Accept: */*
        OData-Version: 4.0
        Content-Type: application/json

Body: {"UserName": "", "Password": "<REDACTED>"}

2024-07-17 14:43:13,620 - redfish.rest.v1 - INFO - Attempt 1 of /redfish/v1/SessionService/Sessions
2024-07-17 14:43:13,946 - redfish.rest.v1 - INFO - Response Time for POST to /redfish/v1/SessionService/Sessions: 0.3257996898610145 seconds.
2024-07-17 14:43:13,946 - redfish.rest.v1 - DEBUG - HTTP RESPONSE for /redfish/v1/SessionService/Sessions:
Code: 401 Unauthorized

Headers:
        Date: Wed, 17 Jul 2024 19:43:03 GMT
        Server: Apache
        WWW-Authenticate: Basic realm="RedfishService"
        OData-Version: 4.0
        Access-Control-Allow-Origin: *
        Cache-Control: no-cache
        X-Frame-Options: DENY
        Strict-Transport-Security: max-age=63072000; includeSubDomains; preload
        Content-Type: application/json; charset=utf-8
        Content-Length: 517
        Keep-Alive: timeout=60, max=99
        Connection: Keep-Alive

Body Response of /redfish/v1/SessionService/Sessions: b'{"error":{"@Message.ExtendedInfo":[{"Message":"Unable to complete the operation because an invalid username and/or password is entered, and therefore authentication failed.","MessageArgs":[],"MessageArgs@odata.count":0,"MessageId":"IDRAC.2.4.SYS415","RelatedProperties":[],"RelatedProperties@odata.count":0,"Resolution":"Enter valid user name and password and retry the operation.","Severity":"Warning"}],"code":"Base.1.7.GeneralError","message":"A general error has occurred. See ExtendedInfo for more information"}}'

2024-07-17 14:43:13,946 - redfish.rest.v1 - INFO - Login returned code 401: {"error":{"@Message.ExtendedInfo":[{"Message":"Unable to complete the operation because an invalid username and/or password is entered, and therefore authentication failed.","MessageArgs":[],"MessageArgs@odata.count":0,"MessageId":"IDRAC.2.4.SYS415","RelatedProperties":[],"RelatedProperties@odata.count":0,"Resolution":"Enter valid user name and password and retry the operation.","Severity":"Warning"}],"code":"Base.1.7.GeneralError","message":"A general error has occurred. See ExtendedInfo for more information"}}
mraineri commented 1 month ago

If you're seeing it from different firmware on different vendors, then that makes it less likely a BMC issue. Perhaps the library is hitting a timeout or other networking issue when it issues a DELETE to the session, which would invoke a retry, and if the BMC already processed the DELETE, a 401 would return...

For that stress test you tried, it's possible you might be hitting a session limit on the BMC. At least the systems I have will return a 503 once you hit their session limit, so it might be something else entirely different.

I'll double check, but the logging traces I recommended adding should capture retry attempts as well. Update: it does show retry attempts.