bb-Ricardo / check_redfish

A monitoring/inventory plugin to check components and health status of systems which support Redfish. It will also create an inventory of all components of a system.
MIT License

Session not renewed #90

Closed: Fogelholk closed this issue 1 year ago

Fogelholk commented 2 years ago

Hi there, really nice script! However, I have one problem when checking a few of my iDRACs (iDRAC 8 on PowerEdge FC430, and sometimes other hardware with iDRAC 7/8). This is with check_redfish 1.4.1. The iDRACs are set to keep sessions for 1800 seconds (the default, AFAIK), and I run check_redfish every 5 minutes.

It seems like the session is dropped on the iDRAC itself. I'm not sure whether the internal web server crashes and restarts, but simply put, the established session is no longer available on the iDRAC.

When I connect to the server via the script using the session file, I get the following instead of the script establishing a new session:

# sudo -u icinga /usr/local/libexec/monitoring/check_redfish/check_redfish.py -H FQDN -u USERNAME -p PASSWORD --temp -v -t 20 --sessionfile check_redfish_FQDN_idrac_temperature
2022-06-02 16:22:56,911 - DEBUG: HTTP REQUEST: GET
        PATH: /redfish/v1/Chassis/System.Embedded.1
        BODY: None
2022-06-02 16:22:56,911 - INFO: Attempt 1 of /redfish/v1/Chassis/System.Embedded.1
2022-06-02 16:22:56,913 - DEBUG: Starting new HTTPS connection (1): FQDN:443
2022-06-02 16:22:57,528 - DEBUG: https://FQDN:443 "GET /redfish/v1/Chassis/System.Embedded.1 HTTP/1.1" 401 None
2022-06-02 16:22:57,529 - INFO: Retrying /redfish/v1/Chassis/System.Embedded.1 [("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))]
2022-06-02 16:22:58,548 - DEBUG: HTTP REQUEST: GET
        PATH: /redfish/v1/Chassis/System.Embedded.1
        BODY: None
2022-06-02 16:22:58,549 - INFO: Attempt 2 of /redfish/v1/Chassis/System.Embedded.1
2022-06-02 16:22:58,550 - DEBUG: Resetting dropped connection: FQDN
2022-06-02 16:22:59,203 - DEBUG: https://FQDN:443 "GET /redfish/v1/Chassis/System.Embedded.1 HTTP/1.1" 401 None
2022-06-02 16:22:59,204 - INFO: Retrying /redfish/v1/Chassis/System.Embedded.1 [("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))]
2022-06-02 16:23:00,206 - DEBUG: HTTP REQUEST: GET
        PATH: /redfish/v1/Chassis/System.Embedded.1
        BODY: None
2022-06-02 16:23:00,207 - INFO: Attempt 3 of /redfish/v1/Chassis/System.Embedded.1
2022-06-02 16:23:00,208 - DEBUG: Resetting dropped connection: FQDN
2022-06-02 16:23:00,712 - DEBUG: https://FQDN:443 "GET /redfish/v1/Chassis/System.Embedded.1 HTTP/1.1" 401 None
2022-06-02 16:23:00,713 - INFO: Retrying /redfish/v1/Chassis/System.Embedded.1 [("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))]
2022-06-02 16:23:01,717 - DEBUG: HTTP REQUEST: GET
        PATH: /redfish/v1/Chassis/System.Embedded.1
        BODY: None
2022-06-02 16:23:01,717 - INFO: Attempt 4 of /redfish/v1/Chassis/System.Embedded.1
2022-06-02 16:23:01,718 - DEBUG: Resetting dropped connection: FQDN
2022-06-02 16:23:03,284 - DEBUG: https://FQDN:443 "GET /redfish/v1/Chassis/System.Embedded.1 HTTP/1.1" 401 None
2022-06-02 16:23:03,285 - INFO: Retrying /redfish/v1/Chassis/System.Embedded.1 [("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))]
[CRITICAL]: Unable to connect to Host 'FQDN', max retries exhausted.

Any ideas? :)

bb-Ricardo commented 2 years ago

Hi,

Yes, this seems strange. Do you use a different session file for each check (temp, fan, ...)? How many hosts are checking the same iDRAC? In my setup I usually use two or three worker nodes that query the iDRAC.

Each worker has its own session file. What is the maximum number of allowed concurrent sessions?

Fogelholk commented 2 years ago

The maximum number of concurrent sessions in the iDRAC is set to the default, 8. I have set up three checks in Icinga.

I have two Icinga nodes running, so some checks run from node 1 and some from node 2, but each check consistently runs from the same node every time, until one of the Icinga nodes goes down, of course :)

So in total, I suppose the iDRACs should keep 3 persistent sessions open. During the above test I set the three checks to use their own session files with "--sessionfile", but yesterday I saw the same problem when using the same session file for all three checks. So I switched to "--nosession" as a workaround until I had time to collect more data and open this issue :)

bb-Ricardo commented 2 years ago

Yes, I highly recommend using the same session file. You don't even need to specify it: it defaults to the host name and is therefore the same for all checks querying the same host.

So if this works "normally", it should use at most two sessions by default (one per Icinga node).

Are there any logs on the iDRAC telling you what's happening? What does the iDRAC say about the consumed sessions?
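The per-host default can be pictured with a small sketch. The naming scheme below (`check_redfish_<host>.session` in the system temp directory) and the helper `default_session_file` are hypothetical, chosen only for illustration; check_redfish's actual file name may differ. The point is that the path depends only on the queried host, so the temp, fan and power checks against one iDRAC on one Icinga node all reuse the same file.

```python
import os
import re
import tempfile
from typing import Optional


def default_session_file(host: str, directory: Optional[str] = None) -> str:
    """Hypothetical naming scheme: one session file per BMC host.

    The path depends only on the host, so every check (temp, fan, power, ...)
    that queries the same host on the same node resolves to the same file.
    """
    directory = directory or tempfile.gettempdir()
    # Replace characters that are awkward in file names (e.g. ':' in IPv6 addresses).
    safe_host = re.sub(r"[^A-Za-z0-9._-]", "_", host)
    return os.path.join(directory, f"check_redfish_{safe_host}.session")


if __name__ == "__main__":
    for check in ("temp", "fan", "power"):
        # The check name does not influence the path, so all three share one session.
        print(check, default_session_file("idrac1.example.com", "/tmp"))
```

With one such file per host on each worker, two Icinga nodes end up holding two sessions on the iDRAC, matching the estimate above.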

Fogelholk commented 2 years ago

I was running the script without --sessionfile yesterday and had the same issue on another server; I only set it today to see whether anything changed.

The Lifecycle log simply says:

2022-06-02T15:39:42+0200 Successfully logged in using USER, from ICINGA2 and REDFISH.
2022-06-02T16:04:13+0200 The session for USER from ICINGA2 using REDFISH is logged off.

Between those two log lines there are login attempts by OpenManage Enterprise (OME), which checks the server status every hour:

Successfully logged in using ANOTHERUSER, from OME-IP and WS-MAN.
The previous log entry was repeated 5 times.

On another server that had the same problem yesterday, I see the same pattern: OME logged in just before this script stopped working and the session became invalid. Maybe OME does something fishy with the sessions :)

I'll try disabling OME's server status checks and see whether I can reproduce the problem tomorrow. But shouldn't the script automatically try to establish a new session when it gets a 401 status code?

Fogelholk commented 2 years ago

I tried it out today and it still seems to happen. The OpenManage Enterprise jobs have been disabled, and in the iDRAC Lifecycle Log I see the following:

2022-06-03T09:36:07+0200 Successfully logged in using USER, from ICINGA2 and REDFISH.
2022-06-03T09:39:01+0200 The session for USER from ICINGA2 using REDFISH is logged off.

And when I run the script manually:

# sudo -u icinga /usr/local/libexec/monitoring/check_redfish/check_redfish.py -H FQDN -u USERNAME -p PASSWORD --temp -v -t 20
2022-06-03 09:44:58,249 - DEBUG: HTTP REQUEST: GET
        PATH: /redfish/v1/Chassis/System.Embedded.1
        BODY: None
2022-06-03 09:44:58,250 - INFO: Attempt 1 of /redfish/v1/Chassis/System.Embedded.1
2022-06-03 09:44:58,252 - DEBUG: Starting new HTTPS connection (1): FQDN:443
2022-06-03 09:44:58,895 - DEBUG: https://FQDN:443 "GET /redfish/v1/Chassis/System.Embedded.1 HTTP/1.1" 401 None
2022-06-03 09:44:58,896 - INFO: Retrying /redfish/v1/Chassis/System.Embedded.1 [("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))]

Manually removing the session file from /tmp solves the issue temporarily, since that forces a new session to be created :)

bb-Ricardo commented 2 years ago

Hmm, you get a 401 and the session is not re-initiated?

This part should take care of exactly this scenario: https://github.com/bb-Ricardo/check_redfish/blob/e55dd688e614a6e93abc361de171ff309cef8dee/cr_module/classes/redfish.py#L380-L385

After the first 401, it should restart the connection.

Up for a debug session?
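The intended behaviour (on a 401, throw the restored session away, log in again and retry) can be sketched independently of any Redfish library. `fetch` and `login` are hypothetical callables standing in for the real HTTP request and session creation; they are not check_redfish or python-redfish-library APIs. Note that this only works if the client actually gets to see the 401 status; an exception raised while reading the response body bypasses the check entirely.

```python
from typing import Callable, Optional, Tuple

# Hypothetical signatures: fetch(path, token) -> (status_code, body); login() -> new token.
Fetch = Callable[[str, str], Tuple[int, bytes]]
Login = Callable[[], str]


def get_with_relogin(path: str, token: Optional[str], fetch: Fetch, login: Login):
    """GET a path with a cached session token, re-authenticating once on HTTP 401.

    Returns (status, body, token) so the caller can persist the token that worked.
    """
    if token is None:
        token = login()  # no cached session yet
    status, body = fetch(path, token)
    if status == 401:
        # The cached session is gone (expired or deleted on the BMC): start a new one.
        token = login()
        status, body = fetch(path, token)
    return status, body, token
```

A second 401 after a fresh login is returned as-is rather than retried, which avoids a login loop when the credentials themselves are wrong.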

Fogelholk commented 2 years ago

Sure! What do you want me to test? :)

Fogelholk commented 2 years ago

I can reproduce this easily by logging in to the iDRAC myself and removing the Redfish session under iDRAC Settings > Sessions. Could a Python package be the wrong version or something? I'm using FreeBSD with Python 3.8. These are the packages installed in the venv for check_redfish:

decorator-5.1.1
jsonpatch-1.32
jsonpath_rw-1.4.0
jsonpointer-2.3
redfish-3.1.6
requests_toolbelt-0.9.1
requests_unixsocket-0.3.0

And these are the globally installed Python packages (installed via pkg, FreeBSD's built-in package manager):

py38-bcrypt-3.2.0
py38-certifi-2021.10.8
py38-cffi-1.15.0
py38-chardet-4.0.0,1
py38-charset-normalizer-2.0.12
py38-click-8.0.3
py38-cryptography-3.3.2
py38-dnspython-2.2.1,1
py38-idna-3.3
py38-ldap-3.4.0
py38-mysqlclient-2.1.0
py38-openssl-20.0.1,1
py38-paramiko-2.10.3
py38-pep517-0.12.0
py38-pip-20.3.4
py38-pip-tools-6.3.1
py38-ply-3.11
py38-pyasn1-0.4.8
py38-pyasn1-modules-0.2.8
py38-pycparser-2.21
py38-pycryptodomex-3.12.0
py38-pynacl-1.5.0
py38-pysmi-0.3.4_1
py38-pysnmp-4.4.9_2
py38-pysocks-1.7.1
py38-requests-2.27.1
py38-setuptools-57.0.0
py38-six-1.16.0
py38-sqlite3-3.8.13_7
py38-tomli-2.0.1
py38-urllib3-1.26.8,1
py38-wheel-0.36.2
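Deleting the session in the web UI can presumably also be scripted, which makes the reproduction repeatable. In the Redfish standard a session is logged out with an HTTP DELETE on its resource under `/redfish/v1/SessionService/Sessions/`; the session ID (part of the session URI returned in the `Location` header when the session was created) and the helper name below are placeholders. This sketch only builds the request with the standard library; sending it with `urllib.request.urlopen` would additionally need an SSL context that accepts the iDRAC's (often self-signed) certificate.

```python
import urllib.request


def build_session_delete(base_url: str, session_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a Redfish request that logs out one session."""
    url = f"{base_url.rstrip('/')}/redfish/v1/SessionService/Sessions/{session_id}"
    return urllib.request.Request(
        url,
        method="DELETE",
        headers={"X-Auth-Token": token, "Accept": "application/json"},
    )


if __name__ == "__main__":
    # Placeholder host, session ID and token.
    req = build_session_delete("https://idrac.example.com", "42", "TOKEN")
    print(req.get_method(), req.full_url)
```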
bb-Ricardo commented 2 years ago

Can you write me an email with your phone number?

The packages look good to me.

Fogelholk commented 2 years ago

Sure, uhh, where can I find your email? It doesn't seem to show up on your profile, unless I'm completely blind.

bb-Ricardo commented 2 years ago

https://github.com/bb-Ricardo/check_redfish/blob/e55dd688e614a6e93abc361de171ff309cef8dee/check_redfish.py#L21

bb-Ricardo commented 2 years ago

Hi,

I just realised the test was not finished. Can you repeat the test we did with 3.1.6 using 3.0.3? Once you have a session with 3.0.3, remove the session in the iDRAC and run the plugin again. Is the session renewed?

Fogelholk commented 2 years ago

Damn, it seems like we celebrated too soon :( After removing the session via the iDRAC, I get the following:

# sudo -u icinga /usr/local/libexec/monitoring/check_redfish2/check_redfish.py -H FQDN -u USER -p PASS --temp -v -t 20
/redfish/v1/Chassis/System.Embedded.1
here we are
3.0.3
2022-06-08 10:20:12,367 - DEBUG: HTTP REQUEST: GET
        PATH: /redfish/v1/Chassis/System.Embedded.1
        BODY: None
2022-06-08 10:20:12,367 - INFO: Attempt 1 of /redfish/v1/Chassis/System.Embedded.1
2022-06-08 10:20:12,967 - INFO: Response Time for GET to /redfish/v1/Chassis/System.Embedded.1: 0.2186166208703071 seconds.
2022-06-08 10:20:12,968 - INFO: Retrying /redfish/v1/Chassis/System.Embedded.1 [IncompleteRead(0 bytes read)]
2022-06-08 10:20:13,976 - DEBUG: HTTP REQUEST: GET
        PATH: /redfish/v1/Chassis/System.Embedded.1
        BODY: None
2022-06-08 10:20:13,976 - INFO: Attempt 2 of /redfish/v1/Chassis/System.Embedded.1
2022-06-08 10:20:14,506 - INFO: Response Time for GET to /redfish/v1/Chassis/System.Embedded.1: 0.15978044806979597 seconds.
2022-06-08 10:20:14,506 - INFO: Retrying /redfish/v1/Chassis/System.Embedded.1 [IncompleteRead(0 bytes read)]
2022-06-08 10:20:15,563 - DEBUG: HTTP REQUEST: GET
        PATH: /redfish/v1/Chassis/System.Embedded.1
        BODY: None
2022-06-08 10:20:15,563 - INFO: Attempt 3 of /redfish/v1/Chassis/System.Embedded.1
2022-06-08 10:20:16,424 - INFO: Response Time for GET to /redfish/v1/Chassis/System.Embedded.1: 0.2790320091880858 seconds.
2022-06-08 10:20:16,425 - INFO: Retrying /redfish/v1/Chassis/System.Embedded.1 [IncompleteRead(0 bytes read)]
2022-06-08 10:20:17,432 - DEBUG: HTTP REQUEST: GET
        PATH: /redfish/v1/Chassis/System.Embedded.1
        BODY: None
2022-06-08 10:20:17,433 - INFO: Attempt 4 of /redfish/v1/Chassis/System.Embedded.1
2022-06-08 10:20:18,128 - INFO: Response Time for GET to /redfish/v1/Chassis/System.Embedded.1: 0.18635891983285546 seconds.
2022-06-08 10:20:18,128 - INFO: Retrying /redfish/v1/Chassis/System.Embedded.1 [IncompleteRead(0 bytes read)]
[CRITICAL]: Unable to connect to Host 'FQDN', max retries exhausted.
bb-Ricardo commented 2 years ago

Thank you.

Alright, I need to look into it anyway. I will let you know when I have found a workaround/solution.

bb-Ricardo commented 2 years ago

Hi @Fogelholk,

I might have found a way to solve this issue. But before I create a PR for python-redfish I need your help again.

Can you copy the redfish 3.1.6 package from /usr/local/libexec/monitoring/check_redfish to /usr/local/libexec/monitoring/check_redfish2 and patch it according to this commit? https://github.com/bb-Ricardo/python-redfish-library/commit/6821a084cee97eb3911ac5f0f8b3081de17e90d2

Then run the same test again and see whether anything changes.

Thank you.

Fogelholk commented 2 years ago

I applied the fix you suggested (copying redfish 3.1.6 from check_redfish to check_redfish2 and editing check_redfish2/env/lib/python3.8/site-packages/redfish/rest/v1.py according to the commit you linked), and unfortunately I get the same result as before:

# sudo -u icinga /usr/local/libexec/monitoring/check_redfish2/check_redfish.py -H FQDN -u USER -p PASS --temp -v -t 20
/redfish/v1/Chassis/System.Embedded.1
here we are
3.1.6
2022-06-09 08:37:08,708 - DEBUG: HTTP REQUEST: GET
        PATH: /redfish/v1/Chassis/System.Embedded.1
        BODY: None
2022-06-09 08:37:08,708 - INFO: Attempt 1 of /redfish/v1/Chassis/System.Embedded.1
2022-06-09 08:37:08,710 - DEBUG: Starting new HTTPS connection (1): FQDN:443
2022-06-09 08:37:09,338 - DEBUG: https://FQDN:443 "GET /redfish/v1/Chassis/System.Embedded.1 HTTP/1.1" 401 None
2022-06-09 08:37:09,339 - INFO: Retrying /redfish/v1/Chassis/System.Embedded.1 [("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))]
2022-06-09 08:37:10,398 - DEBUG: HTTP REQUEST: GET
        PATH: /redfish/v1/Chassis/System.Embedded.1
        BODY: None
2022-06-09 08:37:10,398 - INFO: Attempt 2 of /redfish/v1/Chassis/System.Embedded.1
2022-06-09 08:37:10,400 - DEBUG: Resetting dropped connection: FQDN
2022-06-09 08:37:11,117 - DEBUG: https://FQDN:443 "GET /redfish/v1/Chassis/System.Embedded.1 HTTP/1.1" 401 None
2022-06-09 08:37:11,118 - INFO: Retrying /redfish/v1/Chassis/System.Embedded.1 [("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))]
2022-06-09 08:37:12,125 - DEBUG: HTTP REQUEST: GET
        PATH: /redfish/v1/Chassis/System.Embedded.1
        BODY: None
2022-06-09 08:37:12,125 - INFO: Attempt 3 of /redfish/v1/Chassis/System.Embedded.1
2022-06-09 08:37:12,126 - DEBUG: Resetting dropped connection: FQDN
2022-06-09 08:37:12,812 - DEBUG: https://FQDN:443 "GET /redfish/v1/Chassis/System.Embedded.1 HTTP/1.1" 401 None
2022-06-09 08:37:12,813 - INFO: Retrying /redfish/v1/Chassis/System.Embedded.1 [("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))]
2022-06-09 08:37:13,829 - DEBUG: HTTP REQUEST: GET
        PATH: /redfish/v1/Chassis/System.Embedded.1
        BODY: None
2022-06-09 08:37:13,829 - INFO: Attempt 4 of /redfish/v1/Chassis/System.Embedded.1
2022-06-09 08:37:13,830 - DEBUG: Resetting dropped connection: FQDN
2022-06-09 08:37:14,592 - DEBUG: https://FQDN:443 "GET /redfish/v1/Chassis/System.Embedded.1 HTTP/1.1" 401 None
2022-06-09 08:37:14,592 - INFO: Retrying /redfish/v1/Chassis/System.Embedded.1 [("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))]
[CRITICAL]: Unable to connect to Host 'FQDN', max retries exhausted.

It prints out "here we are" and the version, but not "here we are 2":

370         print(redfish_path)
371         if self.__cached_data.get(redfish_path) is None:
372             print("here we are")
373             print(redfish.__version__)
374
375             redfish_response = self._rf_get(redfish_path)
376
377             # session invalid
378             print(type(redfish_response))
379             print("here we are 2")
380             if redfish_response.status == 401:
bb-Ricardo commented 2 years ago

Thank you for testing. Can you change the following:

except IncompleteRead as e:

to

except Exception as e:

and test again? Thank you.

Fogelholk commented 2 years ago
# env/lib/python3.8/site-packages/redfish/rest/v1.py
...
        if http_response is not None:

            # mitigate issues with IncompleteRead
            try:
                self._read = http_response.content
            except Exception as e:
                LOGGER.warning(e)
                pass
            self._status = http_response.status_code
...

I still get exactly the same output :(

/redfish/v1/Chassis/System.Embedded.1
here we are
3.1.6
2022-06-09 09:09:10,094 - DEBUG: HTTP REQUEST: GET
        PATH: /redfish/v1/Chassis/System.Embedded.1
        BODY: None
2022-06-09 09:09:10,094 - INFO: Attempt 1 of /redfish/v1/Chassis/System.Embedded.1
2022-06-09 09:09:10,096 - DEBUG: Starting new HTTPS connection (1): s851.drac.loopia.se:443
2022-06-09 09:09:10,652 - DEBUG: https://FQDN:443 "GET /redfish/v1/Chassis/System.Embedded.1 HTTP/1.1" 401 None
2022-06-09 09:09:10,653 - INFO: Retrying /redfish/v1/Chassis/System.Embedded.1 [("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))]
bb-Ricardo commented 2 years ago

That's not good.

Up for another session? I have it open already.

bb-Ricardo commented 2 years ago

I pushed another change to the library. https://github.com/bb-Ricardo/python-redfish-library/commit/c54fd864d58d27ab515d7bac57b14f31eb013e08

Can you apply it accordingly?

The code in check_redfish also needs to be changed a bit:

diff --git a/cr_module/classes/redfish.py b/cr_module/classes/redfish.py
index 5e2de15..46f7e83 100644
--- a/cr_module/classes/redfish.py
+++ b/cr_module/classes/redfish.py
@@ -377,7 +377,9 @@ class RedfishConnection:
                 if self.username is None or self.password is None:
                     self.exit_on_error(f"Username and Password needed to connect to this BMC")

-            if redfish_response.status != 404 and redfish_response.status >= 400 and self.session_was_restored is True:
+            if (redfish_response.status is None or
+                (redfish_response.status != 404 and redfish_response.status >= 400)) \
+                    and self.session_was_restored is True:
                 # reset connection
                 self.init_connection(reset=True)

Then could you test it again? Thank you.
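The reworked condition in the diff above can be read as a small predicate. This is a sketch of that logic only (the function name is hypothetical). A status of `None` presumably stands for a response the library created without an underlying HTTP response, whose status is then never set. The `None` test has to come first, because in Python 3 evaluating `None >= 400` raises a `TypeError`.

```python
from typing import Optional


def should_reset_connection(status: Optional[int], session_was_restored: bool) -> bool:
    """Decide whether to discard a session that was restored from the session file."""
    if not session_was_restored:
        # Only sessions restored from the session file are reset by this check.
        return False
    # Check for None first: `None >= 400` would raise TypeError in Python 3.
    return status is None or (status != 404 and status >= 400)
```

Excluding 404 presumably reflects that a missing resource says nothing about whether the session itself is still valid.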

Fogelholk commented 2 years ago

I added the changes in v1.py, as well as a new print, just because:

 871             except (IncompleteRead, InvalidChunkLength) as e:
 872                 print("Invalid chunks")
 873                 restresp = RestResponse(restreq, None)
 874                 LOGGER.error('Invalid response from %s [%s]' % (path, e))
 875             except Exception as excp:
 876                 print("now we are here instead")

I made the changes in cr_module/classes/redfish.py but still get the following:

/redfish/v1/Chassis/System.Embedded.1
here we are
3.1.6
2022-06-09 11:18:22,796 - DEBUG: HTTP REQUEST: GET
        PATH: /redfish/v1/Chassis/System.Embedded.1
        BODY: None
2022-06-09 11:18:22,796 - INFO: Attempt 1 of /redfish/v1/Chassis/System.Embedded.1
2022-06-09 11:18:22,798 - DEBUG: Starting new HTTPS connection (1): FQDN:443
2022-06-09 11:18:23,296 - DEBUG: https://FQDN:443 "GET /redfish/v1/Chassis/System.Embedded.1 HTTP/1.1" 401 None
now we are here instead
2022-06-09 11:18:23,297 - INFO: Retrying /redfish/v1/Chassis/System.Embedded.1 [("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))]
bb-Ricardo commented 2 years ago

Hi @Fogelholk,

Can you comment out the last exception block in v1.py, starting at line 875, and post the "result" here? Maybe then I can see which exception is actually raised, so that it can be caught.

Thank you.

Fogelholk commented 2 years ago

Sure thing, here it is:

/redfish/v1/Chassis/System.Embedded.1
here we are
3.1.6
2022-06-10 15:27:05,035 - DEBUG: HTTP REQUEST: GET
        PATH: /redfish/v1/Chassis/System.Embedded.1
        BODY: None
2022-06-10 15:27:05,035 - INFO: Attempt 1 of /redfish/v1/Chassis/System.Embedded.1
2022-06-10 15:27:05,037 - DEBUG: Starting new HTTPS connection (1): FQDN:443
2022-06-10 15:27:07,203 - DEBUG: https://FQDN:443 "GET /redfish/v1/Chassis/System.Embedded.1 HTTP/1.1" 401 None
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/urllib3/response.py", line 697, in _update_chunk_length
    self.chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: b''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/urllib3/response.py", line 438, in _error_catcher
    yield
  File "/usr/local/lib/python3.8/site-packages/urllib3/response.py", line 764, in read_chunked
    self._update_chunk_length()
  File "/usr/local/lib/python3.8/site-packages/urllib3/response.py", line 701, in _update_chunk_length
    raise InvalidChunkLength(self, line)
urllib3.exceptions.InvalidChunkLength: InvalidChunkLength(got length b'', 0 bytes read)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/requests/models.py", line 760, in generate
    for chunk in self.raw.stream(chunk_size, decode_content=True):
  File "/usr/local/lib/python3.8/site-packages/urllib3/response.py", line 572, in stream
    for line in self.read_chunked(amt, decode_content=decode_content):
  File "/usr/local/lib/python3.8/site-packages/urllib3/response.py", line 793, in read_chunked
    self._original_response.close()
  File "/usr/local/lib/python3.8/contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.8/site-packages/urllib3/response.py", line 455, in _error_catcher
    raise ProtocolError("Connection broken: %r" % e, e)
urllib3.exceptions.ProtocolError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/libexec/monitoring/check_redfish2/check_redfish.py", line 166, in <module>
    if any(x in args.requested_query for x in ['temp', 'all']):     get_chassi_data(plugin, Temperature)
  File "/usr/local/libexec/monitoring/check_redfish2/src/cr_module/system_chassi.py", line 43, in get_chassi_data
    chassi_data = plugin_object.rf.get_view(chassi_url)
  File "/usr/local/libexec/monitoring/check_redfish2/src/cr_module/classes/redfish.py", line 513, in get_view
    return self.get(redfish_path)
  File "/usr/local/libexec/monitoring/check_redfish2/src/cr_module/classes/redfish.py", line 375, in get
    redfish_response = self._rf_get(redfish_path)
  File "/usr/local/libexec/monitoring/check_redfish2/src/cr_module/classes/redfish.py", line 364, in _rf_get
    return self.connection.get(redfish_path, None)
  File "/usr/local/libexec/monitoring/check_redfish2/env/lib/python3.8/site-packages/redfish/rest/v1.py", line 616, in get
    return self._rest_request(path, method='GET', args=args,
  File "/usr/local/libexec/monitoring/check_redfish2/env/lib/python3.8/site-packages/redfish/rest/v1.py", line 1046, in _rest_request
    return super(HttpClient, self)._rest_request(path=path, method=method,
  File "/usr/local/libexec/monitoring/check_redfish2/env/lib/python3.8/site-packages/redfish/rest/v1.py", line 859, in _rest_request
    resp = self._session.request(method.upper(), "{}{}".format(self.__base_url, reqpath), data=body,
  File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 529, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 687, in send
    r.content
  File "/usr/local/lib/python3.8/site-packages/requests/models.py", line 838, in content
    self._content = b''.join(self.iter_content(CONTENT_CHUNK_SIZE)) or b''
  File "/usr/local/lib/python3.8/site-packages/requests/models.py", line 763, in generate
    raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))
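The traceback explains why neither `except IncompleteRead` nor `except (IncompleteRead, InvalidChunkLength)` fired: the exception that actually leaves `requests` is `requests.exceptions.ChunkedEncodingError`, raised while `Session.send()` reads `r.content`, and the urllib3 errors only appear as its chained context. `ChunkedEncodingError` does not derive from either urllib3 class, so only the broad `except Exception` caught it. When it is unclear which class to catch, a small stdlib helper that walks `__cause__`/`__context__` can list the whole chain; the two exception classes in the demo are local stand-ins named after the real ones, not imports from requests or urllib3.

```python
from typing import List, Optional


def exception_chain(exc: Optional[BaseException]) -> List[str]:
    """Return 'module.Class' for an exception and everything it chains to, outermost first."""
    names = []
    seen = set()
    while exc is not None and id(exc) not in seen:  # guard against cycles
        seen.add(id(exc))
        cls = type(exc)
        names.append(f"{cls.__module__}.{cls.__qualname__}")
        # Prefer an explicit "raise ... from" cause, else the implicit context.
        exc = exc.__cause__ or exc.__context__
    return names


# Local stand-ins for urllib3.exceptions.ProtocolError and
# requests.exceptions.ChunkedEncodingError (hypothetical, for the demo only).
class ProtocolError(Exception):
    pass


class ChunkedEncodingError(Exception):
    pass


def simulate_broken_chunk():
    try:
        try:
            int(b"", 16)  # same ValueError as in the traceback: empty chunk-size line
        except ValueError as err:
            raise ProtocolError("Connection broken") from err
    except ProtocolError as err:
        raise ChunkedEncodingError(err)  # implicit chaining via __context__


if __name__ == "__main__":
    try:
        simulate_broken_chunk()
    except Exception as exc:
        print(" <- ".join(exception_chain(exc)))
```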
bb-Ricardo commented 2 years ago

Hi,

I pushed another commit where just the exception import got changed: https://github.com/bb-Ricardo/python-redfish-library/commit/2656cbcecbb6ae4b1e3e38b49f2bd1310e7dd525

Can you try this out, please.

Thank you for your patience.

Fogelholk commented 2 years ago

That change seems to produce exactly the same errors as before. Should I have expected some new output?

bb-Ricardo commented 2 years ago

Yeah, kind of: it should have fixed the issue 😅

It's really difficult to debug this issue without access to an iDRAC where it happens. From reading up on it, it seems that the iDRAC web server violates the HTTP standard, and that's why we run into this issue.

Can you run the following command:

curl -i -H "Accept: application/json" -H "X-Auth-Token: 12345" https://idrac-ip/redfish/v1/Chassis/System.Embedded.1

Maybe we can see the issue there and request an actual fix from Dell.

Fogelholk commented 2 years ago

Ah haha, I see. Yeah, this only seems to occur on a few of our servers, where sessions aren't refreshed properly. I tried curl with both a working and a non-working auth token:

# curl -k -i -H "Accept: application/json" -H "X-Auth-Token: EXPIRED_TOKEN" https://FQDN/redfish/v1/Chassis/System.Embedded.1
HTTP/1.1 401 Unauthorized
Strict-Transport-Security: max-age=63072000
Vary: Accept-Encoding
Keep-Alive: timeout=60, max=199
X-Frame-Options: SAMEORIGIN
Content-Type: text/html
Date: Mon, 13 Jun 2022 07:53:22 GMT
Connection: Keep-Alive
Transfer-Encoding: chunked
WWW-Authenticate: Basic realm="RedfishService"
Accept-Ranges: bytes
curl: (18) transfer closed with outstanding read data remaining

# curl -k -i -H "Accept: application/json" -H "X-Auth-Token: WORKING_TOKEN" https://FQDN/redfish/v1/Chassis/System.Embedded.1
HTTP/1.1 200 OK
Strict-Transport-Security: max-age=63072000
OData-Version: 4.0
Vary: Accept-Encoding
Keep-Alive: timeout=60, max=199
X-Frame-Options: SAMEORIGIN
Content-Type: application/json;odata.metadata=minimal;charset=utf-8
Server: iDRAC/8
Date: Mon, 13 Jun 2022 07:54:31 GMT
Link: </redfish/v1/Schemas/Chassis.v1_6_0.json>;rel=describedby
Cache-Control: no-cache
Content-Length: 3402
Allow: POST,PATCH
Connection: Keep-Alive
Access-Control-Allow-Origin: *
Accept-Ranges: bytes

{"@odata.context":"/redfish/v1/$metadata#Chassis.Chassis","@odata.id":"/redfish/v1/Chassis/System.Embedded.1","@odata.type":"#Chassis.v1_6_0.Chassis","Actions":{"#Chassis.Reset":{"ResetType@Redfish.AllowableValues":["On","ForceOff"],"target":"/redfish/v1/Chassis/System.Embedded.1/Actions/Chassis.Reset"}},"Assembly":{"@odata.id":"/redfish/v1/Chassis/System.Embedded.1/Assembly"},"AssetTag":"","ChassisType":"Rack","Description":"It represents the properties for physical components for any system.It represent racks, rackmount servers, blades, standalone, modular systems,enclosures, and all other containers.The non-cpu/device centric parts of the schema are all accessed either directly or indirectly through this resource.","Id":"System.Embedded.1","IndicatorLED":"Off","Links":{"ComputerSystems":[{"@odata.id":"/redfish/v1/Systems/System.Embedded.1"}],"ComputerSystems@odata.count":1,"Contains":[{"@odata.id":"/redfish/v1/Chassis/Enclosure.Internal.0-1:RAID.Integrated.1-1"}],"Contains@odata.count":1,"CooledBy":[{"@odata.id":"/redfish/v1/Chassis/System.Embedded.1/Sensors/Fans/0x17%7C%7CFan.Embedded.1A"},{"@odata.id":"/redfish/v1/Chassis/System.Embedded.1/Sensors/Fans/0x17%7C%7CFan.Embedded.1B"},{"@odata.id":"/redfish/v1/Chassis/System.Embedded.1/Sensors/Fans/0x17%7C%7CFan.Embedded.2A"},{"@odata.id":"/redfish/v1/Chassis/System.Embedded.1/Sensors/Fans/0x17%7C%7CFan.Embedded.2B"},{"@odata.id":"/redfish/v1/Chassis/System.Embedded.1/Sensors/Fans/0x17%7C%7CFan.Embedded.3A"},{"@odata.id":"/redfish/v1/Chassis/System.Embedded.1/Sensors/Fans/0x17%7C%7CFan.Embedded.3B"},{"@odata.id":"/redfish/v1/Chassis/System.Embedded.1/Sensors/Fans/0x17%7C%7CFan.Embedded.4A"},{"@odata.id":"/redfish/v1/Chassis/System.Embedded.1/Sensors/Fans/0x17%7C%7CFan.Embedded.4B"},{"@odata.id":"/redfish/v1/Chassis/System.Embedded.1/Sensors/Fans/0x17%7C%7CFan.Embedded.5A"},{"@odata.id":"/redfish/v1/Chassis/System.Embedded.1/Sensors/Fans/0x17%7C%7CFan.Embedded.5B"},{"@odata.id":"/redfish/v1/Chassis/System.Embedded.1/Sensors/Fans/0x17%7C%7CFan.Embedded.6A"},{"@odata.id":"/redfish/v1/Chassis/System.Embedded.1/Sensors/Fans/0x17%7C%7CFan.Embedded.6B"}],"CooledBy@odata.count":12,"Drives":[],"Drives@odata.count":0,"ManagedBy":[{"@odata.id":"/redfish/v1/Managers/iDRAC.Embedded.1"}],"ManagedBy@odata.count":1,"ManagersInChassis":[{"@odata.id":"/redfish/v1/Managers/iDRAC.Embedded.1"}],"ManagersInChassis@odata.count":1,"PCIeDevices":[],"PCIeDevices@odata.count":0,"PoweredBy":[{"@odata.id":"/redfish/v1/Chassis/System.Embedded.1/Power/PowerSupplies/PSU.Slot.1"},{"@odata.id":"/redfish/v1/Chassis/System.Embedded.1/Power/PowerSupplies/PSU.Slot.2"}],"PoweredBy@odata.count":2,"Storage":[{"@odata.id":"/redfish/v1/Systems/System.Embedded.1/Storage/RAID.Integrated.1-1"},{"@odata.id":"/redfish/v1/Systems/System.Embedded.1/Storage/AHCI.Embedded.1-1"},{"@odata.id":"/redfish/v1/Systems/System.Embedded.1/Storage/AHCI.Embedded.2-1"}],"Storage@odata.count":3},"Manufacturer":"Dell Inc.","Model":"PowerEdge R430","Name":"Computer System Chassis","NetworkAdapters":{"@odata.id":"/redfish/v1/Systems/System.Embedded.1/NetworkAdapters"},"PartNumber":"03XKDVA02","Power":{"@odata.id":"/redfish/v1/Chassis/System.Embedded.1/Power"},"PowerState":"On","SKU":"XXXXXXX","SerialNumber":"XXXXXXXXXXXXXX","Status":{"Health":"OK","HealthRollup":"OK","State":"Enabled"},"Thermal":{"@odata.id":"/redfish/v1/Chassis/System.Embedded.1/Thermal"}}
bb-Ricardo commented 2 years ago

So strange: the response headers look quite different, and a few headers are missing completely, like "Server".

This is how it should look: https://en.wikipedia.org/wiki/Chunked_transfer_encoding

The web server just says "I will send you the data in chunks" and then closes the connection without sending anything, not even the terminating zero-length chunk.

Do these servers have a different iDRAC version?
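The failure mode can be reproduced offline with Python's standard library. Feeding a 401 response shaped like the curl output above (status line and headers with `Transfer-Encoding: chunked`, then nothing) into `http.client.HTTPResponse` yields a usable status code, while reading the announced chunked body raises `http.client.IncompleteRead`. `_ReplaySocket` and `parse_response` are hypothetical helpers for illustration, not part of check_redfish or python-redfish-library; they show that a client could still act on the 401 if it treated a truncated body as empty instead of failing the whole request.

```python
import http.client
import io
from typing import Tuple


class _ReplaySocket:
    """Minimal socket stand-in: HTTPResponse only needs makefile()."""

    def __init__(self, data: bytes):
        self._data = data

    def makefile(self, mode, *args, **kwargs):
        return io.BytesIO(self._data)


def parse_response(raw: bytes) -> Tuple[int, bytes]:
    """Return (status, body); a truncated chunked body becomes whatever arrived."""
    resp = http.client.HTTPResponse(_ReplaySocket(raw))
    resp.begin()  # parses the status line and headers; the body is not read yet
    try:
        body = resp.read()
    except http.client.IncompleteRead as exc:
        body = exc.partial  # connection closed before the terminating 0-size chunk
    return resp.status, body


# Headers taken from the curl output for the expired token; the body is missing.
IDRAC_401 = (
    b"HTTP/1.1 401 Unauthorized\r\n"
    b"Content-Type: text/html\r\n"
    b"Transfer-Encoding: chunked\r\n"
    b'WWW-Authenticate: Basic realm="RedfishService"\r\n'
    b"\r\n"
)

# A well-formed chunked response for comparison: one 5-byte chunk, then "0".
WELL_FORMED = (
    b"HTTP/1.1 200 OK\r\n"
    b"Transfer-Encoding: chunked\r\n"
    b"\r\n"
    b"5\r\nhello\r\n0\r\n\r\n"
)

if __name__ == "__main__":
    print(parse_response(IDRAC_401))
    print(parse_response(WELL_FORMED))
```

Running it prints `(401, b'')` followed by `(200, b'hello')`. In the thread's stack the body is read inside `requests` instead, where the same truncation surfaces as `ChunkedEncodingError`.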

Fogelholk commented 2 years ago

Sounds like iDracs being iDracs :D

I patched all iDRACs to their respective latest versions about 2-3 weeks ago. The machine we have been troubleshooting is running 2.83.83.83 (PowerEdge R430).

bb-Ricardo commented 1 year ago

Closing this issue due to inactivity. Please feel free to reopen if there are any new developments.