GluuFederation / cloud-native-edition

Cloud Native Edition repository
https://gluu.org/docs/gluu-server/latest/installation-guide/install-kubernetes/
Apache License 2.0

fix: oxshibboleth loses connection to JR upon scaling #620

Closed misba7 closed 5 months ago

misba7 commented 8 months ago

Describe the bug: oxshibboleth loses its connection to Jackrabbit (JR) upon scaling

To Reproduce Steps to reproduce the behavior:

  1. rollout restart shib
  2. scale shib, oxtrust, and JR to 2 replicas
  3. scale JR back to 1
  4. oxshibbo-0 logs:

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/usr/lib/python3.10/site-packages/webdav3/client.py", line 66, in _wrapper
        res = fn(self, *args, **kw)
      File "/usr/lib/python3.10/site-packages/webdav3/client.py", line 292, in check
        response = self.execute_request(action='check', path=urn.quote())
      File "/usr/lib/python3.10/site-packages/webdav3/client.py", line 208, in execute_request
        response = self.session.request(
      File "/usr/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
        resp = self.send(prep, **send_kwargs)
      File "/usr/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
        r = adapter.send(request, **kwargs)
      File "/usr/lib/python3.10/site-packages/requests/adapters.py", line 532, in send
        raise ReadTimeout(e, request=request)
    requests.exceptions.ReadTimeout: HTTPConnectionPool(host='jackrabbit', port=8080): Read timed out. (read timeout=30)



oxshibbo-0 -- ps aux:
`python3 /app/scripts/document_sync.py` is killed

iromli commented 8 months ago

The issue is not caused by scaling itself. When one of the shib pods is connected to a specific jackrabbit pod that becomes unreachable, the document synchronization script is killed by an uncaught error.
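
To illustrate the failure mode, here is a minimal sketch of a sync loop that logs an error and retries on the next pass instead of exiting. It is not the code in `document_sync.py` and not necessarily how the image was fixed; `run_sync_loop` and `sync_once` are hypothetical names used only for this example, and the broad `except Exception` is an assumption chosen so that a read timeout like the one in the traceback would be covered.

```python
import logging
import time
from typing import Callable

logger = logging.getLogger("document_sync_sketch")


def run_sync_loop(
    sync_once: Callable[[], None],
    iterations: int,
    interval: float = 0.0,
) -> int:
    """Call sync_once repeatedly and keep going when a call fails.

    sync_once is a hypothetical stand-in for one synchronization pass
    against the document store. Returns the number of failed passes.
    """
    failures = 0
    for _ in range(iterations):
        try:
            sync_once()
        except Exception as exc:
            # Assumption: any error from one pass (for example a read
            # timeout from an unreachable backend) is logged and retried
            # on the next pass rather than ending the process.
            failures += 1
            logger.warning("sync pass failed, will retry: %s", exc)
        time.sleep(interval)
    return failures
```

With a loop like this, a pass that hits an unreachable jackrabbit pod would show up as a warning in the logs, and the process would still be present in `ps aux` afterwards.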

iromli commented 8 months ago

@misba7 can you re-test using the gluufederation/oxshibboleth:4.5.3-5 image?