OpenConceptLab / ocl_issues

Issues for all OCL repos. NOTE: Install ZenHub Browser Extension and request access to the OCL Roadmap board to view all issues and to contribute

Fix Max retries exceeded with url: /_nodes/_all/http #1943

Closed rkorytkowski closed 1 month ago

rkorytkowski commented 1 month ago

Fix errbit: HTTPConnectionPool(host='10.1.11.234', port=9200): Max retries exceeded with url: /_nodes/_all/http (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7ff7236951e0>, 'Connection to 10.1.11.234 timed out. (connect timeout=10)'))

Backtrace:
/usr/local/lib/python3.10/site-packages/elasticsearch/connection/http_requests.py:166→ perform_request: response = self.session.send(prepared_request, send_kwargs)
/usr/local/lib/python3.10/site-packages/requests/sessions.py:703→ send: r = adapter.send(request, kwargs)
/usr/local/lib/python3.10/site-packages/requests/adapters.py:507→ send: raise ConnectTimeout(e, request=request)

It might be an ES cluster configuration issue. The sniffer requests the node list (/_nodes/_all/http) using the IP address of a node that is no longer present, so the connection times out. It should be using a hostname instead.
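
To illustrate the point about IPs versus hostnames, here is a minimal sketch of how a sniffing client might turn the `publish_address` values from a `/_nodes/_all/http` response into hosts to connect to. It is not the client library's actual code. The sample payload, the node IDs, and the hostname `es-node-b.example.internal` are made up for illustration, and the `hostname/ip:port` form is assumed to be what a node reports once it publishes a hostname.

```python
import json

# Hypothetical, trimmed /_nodes/_all/http response body (not taken from the real cluster).
SAMPLE = json.loads("""
{
  "nodes": {
    "node-a": {"http": {"publish_address": "10.1.11.234:9200"}},
    "node-b": {"http": {"publish_address": "es-node-b.example.internal/10.1.11.235:9200"}}
  }
}
""")


def host_from_publish_address(address):
    """Return the (host, port) pair a sniffing client would connect to."""
    if "/" in address:
        # Assumed "hostname/ip:port" form: prefer the hostname over the IP.
        hostname, ip_and_port = address.split("/", 1)
        port = ip_and_port.rsplit(":", 1)[1]
        return hostname, int(port)
    # Plain "ip:port" form: the client is left with a bare IP.
    host, port = address.rsplit(":", 1)
    return host, int(port)


def sniffed_hosts(nodes_response):
    """Map each node ID to the host and port derived from its publish address."""
    hosts = {}
    for node_id, info in nodes_response["nodes"].items():
        address = info.get("http", {}).get("publish_address")
        if address:
            hosts[node_id] = host_from_publish_address(address)
    return hosts


if __name__ == "__main__":
    for node_id, (host, port) in sniffed_hosts(SAMPLE).items():
        print(node_id, host, port)
```

In this sketch, a node that publishes only an IP leaves the client holding that IP. If the node is later replaced and comes back on a different address, the stored IP goes stale, which would match the timeout seen here.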

rkorytkowski commented 1 month ago

Adjusted the setup to publish hostnames instead of IPs. Deployed to staging and production. Also fixed a geoip issue seen in the ES logs. We'll see if errbits come up again.
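
One way to double-check a change like this is to look at what each node reports as its publish address. The helper below is a rough sketch under assumptions: it takes an already-fetched `/_nodes/_all/http` response as a dict, and the function names are hypothetical. It lists the nodes that still publish a bare IP.

```python
import ipaddress


def publishes_bare_ip(publish_address):
    """Return True if the address is just ip:port, with no hostname part."""
    if "/" in publish_address:
        # Assumed "hostname/ip:port" form, so a hostname is being published.
        return False
    host = publish_address.rsplit(":", 1)[0].strip("[]")
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        # Not parseable as an IP, so treat it as a hostname.
        return False


def nodes_publishing_ips(nodes_response):
    """Return the sorted IDs of nodes whose publish address is a bare IP."""
    flagged = []
    for node_id, info in nodes_response.get("nodes", {}).items():
        address = info.get("http", {}).get("publish_address", "")
        if address and publishes_bare_ip(address):
            flagged.append(node_id)
    return sorted(flagged)
```

An empty result from `nodes_publishing_ips` would suggest that every node now publishes a hostname.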

rkorytkowski commented 1 month ago

No more errors came up once the clients were restarted (celery-indexing on staging/production and celery-production), which suggests the change is working. If the error shows up for any other client, that client simply needs to be restarted to pick up the change.