Fix Max retries exceeded with url: /_nodes/_all/http

rkorytkowski commented 1 month ago

Fix errbit: HTTPConnectionPool(host='10.1.11.234', port=9200): Max retries exceeded with url: /_nodes/_all/http (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7ff7236951e0>, 'Connection to 10.1.11.234 timed out. (connect timeout=10)'))

Backtrace:
/usr/local/lib/python3.10/site-packages/elasticsearch/connection/http_requests.py:166 → perform_request: response = self.session.send(prepared_request, **send_kwargs)
/usr/local/lib/python3.10/site-packages/requests/sessions.py:703 → send: r = adapter.send(request, **kwargs)
/usr/local/lib/python3.10/site-packages/requests/adapters.py:507 → send: raise ConnectTimeout(e, request=request)

This might be an ES cluster configuration issue. The client's sniffer requests the node list (/_nodes/_all/http) using the IP address of a node that is no longer present, so the connection times out. It should be using a hostname instead.
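A minimal sketch of why the published address matters, assuming the sniff response carries each node's address in nodes.<id>.http.publish_address, either as "ip:port" or as "hostname/ip:port" when a publish hostname is configured. The function names and the hostname es-node-1.example.internal are hypothetical and are not taken from the OCL setup; this is not the client library's actual code.

```python
from typing import Optional


def host_from_publish_address(address: str) -> Optional[dict]:
    """Turn a publish_address string into a host/port dict.

    Accepts "ip:port" and "hostname/ip:port"; prefers the hostname
    when one is present. Returns None for unusable input.
    """
    if not address or ":" not in address:
        return None
    if "/" in address:
        # "hostname/ip:port": keep the hostname, take the port from the tail.
        hostname, ip_and_port = address.split("/", 1)
        _, port = ip_and_port.rsplit(":", 1)
        return {"host": hostname, "port": int(port)}
    host, port = address.rsplit(":", 1)
    return {"host": host, "port": int(port)}


def hosts_from_sniff_response(response: dict) -> list:
    """Collect the hosts a client would remember from a sniff response."""
    hosts = []
    for node in response.get("nodes", {}).values():
        address = node.get("http", {}).get("publish_address", "")
        host = host_from_publish_address(address)
        if host is not None:
            hosts.append(host)
    return hosts


# Hypothetical sample responses (made-up node id and hostname).
ip_only = {
    "nodes": {"node-a": {"http": {"publish_address": "10.1.11.234:9200"}}}
}
with_hostname = {
    "nodes": {
        "node-a": {
            "http": {
                "publish_address": "es-node-1.example.internal/10.1.11.234:9200"
            }
        }
    }
}

# With IP-only publishing the client pins a fixed IP, which goes stale
# when the node is replaced; with a hostname it keeps a resolvable name.
print(hosts_from_sniff_response(ip_only))
print(hosts_from_sniff_response(with_hostname))
```

A client that only remembers the IP keeps retrying it after the node is gone, which matches the ConnectTimeout above; a remembered hostname can resolve to the replacement node.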

rkorytkowski commented 1 month ago

Adjusted the setup to publish hostnames instead of IPs. Deployed to staging and production. Also fixed a geoip issue that showed up in the ES logs. We'll see if the errbits come up again.

rkorytkowski commented 1 month ago

No more errors came up once the clients were restarted (celery-indexing on staging/production and celery-production), which indicates the change is working. If the error appears for any other client, that client just needs to be restarted to pick up the change.

OpenConceptLab / ocl_issues

Fix Max retries exceeded with url: /_nodes/_all/http #1943