rkorytkowski closed this 1 month ago
Adjusted the Elasticsearch setup so nodes publish hostnames instead of IP addresses, and deployed the change to staging and production. Also fixed a GeoIP issue in the ES logs. We'll see whether the Errbit errors come up again.
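A quick way to confirm the change is to look at the `http.publish_address` each node reports in `GET /_nodes/_all/http`, the same endpoint the sniffer calls in the error below. The sketch assumes the response has already been fetched and decoded into a dict. It also assumes that a node publishing a hostname reports its address as `hostname/ip:port`, while a node publishing only an IP reports `ip:port`. The function name, node names, hostnames and IPs are all hypothetical.

```python
import ipaddress
from typing import Any


def nodes_publishing_bare_ips(nodes_response: dict[str, Any]) -> list[str]:
    """Return the names of nodes whose HTTP publish address is a bare IP.

    Hypothetical helper. Assumes publish_address looks like
    "hostname/ip:port" when a hostname is published and "ip:port" otherwise.
    """
    offenders = []
    for node_id, info in nodes_response.get("nodes", {}).items():
        address = info.get("http", {}).get("publish_address", "")
        if not address or "/" in address:
            # Nothing published, or a hostname part is present: not a bare IP.
            continue
        # Drop the port; strip brackets so "[::1]" parses as an IPv6 literal.
        host = address.rsplit(":", 1)[0].strip("[]")
        try:
            ipaddress.ip_address(host)
        except ValueError:
            # Not an IP literal, so it is already a hostname.
            continue
        offenders.append(info.get("name", node_id))
    return offenders


# Hypothetical response: one node publishes a hostname, the other only an IP.
sample = {
    "nodes": {
        "a1": {"name": "es-node-1",
               "http": {"publish_address": "es-node-1.example.internal/10.0.0.5:9200"}},
        "b2": {"name": "es-node-2",
               "http": {"publish_address": "10.0.0.6:9200"}},
    }
}
print(nodes_publishing_bare_ips(sample))  # ['es-node-2']
```

An empty result means every node advertises a hostname that sniffing clients can use.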
No further errors were reported once the clients were restarted (celery-indexing on staging/production and celery-production), which shows the change is working. If the error shows up for any other client, that client just needs to be restarted to pick up the change.
Fix Errbit error: HTTPConnectionPool(host='10.1.11.234', port=9200): Max retries exceeded with url: /_nodes/_all/http (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7ff7236951e0>, 'Connection to 10.1.11.234 timed out. (connect timeout=10)'))
Backtrace:
/usr/local/lib/python3.10/site-packages/elasticsearch/connection/http_requests.py:166→ perform_request: response = self.session.send(prepared_request, **send_kwargs)
/usr/local/lib/python3.10/site-packages/requests/sessions.py:703→ send: r = adapter.send(request, **kwargs)
/usr/local/lib/python3.10/site-packages/requests/adapters.py:507→ send: raise ConnectTimeout(e, request=request)
This might be an ES cluster configuration issue. The client's sniffer requests the list of all nodes (/_nodes/_all/http) from a node's IP address, but the node at that IP no longer exists. It should be using a hostname instead.
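To illustrate why the published address matters: a sniffing client turns each node's `http.publish_address` into a host it will connect to. The function below is a simplified sketch of that step, modeled loosely on how the 7.x Python client handles it rather than copied from it. The function name and example addresses are hypothetical. When the address has the `hostname/ip:port` form, the hostname is kept, so a replacement node that reuses the same DNS name stays reachable. With a bare `ip:port`, the client keeps dialing the old IP.

```python
from typing import Optional, Tuple


def host_from_publish_address(address: str) -> Optional[Tuple[str, int]]:
    """Turn a node's publish_address into a (host, port) pair.

    Hypothetical helper: prefers the hostname when the address has the
    "hostname/ip:port" form and falls back to the raw IP otherwise.
    """
    if not address or ":" not in address:
        return None  # Nothing usable was published.
    if "/" in address:
        # "hostname/ip:port": keep the hostname, take the port from the IP part.
        hostname, ip_and_port = address.split("/", 1)
        _, port = ip_and_port.rsplit(":", 1)
        return hostname, int(port)
    # Bare "ip:port": the client can only ever connect to this exact IP.
    host, port = address.rsplit(":", 1)
    return host, int(port)


print(host_from_publish_address("es-node-1.example.internal/10.0.0.5:9200"))
# ('es-node-1.example.internal', 9200)
print(host_from_publish_address("10.0.0.6:9200"))
# ('10.0.0.6', 9200)
```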