argilla-io / argilla

Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets
https://docs.argilla.io
Apache License 2.0

[BUG-python/deployment] Huggingface Space crashed #5167

Open fangguo1 opened 1 month ago

fangguo1 commented 1 month ago

I am hitting the same runtime error as in #4689. Following the solution given there, I switched the Dockerfile to the newest version, v1.29.0, but running a 'Factory rebuild' did not resolve the problem.

To be specific, whenever I try to restart or rebuild my Space, the HF Space keeps showing:

Runtime error
46 system | sending SIGKILL to elastic.1 (pid 10)
07:16:46 system | sending SIGKILL to elastic.1 (pid 10)
07:16:47 system | sending SIGKILL to elastic.1 (pid 10)
07:16:47 system | sending SIGKILL to elastic.1 (pid 10)
...

And the error log is:

Traceback (most recent call last):
07:16:16 argilla.1 |   File "/opt/venv/lib/python3.10/site-packages/opensearchpy/connection/http_urllib3.py", line 240, in perform_request
07:16:16 argilla.1 |     response = self.pool.urlopen(
07:16:16 argilla.1 |   File "/opt/venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 799, in urlopen
07:16:16 argilla.1 |     retries = retries.increment(
07:16:16 argilla.1 |   File "/opt/venv/lib/python3.10/site-packages/urllib3/util/retry.py", line 525, in increment
07:16:16 argilla.1 |     raise six.reraise(type(error), error, _stacktrace)
07:16:16 argilla.1 |   File "/opt/venv/lib/python3.10/site-packages/urllib3/packages/six.py", line 770, in reraise
07:16:16 argilla.1 |     raise value
07:16:16 argilla.1 |   File "/opt/venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 715, in urlopen
07:16:16 argilla.1 |     httplib_response = self._make_request(
07:16:16 argilla.1 |   File "/opt/venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 416, in _make_request
07:16:16 argilla.1 |     conn.request(method, url, **httplib_request_kw)
07:16:16 argilla.1 |   File "/opt/venv/lib/python3.10/site-packages/urllib3/connection.py", line 244, in request
07:16:16 argilla.1 |     super(HTTPConnection, self).request(method, url, body=body, headers=headers)
07:16:16 argilla.1 |   File "/usr/local/lib/python3.10/http/client.py", line 1283, in request
07:16:16 argilla.1 |     self._send_request(method, url, body, headers, encode_chunked)
07:16:16 argilla.1 |   File "/usr/local/lib/python3.10/http/client.py", line 1329, in _send_request
07:16:16 argilla.1 |     self.endheaders(body, encode_chunked=encode_chunked)
07:16:16 argilla.1 |   File "/usr/local/lib/python3.10/http/client.py", line 1278, in endheaders
07:16:16 argilla.1 |     self._send_output(message_body, encode_chunked=encode_chunked)
07:16:16 argilla.1 |   File "/usr/local/lib/python3.10/http/client.py", line 1038, in _send_output
07:16:16 argilla.1 |     self.send(msg)
07:16:16 argilla.1 |   File "/usr/local/lib/python3.10/http/client.py", line 976, in send
07:16:16 argilla.1 |     self.connect()
07:16:16 argilla.1 |   File "/opt/venv/lib/python3.10/site-packages/urllib3/connection.py", line 205, in connect
07:16:16 argilla.1 |     conn = self._new_conn()
07:16:16 argilla.1 |   File "/opt/venv/lib/python3.10/site-packages/urllib3/connection.py", line 186, in _new_conn
07:16:16 argilla.1 |     raise NewConnectionError(
07:16:16 argilla.1 | urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f7132096aa0>: Failed to establish a new connection: [Errno 111] Connection refused
07:16:16 argilla.1 |
07:16:16 argilla.1 | During handling of the above exception, another exception occurred:
07:16:16 argilla.1 |
07:16:16 argilla.1 | Traceback (most recent call last):
07:16:16 argilla.1 |   File "/opt/venv/lib/python3.10/site-packages/argilla_server/daos/backend/client_adapters/factory.py", line 74, in _fetch_cluster_version_info
07:16:16 argilla.1 |     data = client.info()
07:16:16 argilla.1 |   File "/opt/venv/lib/python3.10/site-packages/opensearchpy/client/utils.py", line 178, in _wrapped
07:16:16 argilla.1 |     return func(*args, params=params, headers=headers, **kwargs)
07:16:16 argilla.1 |   File "/opt/venv/lib/python3.10/site-packages/opensearchpy/client/__init__.py", line 251, in info
07:16:16 argilla.1 |     return self.transport.perform_request(
07:16:16 argilla.1 |   File "/opt/venv/lib/python3.10/site-packages/opensearchpy/transport.py", line 406, in perform_request
07:16:16 argilla.1 |     raise e
07:16:16 argilla.1 |   File "/opt/venv/lib/python3.10/site-packages/opensearchpy/transport.py", line 369, in perform_request
07:16:16 argilla.1 |     status, headers_response, data = connection.perform_request(
07:16:16 argilla.1 |   File "/opt/venv/lib/python3.10/site-packages/opensearchpy/connection/http_urllib3.py", line 255, in perform_request
07:16:16 argilla.1 |     raise ConnectionError("N/A", str(e), e)
07:16:16 argilla.1 | opensearchpy.exceptions.ConnectionError: ConnectionError(<urllib3.connection.HTTPConnection object at 0x7f7132096aa0>: Failed to establish a new connection: [Errno 111] Connection refused) caused by: NewConnectionError(<urllib3.connection.HTTPConnection object at 0x7f7132096aa0>: Failed to establish a new connection: [Errno 111] Connection refused)
07:16:16 argilla.1 |
07:16:16 argilla.1 | During handling of the above exception, another exception occurred:
07:16:16 argilla.1 |
07:16:16 argilla.1 | Traceback (most recent call last):
07:16:16 argilla.1 |   File "/opt/venv/lib/python3.10/site-packages/argilla_server/_app.py", line 170, in _setup_elasticsearch
07:16:16 argilla.1 |     backend = GenericElasticEngineBackend.get_instance()
07:16:16 argilla.1 |   File "/opt/venv/lib/python3.10/site-packages/argilla_server/daos/backend/generic_elastic.py", line 82, in get_instance
07:16:16 argilla.1 |     client=ClientAdapterFactory.get(
07:16:16 argilla.1 |   File "/opt/venv/lib/python3.10/site-packages/argilla_server/daos/backend/client_adapters/factory.py", line 46, in get
07:16:16 argilla.1 |     version, distribution = cls._fetch_cluster_version_info(client_config)
07:16:16 argilla.1 |   File "/opt/venv/lib/python3.10/site-packages/argilla_server/daos/backend/client_adapters/factory.py", line 82, in _fetch_cluster_version_info
07:16:16 argilla.1 |     raise GenericSearchError(error)
07:16:16 argilla.1 | argilla_server.daos.backend.base.GenericSearchError: ConnectionError(<urllib3.connection.HTTPConnection object at 0x7f7132096aa0>: Failed to establish a new connection: [Errno 111] Connection refused) caused by: NewConnectionError(<urllib3.connection.HTTPConnection object at 0x7f7132096aa0>: Failed to establish a new connection: [Errno 111] Connection refused)
07:16:16 argilla.1 |
07:16:16 argilla.1 | The above exception was the direct cause of the following exception:
07:16:16 argilla.1 |
07:16:16 argilla.1 | Traceback (most recent call last):
07:16:16 argilla.1 |   File "/opt/venv/lib/python3.10/site-packages/starlette/routing.py", line 732, in lifespan
07:16:16 argilla.1 |     async with self.lifespan_context(app) as maybe_state:
07:16:16 argilla.1 |   File "/opt/venv/lib/python3.10/site-packages/starlette/routing.py", line 608, in __aenter__
07:16:16 argilla.1 |     await self._router.startup()
07:16:16 argilla.1 |   File "/opt/venv/lib/python3.10/site-packages/starlette/routing.py", line 709, in startup
07:16:16 argilla.1 |     await handler()
07:16:16 argilla.1 |   File "/opt/venv/lib/python3.10/site-packages/argilla_server/_app.py", line 189, in setup_elasticsearch
07:16:16 argilla.1 |     _setup_elasticsearch()
07:16:16 argilla.1 |   File "/opt/venv/lib/python3.10/site-packages/backoff/_sync.py", line 94, in retry
07:16:16 argilla.1 |     ret = target(*args, **kwargs)
07:16:16 argilla.1 |   File "/opt/venv/lib/python3.10/site-packages/argilla_server/_app.py", line 179, in _setup_elasticsearch
07:16:16 argilla.1 |     raise ConfigError(
07:16:16 argilla.1 | pydantic.errors.ConfigError: Your Elasticsearch endpoint at http://localhost:9200 is not available or not responding.
07:16:16 argilla.1 | Please make sure your Elasticsearch instance is launched and correctly running and
07:16:16 argilla.1 | you have the necessary access permissions. Once you have verified this, restart the argilla server.
07:16:16 argilla.1 |
07:16:16 argilla.1 |
07:16:16 argilla.1 | ERROR: Application startup failed. Exiting.

Thanks

davidberenstein1957 commented 1 month ago

@jfcalvo @frascuchon might be able to help here.

@fangguo1 I believe that @jfcalvo already extended the timeout for the ES connection, which should at least partially resolve this issue.
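
For anyone who wants to see what "waiting longer for ES" means in practice, here is a minimal sketch of polling an endpoint with a capped exponential delay until it answers or a deadline passes. This is not Argilla's actual startup code (the traceback only shows that it wraps `_setup_elasticsearch` with the `backoff` package); the names `http_probe` and `wait_for_endpoint` and all the default values are made up for illustration, and only the Python standard library is used.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, request_timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers an HTTP GET with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=request_timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # "[Errno 111] Connection refused" lands here while the service is still booting.
        return False


def wait_for_endpoint(
    url: str,
    max_wait: float = 120.0,      # hypothetical default, not an Argilla setting
    initial_delay: float = 1.0,
    max_delay: float = 10.0,
    probe: Callable[[str], bool] = http_probe,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `url` until `probe` succeeds or `max_wait` seconds have elapsed."""
    deadline = time.monotonic() + max_wait
    delay = initial_delay
    while True:
        if probe(url):
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline; double the delay up to max_delay.
        sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay)
```

Calling `wait_for_endpoint("http://localhost:9200")` before starting the server would block until the endpoint from the error message responds, or return `False` after the deadline. Note that a longer wait only helps if Elasticsearch is merely slow to start; if the `elastic.1` process is being killed, as the SIGKILL lines suggest, the endpoint will never come up and the wait will simply time out.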