Emerging-AI / ENOVA

A deployment, monitoring and autoscaling service towards serverless LLM serving.
Apache License 2.0

ConnectionRefusedError: [Errno 111] Connect call failed ('127.0.0.1', 8182) #35

Closed. shiertier closed this issue 1 week ago

shiertier commented 1 week ago

I hit the following error when running:

enova pilot run --model /usrdata/llm/Meta-Llama-3.1-8B-Instruct-AWQ-INT4

/usr/local/lib/python3.10/dist-packages/pytools/persistent_dict.py:63: RecommendedHashNotFoundWarning: Unable to import recommended hash 'siphash24.siphash13', falling back to 'hashlib.sha256'. Run 'python3 -m pip install siphash24' to install the recommended hash.
  warn("Unable to import recommended hash 'siphash24.siphash13', "
[2024-10-06 Sun 18:45:33.989][DEBUG][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/common/utils.py:68 - get_pkg_namespace_path()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|module_path: /usr/local/lib/python3.10/dist-packages/enova
{'serving_host': '0.0.0.0', 'serving_port': 9199, 'backend': 'vllm', 'webui_host': '0.0.0.0', 'webui_port': 8501, 'exporter_endpoint': 'otel-collector:4317', 'exporter_service_name': 'llmo-svc', 'enova_app_host': '0.0.0.0', 'enova_app_port': 8182, 'hf_proxy': None, 'restart_service': None, 'model': '/usrdata/llm/Meta-Llama-3.1-8B-Instruct-AWQ-INT4', 'call_order': 1, 'command_func': 'enova_pilot_run_args'}
[2024-10-06 Sun 18:45:34.21][DEBUG][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/common/cli_helper.py:107 - _run_command()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|Command: /usr/local/lib/python3.10/dist-packages/enova/template/deployment/docker-compose/bin/docker-compose-linux-x86_64 -f /usr/local/lib/python3.10/dist-packages/enova/template/deployment/docker-compose/enova_compose.yaml up nginx -d
[2024-10-06 Sun 18:45:34.179][WARNING][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/entry/command/pilot.py:131 - run()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|nginx start failed: err: Command '['/usr/local/lib/python3.10/dist-packages/enova/template/deployment/docker-compose/bin/docker-compose-linux-x86_64', '-f', '/usr/local/lib/python3.10/dist-packages/enova/template/deployment/docker-compose/enova_compose.yaml', 'up', 'nginx', '-d']' returned non-zero exit status 1.
[2024-10-06 Sun 18:45:34.179][DEBUG][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/common/utils.py:68 - get_pkg_namespace_path()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|module_path: /usr/local/lib/python3.10/dist-packages/enova
{'serving_host': '0.0.0.0', 'serving_port': 9199, 'backend': 'vllm', 'webui_host': '0.0.0.0', 'webui_port': 8501, 'exporter_endpoint': 'otel-collector:4317', 'exporter_service_name': 'llmo-svc', 'enova_app_host': '0.0.0.0', 'enova_app_port': 8182, 'hf_proxy': None, 'restart_service': None, 'model': '/usrdata/llm/Meta-Llama-3.1-8B-Instruct-AWQ-INT4', 'call_order': 1, 'command_func': 'enova_pilot_run_args'}
[2024-10-06 Sun 18:45:34.234][DEBUG][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/common/cli_helper.py:107 - _run_command()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|Command: /usr/local/lib/python3.10/dist-packages/enova/template/deployment/docker-compose/bin/docker-compose-linux-x86_64 -f /usr/local/lib/python3.10/dist-packages/enova/template/deployment/docker-compose/enova_compose.yaml up webui-nginx -d
[2024-10-06 Sun 18:45:34.388][WARNING][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/entry/command/pilot.py:137 - run()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|webui_nginx start failed: err: Command '['/usr/local/lib/python3.10/dist-packages/enova/template/deployment/docker-compose/bin/docker-compose-linux-x86_64', '-f', '/usr/local/lib/python3.10/dist-packages/enova/template/deployment/docker-compose/enova_compose.yaml', 'up', 'webui-nginx', '-d']' returned non-zero exit status 1.
[2024-10-06 Sun 18:45:34.389][DEBUG][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/common/utils.py:68 - get_pkg_namespace_path()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|module_path: /usr/local/lib/python3.10/dist-packages/enova
[2024-10-06 Sun 18:45:34.443][DEBUG][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/common/cli_helper.py:107 - _run_command()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|Command: /usr/local/lib/python3.10/dist-packages/enova/template/deployment/docker-compose/bin/docker-compose-linux-x86_64 -f /usr/local/lib/python3.10/dist-packages/enova/template/deployment/docker-compose/enova_compose.yaml up dcgm-exporter -d
[2024-10-06 Sun 18:45:34.610][WARNING][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/entry/command/pilot.py:144 - run()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|monitor start failed: err: Command '['/usr/local/lib/python3.10/dist-packages/enova/template/deployment/docker-compose/bin/docker-compose-linux-x86_64', '-f', '/usr/local/lib/python3.10/dist-packages/enova/template/deployment/docker-compose/enova_compose.yaml', 'up', 'dcgm-exporter', '-d']' returned non-zero exit status 1.
[2024-10-06 Sun 18:45:34.610][DEBUG][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/common/utils.py:68 - get_pkg_namespace_path()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|module_path: /usr/local/lib/python3.10/dist-packages/enova
[2024-10-06 Sun 18:45:34.664][INFO][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/entry/command/app.py:65 - _run_compose()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|use local model: /usrdata/llm/Meta-Llama-3.1-8B-Instruct-AWQ-INT4
[2024-10-06 Sun 18:45:34.664][DEBUG][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/entry/command/app.py:66 - _run_compose()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|maping local model in compose: /workspace/model
[2024-10-06 Sun 18:45:34.664][DEBUG][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/common/cli_helper.py:130 - update_service_options()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|updated service config: {'image': 'emergingai/enova:v0.0.5', 'container_name': 'enova-app', 'command': 'enova app run --host 0.0.0.0 --port 8182', 'volumes': ['/run/docker.sock:/run/docker.sock', '/root/.cache/huggingface:/root/.cache/huggingface', '/usrdata/llm/Meta-Llama-3.1-8B-Instruct-AWQ-INT4:/workspace/model', '/tmp/traffic_injector:/tmp/traffic_injector', 'traffic_injector_dataset:/opt/enova/enova/template/deployment/docker-compose/traffic-injector/data'], 'deploy': {'resources': {'reservations': {'devices': [{'driver': 'nvidia', 'count': 'all', 'capabilities': ['gpu']}]}}}, 'ports': ['8182:8182'], 'depends_on': ['enova-escaler'], 'networks': {'enova-net': {'aliases': ['enova-app']}}, 'environment': ['EMERGINGAI_ENOVA_APP_HOST_MODEL_DIR=/usrdata/llm/Meta-Llama-3.1-8B-Instruct-AWQ-INT4', 'EMERGINGAI_ENOVA_APP_USER_ARGS={"serving_host": "0.0.0.0", "serving_port": 9199, "backend": "vllm", "webui_host": "0.0.0.0", "webui_port": 8501, "exporter_endpoint": "otel-collector:4317", "exporter_service_name": "llmo-svc", "enova_app_host": "0.0.0.0", "enova_app_port": 8182, "hf_proxy": null, "restart_service": null, "model": "/usrdata/llm/Meta-Llama-3.1-8B-Instruct-AWQ-INT4", "call_order": 1, "command_func": "enova_pilot_run_args"}']}
[2024-10-06 Sun 18:45:34.664][DEBUG][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/common/cli_helper.py:132 - update_service_options()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|updated compose file: /usr/local/lib/python3.10/dist-packages/enova/template/deployment/docker-compose/enova_compose.yaml
[2024-10-06 Sun 18:45:34.678][DEBUG][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/common/cli_helper.py:107 - _run_command()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|Command: /usr/local/lib/python3.10/dist-packages/enova/template/deployment/docker-compose/bin/docker-compose-linux-x86_64 -f /usr/local/lib/python3.10/dist-packages/enova/template/deployment/docker-compose/enova_compose.yaml up enova-app -d
[2024-10-06 Sun 18:45:34.839][WARNING][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/entry/command/pilot.py:150 - run()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|enova_app start failed: err: Command '['/usr/local/lib/python3.10/dist-packages/enova/template/deployment/docker-compose/bin/docker-compose-linux-x86_64', '-f', '/usr/local/lib/python3.10/dist-packages/enova/template/deployment/docker-compose/enova_compose.yaml', 'up', 'enova-app', '-d']' returned non-zero exit status 1.
[2024-10-06 Sun 18:45:34.842][INFO][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/api/base.py:35 - pools()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|create ASyncClientPool pools
[2024-10-06 Sun 18:45:34.850][INFO][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/api/base.py:117 - __call__()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|method: get, actual_url: http://127.0.0.1:8182/v1/healthz, params: None, headers: {'trace_id': 'e595d856ab3341eebf081abb8ea8bb06'}
[2024-10-06 Sun 18:45:35.870][INFO][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/api/base.py:117 - __call__()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|method: get, actual_url: http://127.0.0.1:8182/v1/healthz, params: None, headers: {'trace_id': 'e595d856ab3341eebf081abb8ea8bb06'}
[2024-10-06 Sun 18:45:36.875][INFO][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/api/base.py:117 - __call__()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|method: get, actual_url: http://127.0.0.1:8182/v1/healthz, params: None, headers: {'trace_id': 'e595d856ab3341eebf081abb8ea8bb06'}
[2024-10-06 Sun 18:45:37.879][INFO][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/api/base.py:117 - __call__()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|method: get, actual_url: http://127.0.0.1:8182/v1/healthz, params: None, headers: {'trace_id': 'e595d856ab3341eebf081abb8ea8bb06'}
[2024-10-06 Sun 18:45:38.884][INFO][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/api/base.py:117 - __call__()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|method: get, actual_url: http://127.0.0.1:8182/v1/healthz, params: None, headers: {'trace_id': 'e595d856ab3341eebf081abb8ea8bb06'}
[2024-10-06 Sun 18:45:39.889][INFO][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/api/base.py:117 - __call__()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|method: get, actual_url: http://127.0.0.1:8182/v1/healthz, params: None, headers: {'trace_id': 'e595d856ab3341eebf081abb8ea8bb06'}
[2024-10-06 Sun 18:45:40.893][INFO][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/api/base.py:117 - __call__()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|method: get, actual_url: http://127.0.0.1:8182/v1/healthz, params: None, headers: {'trace_id': 'e595d856ab3341eebf081abb8ea8bb06'}
[2024-10-06 Sun 18:45:41.899][INFO][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/api/base.py:117 - __call__()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|method: get, actual_url: http://127.0.0.1:8182/v1/healthz, params: None, headers: {'trace_id': 'e595d856ab3341eebf081abb8ea8bb06'}
[2024-10-06 Sun 18:45:42.903][INFO][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/api/base.py:117 - __call__()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|method: get, actual_url: http://127.0.0.1:8182/v1/healthz, params: None, headers: {'trace_id': 'e595d856ab3341eebf081abb8ea8bb06'}
[2024-10-06 Sun 18:45:43.908][INFO][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/api/base.py:117 - __call__()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|method: get, actual_url: http://127.0.0.1:8182/v1/healthz, params: None, headers: {'trace_id': 'e595d856ab3341eebf081abb8ea8bb06'}
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/anyio/_core/_sockets.py", line 165, in try_connect
    stream = await asynclib.connect_tcp(remote_host, remote_port, local_address)
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 2301, in connect_tcp
    await get_running_loop().create_connection(
  File "/usr/lib/python3.10/asyncio/base_events.py", line 1076, in create_connection
    raise exceptions[0]
  File "/usr/lib/python3.10/asyncio/base_events.py", line 1060, in create_connection
    sock = await self._connect_sock(
  File "/usr/lib/python3.10/asyncio/base_events.py", line 969, in _connect_sock
    await self.sock_connect(sock, address)
  File "/usr/lib/python3.10/asyncio/selector_events.py", line 501, in sock_connect
    return await fut
  File "/usr/lib/python3.10/asyncio/selector_events.py", line 541, in _sock_connect_cb
    raise OSError(err, f'Connect call failed {address}')
ConnectionRefusedError: [Errno 111] Connect call failed ('127.0.0.1', 8182)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/httpcore/_exceptions.py", line 10, in map_exceptions
    yield
  File "/usr/local/lib/python3.10/dist-packages/httpcore/_backends/anyio.py", line 114, in connect_tcp
    stream: anyio.abc.ByteStream = await anyio.connect_tcp(
  File "/usr/local/lib/python3.10/dist-packages/anyio/_core/_sockets.py", line 227, in connect_tcp
    raise OSError("All connection attempts failed") from cause
OSError: All connection attempts failed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/httpx/_transports/default.py", line 60, in map_httpcore_exceptions
    yield
  File "/usr/local/lib/python3.10/dist-packages/httpx/_transports/default.py", line 353, in handle_async_request
    resp = await self._pool.handle_async_request(req)
  File "/usr/local/lib/python3.10/dist-packages/httpcore/_async/connection_pool.py", line 262, in handle_async_request
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/httpcore/_async/connection_pool.py", line 245, in handle_async_request
    response = await connection.handle_async_request(request)
  File "/usr/local/lib/python3.10/dist-packages/httpcore/_async/connection.py", line 92, in handle_async_request
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/httpcore/_async/connection.py", line 69, in handle_async_request
    stream = await self._connect(request)
  File "/usr/local/lib/python3.10/dist-packages/httpcore/_async/connection.py", line 117, in _connect
    stream = await self._network_backend.connect_tcp(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/httpcore/_backends/auto.py", line 31, in connect_tcp
    return await self._backend.connect_tcp(
  File "/usr/local/lib/python3.10/dist-packages/httpcore/_backends/anyio.py", line 112, in connect_tcp
    with map_exceptions(exc_map):
  File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/usr/local/lib/python3.10/dist-packages/httpcore/_exceptions.py", line 14, in map_exceptions
    raise to_exc(exc) from exc
httpcore.ConnectError: All connection attempts failed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/enova", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/enova/entry/cli.py", line 42, in main
    cli()
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/decorators.py", line 92, in new_func
    return ctx.invoke(f, obj, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/enova/entry/command/pilot.py", line 335, in pilot_run
    enova_pilot.run(
  File "/usr/local/lib/python3.10/dist-packages/enova/entry/command/pilot.py", line 217, in run
    raise e
  File "/usr/local/lib/python3.10/dist-packages/enova/entry/command/pilot.py", line 209, in run
    healthz_res = cli_loop.run_until_complete(EnovaAppApi.healthz(params={}))
  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.10/dist-packages/enova/api/base.py", line 120, in __call__
    response = await async_client.request(
  File "/usr/local/lib/python3.10/dist-packages/httpx/_client.py", line 1530, in request
    return await self.send(request, auth=auth, follow_redirects=follow_redirects)
  File "/usr/local/lib/python3.10/dist-packages/httpx/_client.py", line 1617, in send
    response = await self._send_handling_auth(
  File "/usr/local/lib/python3.10/dist-packages/httpx/_client.py", line 1645, in _send_handling_auth
    response = await self._send_handling_redirects(
  File "/usr/local/lib/python3.10/dist-packages/httpx/_client.py", line 1682, in _send_handling_redirects
    response = await self._send_single_request(request)
  File "/usr/local/lib/python3.10/dist-packages/httpx/_client.py", line 1719, in _send_single_request
    response = await transport.handle_async_request(request)
  File "/usr/local/lib/python3.10/dist-packages/httpx/_transports/default.py", line 352, in handle_async_request
    with map_httpcore_exceptions():
  File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/usr/local/lib/python3.10/dist-packages/httpx/_transports/default.py", line 77, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.ConnectError: All connection attempts failed
shiertier commented 1 week ago

I just realized that this is a container issue.
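
For anyone who lands here with the same traceback: in the log above, every `docker-compose ... up <service> -d` step (nginx, webui-nginx, dcgm-exporter, enova-app) returned non-zero exit status 1, so nothing was listening on 127.0.0.1:8182 when the `/v1/healthz` polling started. Below is a minimal sketch for confirming that independently of ENOVA, using only the Python standard library. The helper names `is_port_open` and `wait_for_port` are hypothetical and not part of ENOVA; the host and port are the ones from the traceback.

```python
import socket
import time


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError (errno 111 on Linux) and timeouts
        # are both subclasses of OSError.
        return False


def wait_for_port(host: str, port: int, attempts: int = 10, delay: float = 1.0) -> bool:
    """Poll the port up to `attempts` times, sleeping `delay` seconds after each failure."""
    for _ in range(attempts):
        if is_port_open(host, port):
            return True
        time.sleep(delay)
    return False


if __name__ == "__main__":
    # 127.0.0.1:8182 is the address from the traceback in this issue.
    if wait_for_port("127.0.0.1", 8182, attempts=3):
        print("Port 8182 is accepting connections.")
    else:
        print("Nothing is listening on 127.0.0.1:8182; "
              "check why the enova-app container did not start.")
```

If this reports that nothing is listening, the refused connection is a symptom rather than the cause, and the place to look is the container side: whether Docker is reachable from the environment where `enova pilot run` was invoked, and why the compose commands exited with status 1.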