Closed shiertier closed 1 week ago
I encountered the following issue when running:
enova pilot run --model /usrdata/llm/Meta-Llama-3.1-8B-Instruct-AWQ-INT4
/usr/local/lib/python3.10/dist-packages/pytools/persistent_dict.py:63: RecommendedHashNotFoundWarning: Unable to import recommended hash 'siphash24.siphash13', falling back to 'hashlib.sha256'. Run 'python3 -m pip install siphash24' to install the recommended hash.
  warn("Unable to import recommended hash 'siphash24.siphash13', "
[2024-10-06 Sun 18:45:33.989][DEBUG][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/common/utils.py:68 - get_pkg_namespace_path()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|module_path: /usr/local/lib/python3.10/dist-packages/enova
{'serving_host': '0.0.0.0', 'serving_port': 9199, 'backend': 'vllm', 'webui_host': '0.0.0.0', 'webui_port': 8501, 'exporter_endpoint': 'otel-collector:4317', 'exporter_service_name': 'llmo-svc', 'enova_app_host': '0.0.0.0', 'enova_app_port': 8182, 'hf_proxy': None, 'restart_service': None, 'model': '/usrdata/llm/Meta-Llama-3.1-8B-Instruct-AWQ-INT4', 'call_order': 1, 'command_func': 'enova_pilot_run_args'}
[2024-10-06 Sun 18:45:34.21][DEBUG][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/common/cli_helper.py:107 - _run_command()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|Command: /usr/local/lib/python3.10/dist-packages/enova/template/deployment/docker-compose/bin/docker-compose-linux-x86_64 -f /usr/local/lib/python3.10/dist-packages/enova/template/deployment/docker-compose/enova_compose.yaml up nginx -d
[2024-10-06 Sun 18:45:34.179][WARNING][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/entry/command/pilot.py:131 - run()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|nginx start failed: err: Command '['/usr/local/lib/python3.10/dist-packages/enova/template/deployment/docker-compose/bin/docker-compose-linux-x86_64', '-f', '/usr/local/lib/python3.10/dist-packages/enova/template/deployment/docker-compose/enova_compose.yaml', 'up', 'nginx', '-d']' returned non-zero exit status 1.
[2024-10-06 Sun 18:45:34.179][DEBUG][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/common/utils.py:68 - get_pkg_namespace_path()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|module_path: /usr/local/lib/python3.10/dist-packages/enova
{'serving_host': '0.0.0.0', 'serving_port': 9199, 'backend': 'vllm', 'webui_host': '0.0.0.0', 'webui_port': 8501, 'exporter_endpoint': 'otel-collector:4317', 'exporter_service_name': 'llmo-svc', 'enova_app_host': '0.0.0.0', 'enova_app_port': 8182, 'hf_proxy': None, 'restart_service': None, 'model': '/usrdata/llm/Meta-Llama-3.1-8B-Instruct-AWQ-INT4', 'call_order': 1, 'command_func': 'enova_pilot_run_args'}
[2024-10-06 Sun 18:45:34.234][DEBUG][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/common/cli_helper.py:107 - _run_command()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|Command: /usr/local/lib/python3.10/dist-packages/enova/template/deployment/docker-compose/bin/docker-compose-linux-x86_64 -f /usr/local/lib/python3.10/dist-packages/enova/template/deployment/docker-compose/enova_compose.yaml up webui-nginx -d
[2024-10-06 Sun 18:45:34.388][WARNING][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/entry/command/pilot.py:137 - run()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|webui_nginx start failed: err: Command '['/usr/local/lib/python3.10/dist-packages/enova/template/deployment/docker-compose/bin/docker-compose-linux-x86_64', '-f', '/usr/local/lib/python3.10/dist-packages/enova/template/deployment/docker-compose/enova_compose.yaml', 'up', 'webui-nginx', '-d']' returned non-zero exit status 1.
[2024-10-06 Sun 18:45:34.389][DEBUG][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/common/utils.py:68 - get_pkg_namespace_path()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|module_path: /usr/local/lib/python3.10/dist-packages/enova
[2024-10-06 Sun 18:45:34.443][DEBUG][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/common/cli_helper.py:107 - _run_command()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|Command: /usr/local/lib/python3.10/dist-packages/enova/template/deployment/docker-compose/bin/docker-compose-linux-x86_64 -f /usr/local/lib/python3.10/dist-packages/enova/template/deployment/docker-compose/enova_compose.yaml up dcgm-exporter -d
[2024-10-06 Sun 18:45:34.610][WARNING][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/entry/command/pilot.py:144 - run()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|monitor start failed: err: Command '['/usr/local/lib/python3.10/dist-packages/enova/template/deployment/docker-compose/bin/docker-compose-linux-x86_64', '-f', '/usr/local/lib/python3.10/dist-packages/enova/template/deployment/docker-compose/enova_compose.yaml', 'up', 'dcgm-exporter', '-d']' returned non-zero exit status 1.
[2024-10-06 Sun 18:45:34.610][DEBUG][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/common/utils.py:68 - get_pkg_namespace_path()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|module_path: /usr/local/lib/python3.10/dist-packages/enova
[2024-10-06 Sun 18:45:34.664][INFO][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/entry/command/app.py:65 - _run_compose()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|use local model: /usrdata/llm/Meta-Llama-3.1-8B-Instruct-AWQ-INT4
[2024-10-06 Sun 18:45:34.664][DEBUG][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/entry/command/app.py:66 - _run_compose()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|maping local model in compose: /workspace/model
[2024-10-06 Sun 18:45:34.664][DEBUG][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/common/cli_helper.py:130 - update_service_options()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|updated service config: {'image': 'emergingai/enova:v0.0.5', 'container_name': 'enova-app', 'command': 'enova app run --host 0.0.0.0 --port 8182', 'volumes': ['/run/docker.sock:/run/docker.sock', '/root/.cache/huggingface:/root/.cache/huggingface', '/usrdata/llm/Meta-Llama-3.1-8B-Instruct-AWQ-INT4:/workspace/model', '/tmp/traffic_injector:/tmp/traffic_injector', 'traffic_injector_dataset:/opt/enova/enova/template/deployment/docker-compose/traffic-injector/data'], 'deploy': {'resources': {'reservations': {'devices': [{'driver': 'nvidia', 'count': 'all', 'capabilities': ['gpu']}]}}}, 'ports': ['8182:8182'], 'depends_on': ['enova-escaler'], 'networks': {'enova-net': {'aliases': ['enova-app']}}, 'environment': ['EMERGINGAI_ENOVA_APP_HOST_MODEL_DIR=/usrdata/llm/Meta-Llama-3.1-8B-Instruct-AWQ-INT4', 'EMERGINGAI_ENOVA_APP_USER_ARGS={"serving_host": "0.0.0.0", "serving_port": 9199, "backend": "vllm", "webui_host": "0.0.0.0", "webui_port": 8501, "exporter_endpoint": "otel-collector:4317", "exporter_service_name": "llmo-svc", "enova_app_host": "0.0.0.0", "enova_app_port": 8182, "hf_proxy": null, "restart_service": null, "model": "/usrdata/llm/Meta-Llama-3.1-8B-Instruct-AWQ-INT4", "call_order": 1, "command_func": "enova_pilot_run_args"}']}
[2024-10-06 Sun 18:45:34.664][DEBUG][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/common/cli_helper.py:132 - update_service_options()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|updated compose file: /usr/local/lib/python3.10/dist-packages/enova/template/deployment/docker-compose/enova_compose.yaml
[2024-10-06 Sun 18:45:34.678][DEBUG][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/common/cli_helper.py:107 - _run_command()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|Command: /usr/local/lib/python3.10/dist-packages/enova/template/deployment/docker-compose/bin/docker-compose-linux-x86_64 -f /usr/local/lib/python3.10/dist-packages/enova/template/deployment/docker-compose/enova_compose.yaml up enova-app -d
[2024-10-06 Sun 18:45:34.839][WARNING][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/entry/command/pilot.py:150 - run()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|enova_app start failed: err: Command '['/usr/local/lib/python3.10/dist-packages/enova/template/deployment/docker-compose/bin/docker-compose-linux-x86_64', '-f', '/usr/local/lib/python3.10/dist-packages/enova/template/deployment/docker-compose/enova_compose.yaml', 'up', 'enova-app', '-d']' returned non-zero exit status 1.
[2024-10-06 Sun 18:45:34.842][INFO][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/api/base.py:35 - pools()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|create ASyncClientPool pools
[2024-10-06 Sun 18:45:34.850][INFO][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/api/base.py:117 - __call__()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|method: get, actual_url: http://127.0.0.1:8182/v1/healthz, params: None, headers: {'trace_id': 'e595d856ab3341eebf081abb8ea8bb06'}
[2024-10-06 Sun 18:45:35.870][INFO][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/api/base.py:117 - __call__()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|method: get, actual_url: http://127.0.0.1:8182/v1/healthz, params: None, headers: {'trace_id': 'e595d856ab3341eebf081abb8ea8bb06'}
[2024-10-06 Sun 18:45:36.875][INFO][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/api/base.py:117 - __call__()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|method: get, actual_url: http://127.0.0.1:8182/v1/healthz, params: None, headers: {'trace_id': 'e595d856ab3341eebf081abb8ea8bb06'}
[2024-10-06 Sun 18:45:37.879][INFO][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/api/base.py:117 - __call__()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|method: get, actual_url: http://127.0.0.1:8182/v1/healthz, params: None, headers: {'trace_id': 'e595d856ab3341eebf081abb8ea8bb06'}
[2024-10-06 Sun 18:45:38.884][INFO][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/api/base.py:117 - __call__()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|method: get, actual_url: http://127.0.0.1:8182/v1/healthz, params: None, headers: {'trace_id': 'e595d856ab3341eebf081abb8ea8bb06'}
[2024-10-06 Sun 18:45:39.889][INFO][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/api/base.py:117 - __call__()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|method: get, actual_url: http://127.0.0.1:8182/v1/healthz, params: None, headers: {'trace_id': 'e595d856ab3341eebf081abb8ea8bb06'}
[2024-10-06 Sun 18:45:40.893][INFO][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/api/base.py:117 - __call__()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|method: get, actual_url: http://127.0.0.1:8182/v1/healthz, params: None, headers: {'trace_id': 'e595d856ab3341eebf081abb8ea8bb06'}
[2024-10-06 Sun 18:45:41.899][INFO][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/api/base.py:117 - __call__()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|method: get, actual_url: http://127.0.0.1:8182/v1/healthz, params: None, headers: {'trace_id': 'e595d856ab3341eebf081abb8ea8bb06'}
[2024-10-06 Sun 18:45:42.903][INFO][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/api/base.py:117 - __call__()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|method: get, actual_url: http://127.0.0.1:8182/v1/healthz, params: None, headers: {'trace_id': 'e595d856ab3341eebf081abb8ea8bb06'}
[2024-10-06 Sun 18:45:43.908][INFO][MainProcess][732][MainThread][server][/usr/local/lib/python3.10/dist-packages/enova/api/base.py:117 - __call__()] [trace_id: e595d856ab3341eebf081abb8ea8bb06]|method: get, actual_url: http://127.0.0.1:8182/v1/healthz, params: None, headers: {'trace_id': 'e595d856ab3341eebf081abb8ea8bb06'}
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/anyio/_core/_sockets.py", line 165, in try_connect
    stream = await asynclib.connect_tcp(remote_host, remote_port, local_address)
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 2301, in connect_tcp
    await get_running_loop().create_connection(
  File "/usr/lib/python3.10/asyncio/base_events.py", line 1076, in create_connection
    raise exceptions[0]
  File "/usr/lib/python3.10/asyncio/base_events.py", line 1060, in create_connection
    sock = await self._connect_sock(
  File "/usr/lib/python3.10/asyncio/base_events.py", line 969, in _connect_sock
    await self.sock_connect(sock, address)
  File "/usr/lib/python3.10/asyncio/selector_events.py", line 501, in sock_connect
    return await fut
  File "/usr/lib/python3.10/asyncio/selector_events.py", line 541, in _sock_connect_cb
    raise OSError(err, f'Connect call failed {address}')
ConnectionRefusedError: [Errno 111] Connect call failed ('127.0.0.1', 8182)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/httpcore/_exceptions.py", line 10, in map_exceptions
    yield
  File "/usr/local/lib/python3.10/dist-packages/httpcore/_backends/anyio.py", line 114, in connect_tcp
    stream: anyio.abc.ByteStream = await anyio.connect_tcp(
  File "/usr/local/lib/python3.10/dist-packages/anyio/_core/_sockets.py", line 227, in connect_tcp
    raise OSError("All connection attempts failed") from cause
OSError: All connection attempts failed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/httpx/_transports/default.py", line 60, in map_httpcore_exceptions
    yield
  File "/usr/local/lib/python3.10/dist-packages/httpx/_transports/default.py", line 353, in handle_async_request
    resp = await self._pool.handle_async_request(req)
  File "/usr/local/lib/python3.10/dist-packages/httpcore/_async/connection_pool.py", line 262, in handle_async_request
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/httpcore/_async/connection_pool.py", line 245, in handle_async_request
    response = await connection.handle_async_request(request)
  File "/usr/local/lib/python3.10/dist-packages/httpcore/_async/connection.py", line 92, in handle_async_request
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/httpcore/_async/connection.py", line 69, in handle_async_request
    stream = await self._connect(request)
  File "/usr/local/lib/python3.10/dist-packages/httpcore/_async/connection.py", line 117, in _connect
    stream = await self._network_backend.connect_tcp(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/httpcore/_backends/auto.py", line 31, in connect_tcp
    return await self._backend.connect_tcp(
  File "/usr/local/lib/python3.10/dist-packages/httpcore/_backends/anyio.py", line 112, in connect_tcp
    with map_exceptions(exc_map):
  File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/usr/local/lib/python3.10/dist-packages/httpcore/_exceptions.py", line 14, in map_exceptions
    raise to_exc(exc) from exc
httpcore.ConnectError: All connection attempts failed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/enova", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/enova/entry/cli.py", line 42, in main
    cli()
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/decorators.py", line 92, in new_func
    return ctx.invoke(f, obj, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/enova/entry/command/pilot.py", line 335, in pilot_run
    enova_pilot.run(
  File "/usr/local/lib/python3.10/dist-packages/enova/entry/command/pilot.py", line 217, in run
    raise e
  File "/usr/local/lib/python3.10/dist-packages/enova/entry/command/pilot.py", line 209, in run
    healthz_res = cli_loop.run_until_complete(EnovaAppApi.healthz(params={}))
  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.10/dist-packages/enova/api/base.py", line 120, in __call__
    response = await async_client.request(
  File "/usr/local/lib/python3.10/dist-packages/httpx/_client.py", line 1530, in request
    return await self.send(request, auth=auth, follow_redirects=follow_redirects)
  File "/usr/local/lib/python3.10/dist-packages/httpx/_client.py", line 1617, in send
    response = await self._send_handling_auth(
  File "/usr/local/lib/python3.10/dist-packages/httpx/_client.py", line 1645, in _send_handling_auth
    response = await self._send_handling_redirects(
  File "/usr/local/lib/python3.10/dist-packages/httpx/_client.py", line 1682, in _send_handling_redirects
    response = await self._send_single_request(request)
  File "/usr/local/lib/python3.10/dist-packages/httpx/_client.py", line 1719, in _send_single_request
    response = await transport.handle_async_request(request)
  File "/usr/local/lib/python3.10/dist-packages/httpx/_transports/default.py", line 352, in handle_async_request
    with map_httpcore_exceptions():
  File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/usr/local/lib/python3.10/dist-packages/httpx/_transports/default.py", line 77, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.ConnectError: All connection attempts failed
I just realized that this is a container issue.
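For anyone who lands here with the same log: every `docker-compose ... up` call above exits with status 1, but the log never shows why, and the final `httpx.ConnectError` is only a follow-on symptom, because nothing ever listens on port 8182. Below is a minimal diagnostic sketch in Python, assuming "container issue" means the command was run somewhere the Docker daemon is not reachable (for example inside a container without the Docker socket mounted). The helper names `in_container`, `run_and_capture`, and `diagnose` are hypothetical and not part of enova; the `/.dockerenv` check is only a common heuristic, and `/run/docker.sock` is the socket path taken from the compose volumes in the log.

```python
import shutil
import subprocess
from pathlib import Path


def in_container() -> bool:
    """Heuristic only: Docker usually creates /.dockerenv inside containers."""
    return Path("/.dockerenv").exists()


def run_and_capture(cmd: list[str], timeout: float = 30.0) -> tuple[int, str]:
    """Run cmd and return (exit_code, combined stdout and stderr)."""
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr so the real error is visible
            text=True,
            timeout=timeout,
        )
    except FileNotFoundError:
        return 127, f"executable not found: {cmd[0]}"
    except subprocess.TimeoutExpired:
        return 124, f"timed out after {timeout}s"
    return proc.returncode, proc.stdout.strip()


def diagnose() -> list[str]:
    """Collect likely reasons why docker-compose cannot start services here."""
    findings: list[str] = []
    if in_container():
        findings.append("running inside a container (/.dockerenv exists)")
    # /run/docker.sock is the path mounted by the compose file in the log;
    # /var/run/docker.sock is the other common location.
    sockets = [Path("/run/docker.sock"), Path("/var/run/docker.sock")]
    if not any(s.exists() for s in sockets):
        findings.append("no Docker socket found")
    if shutil.which("docker") is None:
        findings.append("docker CLI not on PATH")
    else:
        code, output = run_and_capture(["docker", "info"])
        if code != 0:
            findings.append(f"'docker info' failed ({code}): {output[-300:]}")
    return findings
```

Calling `diagnose()` before `enova pilot run` gives a quick list of suspects. Passing one of the exact compose commands from the log to `run_and_capture` should also print the underlying docker-compose error message that the enova warning omits.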