allegroai / clearml-server

ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
https://clear.ml/docs

clearml-apiserver exited because of connection refused by redis #76

Closed ruoyush closed 3 years ago

ruoyush commented 3 years ago

Hello, I'm trying to deploy ClearML with docker-compose on my Linux server (Ubuntu). After I run docker-compose up, the clearml-apiserver container fails to start and keeps restarting. Here is the error log.

clearml-apiserver | Loading config from /opt/trains/apiserver/config/default
clearml-apiserver | Loading config from file /opt/trains/apiserver/config/default/secure.conf
clearml-apiserver | Loading config from file /opt/trains/apiserver/config/default/apiserver.conf
clearml-apiserver | Loading config from file /opt/trains/apiserver/config/default/logging.conf
clearml-apiserver | Loading config from file /opt/trains/apiserver/config/default/hosts.conf
clearml-apiserver | Loading config from file /opt/trains/apiserver/config/default/services/organization.conf
clearml-apiserver | Loading config from file /opt/trains/apiserver/config/default/services/auth.conf
clearml-apiserver | Loading config from file /opt/trains/apiserver/config/default/services/tasks.conf
clearml-apiserver | Loading config from file /opt/trains/apiserver/config/default/services/projects.conf
clearml-apiserver | Loading config from file /opt/trains/apiserver/config/default/services/events.conf
clearml-apiserver | Loading config from /opt/trains/config
clearml-apiserver | Traceback (most recent call last):
clearml-apiserver |   File "/usr/local/lib/python3.6/site-packages/redis/connection.py", line 559, in connect
clearml-apiserver |     sock = self._connect()
clearml-apiserver |   File "/usr/local/lib/python3.6/site-packages/redis/connection.py", line 615, in _connect
clearml-apiserver |     raise err
clearml-apiserver |   File "/usr/local/lib/python3.6/site-packages/redis/connection.py", line 603, in _connect
clearml-apiserver |     sock.connect(socket_address)
clearml-apiserver | ConnectionRefusedError: [Errno 111] Connection refused
clearml-apiserver |
clearml-apiserver | During handling of the above exception, another exception occurred:
clearml-apiserver |
clearml-apiserver | Traceback (most recent call last):
clearml-apiserver |   File "/usr/lib64/python3.6/runpy.py", line 193, in _run_module_as_main
clearml-apiserver |     "__main__", mod_spec)
clearml-apiserver |   File "/usr/lib64/python3.6/runpy.py", line 85, in _run_code
clearml-apiserver |     exec(code, run_globals)
clearml-apiserver |   File "/opt/trains/apiserver/server.py", line 6, in <module>
clearml-apiserver |     from apiserver.server_init.app_sequence import AppSequence
clearml-apiserver |   File "/opt/trains/apiserver/server_init/app_sequence.py", line 10, in <module>
clearml-apiserver |     from apiserver.bll.statistics.stats_reporter import StatisticsReporter
clearml-apiserver |   File "/opt/trains/apiserver/bll/statistics/stats_reporter.py", line 30, in <module>
clearml-apiserver |     worker_bll = WorkerBLL()
clearml-apiserver |   File "/opt/trains/apiserver/bll/workers/__init__.py", line 38, in __init__
clearml-apiserver |     self.redis = redis or redman.connection("workers")
clearml-apiserver |   File "/opt/trains/apiserver/redis_manager.py", line 176, in connection
clearml-apiserver |     obj.get("health")
clearml-apiserver |   File "/usr/local/lib/python3.6/site-packages/redis/client.py", line 1606, in get
clearml-apiserver |     return self.execute_command('GET', name)
clearml-apiserver |   File "/usr/local/lib/python3.6/site-packages/redis/client.py", line 898, in execute_command
clearml-apiserver |     conn = self.connection or pool.get_connection(command_name, **options)
clearml-apiserver |   File "/usr/local/lib/python3.6/site-packages/redis/connection.py", line 1192, in get_connection
clearml-apiserver |     connection.connect()
clearml-apiserver |   File "/usr/local/lib/python3.6/site-packages/redis/connection.py", line 563, in connect
clearml-apiserver |     raise ConnectionError(self._error_message(e))
clearml-apiserver | redis.exceptions.ConnectionError: Error 111 connecting to 127.0.0.1:6379. Connection refused.

However, the Redis server started normally. Here is the docker ps output:

c62f65cc01ec   allegroai/clearml-agent-services:latest   "/usr/agent/entrypoi…"   19 seconds ago   Up 17 seconds   clearml-agent-services
76518b1d86aa   allegroai/clearml:latest   "/opt/trains/wrapper…"   19 seconds ago   Up 17 seconds   8008/tcp, 8080-8081/tcp, 0.0.0.0:8080->80/tcp, :::8080->80/tcp   clearml-webserver
488f1631fe63   allegroai/clearml:latest   "/opt/trains/wrapper…"   21 seconds ago   Restarting (1) 2 seconds ago   clearml-apiserver
eeee6210c01e   redis:5.0   "docker-entrypoint.s…"   25 seconds ago   Up 23 seconds   6379/tcp   clearml-redis
c96ecca58118   allegroai/clearml:latest   "/opt/trains/wrapper…"   25 seconds ago   Up 20 seconds   8008/tcp, 8080/tcp, 0.0.0.0:8081->8081/tcp, :::8081->8081/tcp   clearml-fileserver
da986d3a2543   mongo:3.6.5   "docker-entrypoint.s…"   25 seconds ago   Up 20 seconds   27017/tcp   clearml-mongo
be3cb057f88d   elasticsearch:7.6.2   "/usr/local/bin/dock…"   25 seconds ago   Up 22 seconds   9200/tcp, 9300/tcp   clearml-elastic
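
A note on reading these two outputs together: the traceback shows the apiserver dialing 127.0.0.1:6379, while docker ps lists clearml-redis as a separate container with 6379/tcp exposed but not published to the host. Assuming the default (non-host) Docker networking, 127.0.0.1 inside the apiserver container is that container's own loopback, so a refused connection there does not by itself mean Redis is down. A minimal sketch for checking which address actually answers, using only the Python standard library (the helper name can_connect is hypothetical, not part of ClearML):

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError, timeouts and name-resolution failures.
        return False
```

Run from inside the apiserver container, comparing can_connect("127.0.0.1", 6379) with can_connect("redis", 6379) would show whether Redis is reachable under a service name rather than on loopback. The host name "redis" is an assumption here; use whatever name your docker-compose.yml gives the Redis service.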

The instructions I followed are here (https://allegro.ai/clearml/docs/docs/deploying_clearml/clearml_server_linux_mac.html)

Please help, Thank you!

jkhenning commented 3 years ago

Hi @ruoyush,

It looks like the Docker containers you're using are running ClearML Server version 0.17.0. Since 1.0.1 has already been released, I suggest using the latest docker-compose.yml from the GitHub repository and updating the images with docker-compose pull.

If you are using the latest docker-compose.yml, it might be that you've tried to install ClearML Server in the past, and have the docker images cached locally (in which case this is actually more of an "upgrade" than a fresh install).
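
The refresh described above (stop the stack, pull the images, start it again) can be scripted. The following is only a sketch, assuming docker-compose is on the PATH and that the compose file path is passed in by the caller; the function names refresh_commands and refresh are hypothetical and not part of ClearML:

```python
import subprocess
from typing import List


def refresh_commands(compose_file: str) -> List[List[str]]:
    """Build the docker-compose calls for stop -> pull -> start, in order."""
    base = ["docker-compose", "-f", compose_file]
    return [
        base + ["down"],      # stop and remove the running containers
        base + ["pull"],      # fetch the images referenced by the compose file
        base + ["up", "-d"],  # recreate the containers in the background
    ]


def refresh(compose_file: str, dry_run: bool = True) -> None:
    """Print each command; execute it as well when dry_run is False."""
    for cmd in refresh_commands(compose_file):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    refresh("docker-compose.yml")  # dry run: only prints the three commands
```

The dry-run default is deliberate, so the commands can be reviewed before anything is stopped. Note that docker-compose pull fetches from whatever registry the Docker daemon is configured to use, so a registry mirror may still return an older image for the same tag.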

ruoyush commented 3 years ago

Thank you. I pulled the Docker images through a mirror server, and the images on the mirror are probably an old version. I will try pulling the images from Docker Hub.

ruoyush commented 3 years ago

After I pulled the latest images from Docker Hub, the problem was solved. Thanks~ @jkhenning