chrisorm opened this issue 3 years ago.
Hi @chrisorm, this looks like the WebApp can't reach the server. Can you open the developer tools panel (F12), go to the network section, reload the page and share what appears in the network section list?
Hi, I am having the same problem.
In the developer tools panel, I am getting the following errors:
POST <server_address>:8080/api/v2.12/login.supported_modes 502 (Bad Gateway) zone.js:2843
POST <server_address>:8080/api/v2.12/users.get_preferences 502 (Bad Gateway) zone.js:2843
POST <server_address>:8080/api/v2.12/users.get_current_user 502 (Bad Gateway) zone.js:2843
Hi @amedyukhina,
The error (502 Bad Gateway) indicates the browser can't reach the server's API endpoints at all.
Where is your server running? Are you running the web server on the same machine, or on a machine on the same network?
Facing similar issue @jkhenning,
~I'm trying to deploy the platform on my local laptop. I'm following this tutorial exactly: https://allegro.ai/clearml/docs/docs/deploying_clearml/clearml_server_linux_mac.html#deploying~
~Anything I missed here? Should we modify the docker-compose.yml file? Thanks in advance!~
Somehow, after redeploying the server, it works well. Not sure why.
Well, it might be that the apiserver component took some time to boot, and the UI simply could not reach it.
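If a slow-booting apiserver is the suspect, one way to check (a minimal sketch, not part of ClearML) is to poll the apiserver's TCP port until it accepts connections or a deadline passes. The function name wait_for_port, the 120-second default and the 2-second interval are illustrative assumptions; 8008 is the apiserver port from the docker-compose.yml discussed in this thread.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 120.0, interval_s: float = 2.0) -> bool:
    """Poll host:port until a TCP connection succeeds or timeout_s elapses.

    Returns True once the port accepts connections, False on timeout.
    Note: a successful TCP handshake only shows that something is listening;
    it does not prove the API has finished initializing.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Refused or timed out: the service may still be starting up.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

For example, wait_for_port("localhost", 8008) run on the server right after docker-compose up -d would return True once the apiserver is listening, or False if it never comes up within two minutes (which points to a crash loop rather than a slow start).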
Sorry for the delay. In my case, as it was running inside a VM, I think it was taking a long time to start; increasing the VM's RAM and restarting fixed the issue.
Hi @jkhenning,
I was accessing from a different network via VPN. This used to work before, but I had to reinstall the ClearML server after a system upgrade (from RHEL7 to RHEL8, if this is important).
I have also tried to access the server from the same machine it is running on, and I get the same error.
@amedyukhina did you try curl http://localhost:8008 from the same machine? What's the output?
It says curl: (7) Failed to connect to localhost port 8008: Connection refused
It seems like nothing is running there.
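The same check can be run against all the host ports the compose file publishes at once. This is a minimal sketch under the assumption that the ports are the ones from the docker-compose.yml in this thread (apiserver 8008, webserver 8080, fileserver 8081); CLEARML_PORTS and check_ports are hypothetical names, not ClearML APIs.

```python
import socket

# Host ports published by the docker-compose.yml in this thread (assumed; adjust if yours differ).
CLEARML_PORTS = {"apiserver": 8008, "webserver": 8080, "fileserver": 8081}


def check_ports(host: str, ports: dict, timeout_s: float = 2.0) -> dict:
    """Return {service_name: bool} telling whether each TCP port accepts a connection."""
    results = {}
    for name, port in ports.items():
        try:
            with socket.create_connection((host, port), timeout=timeout_s):
                results[name] = True
        except OSError:
            # Connection refused, unreachable, or timed out.
            results[name] = False
    return results
```

On the server, check_ports("localhost", CLEARML_PORTS) reporting the webserver as reachable but the apiserver as not would match the symptoms here: the UI loads, but its API calls come back as 502.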
This is from the server machine? If so, it indeed indicates the server is not up.
Can you do sudo docker ps? I assume you're using docker-compose to run the server?
Yes, this was from the server machine. I am running the server with docker-compose.
Here is the output of sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
1e11da6ad4c5 allegroai/clearml-agent-services:latest "/usr/agent/entrypoi…" 3 days ago Up 14 seconds clearml-agent-services
ade29d3480cb allegroai/clearml:latest "/opt/trains/wrapper…" 3 days ago Up 3 days 8008/tcp, 8080-8081/tcp, 0.0.0.0:8080->80/tcp, :::8080->80/tcp clearml-webserver
a28647d69e5f allegroai/clearml:latest "/opt/trains/wrapper…" 3 days ago Restarting (1) 19 seconds ago clearml-apiserver
34f22478e3bb docker.elastic.co/elasticsearch/elasticsearch:7.6.2 "/usr/local/bin/dock…" 3 days ago Up 3 days 9200/tcp, 9300/tcp clearml-elastic
059e88709844 redis:5.0 "docker-entrypoint.s…" 3 days ago Up 3 days 6379/tcp clearml-redis
ff3fbd91dfdf mongo:3.6.5 "docker-entrypoint.s…" 3 days ago Up 3 days 27017/tcp clearml-mongo
78870437088f allegroai/clearml:latest "/opt/trains/wrapper…" 3 days ago Up 3 days 8008/tcp, 8080/tcp, 0.0.0.0:8081->8081/tcp, :::8081->8081/tcp clearml-fileserver
So it seems your clearml-apiserver container keeps restarting. Did you use any special configuration or make any changes to the docker-compose file?
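A restart loop like this one shows up in the STATUS column as "Restarting (1) ...". As a small sketch (the helper name is hypothetical), the status can be printed one container per line with docker's --format option and filtered for that prefix; the "|" separator is an arbitrary choice that cannot appear in container names.

```python
def restarting_containers(ps_output: str) -> list:
    """Return names of containers whose status starts with "Restarting".

    Expects one "<name>|<status>" pair per line, as printed by:
        sudo docker ps --format '{{.Names}}|{{.Status}}'
    """
    names = []
    for line in ps_output.splitlines():
        if "|" not in line:
            continue  # skip blank or unexpected lines
        name, status = line.split("|", 1)
        if status.strip().startswith("Restarting"):
            names.append(name.strip())
    return names


# Statuses taken from the docker ps output above.
sample = (
    "clearml-webserver|Up 3 days\n"
    "clearml-apiserver|Restarting (1) 19 seconds ago\n"
    "clearml-redis|Up 3 days\n"
)
print(restarting_containers(sample))  # ['clearml-apiserver']
```

Any name this returns is the container whose docker logs are worth reading next.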
Can you include the output of sudo docker logs clearml-apiserver?
I have followed these instructions to install the ClearML server.
I am getting a "Connection refused" error in the output of sudo docker logs clearml-apiserver.
Here is the full output:
Loading config from /opt/trains/apiserver/config/default
Loading config from file /opt/trains/apiserver/config/default/apiserver.conf
Loading config from file /opt/trains/apiserver/config/default/hosts.conf
Loading config from file /opt/trains/apiserver/config/default/logging.conf
Loading config from file /opt/trains/apiserver/config/default/secure.conf
Loading config from file /opt/trains/apiserver/config/default/services/auth.conf
Loading config from file /opt/trains/apiserver/config/default/services/events.conf
Loading config from file /opt/trains/apiserver/config/default/services/organization.conf
Loading config from file /opt/trains/apiserver/config/default/services/projects.conf
Loading config from file /opt/trains/apiserver/config/default/services/tasks.conf
Loading config from /opt/trains/config
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/redis/connection.py", line 559, in connect
sock = self._connect()
File "/usr/local/lib/python3.6/site-packages/redis/connection.py", line 615, in _connect
raise err
File "/usr/local/lib/python3.6/site-packages/redis/connection.py", line 603, in _connect
sock.connect(socket_address)
ConnectionRefusedError: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib64/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib64/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/opt/trains/apiserver/server.py", line 6, in <module>
from apiserver.server_init.app_sequence import AppSequence
File "/opt/trains/apiserver/server_init/app_sequence.py", line 10, in <module>
from apiserver.bll.statistics.stats_reporter import StatisticsReporter
File "/opt/trains/apiserver/bll/statistics/stats_reporter.py", line 30, in <module>
worker_bll = WorkerBLL()
File "/opt/trains/apiserver/bll/workers/__init__.py", line 38, in __init__
self.redis = redis or redman.connection("workers")
File "/opt/trains/apiserver/redis_manager.py", line 176, in connection
obj.get("health")
File "/usr/local/lib/python3.6/site-packages/redis/client.py", line 1606, in get
return self.execute_command('GET', name)
File "/usr/local/lib/python3.6/site-packages/redis/client.py", line 898, in execute_command
conn = self.connection or pool.get_connection(command_name, **options)
File "/usr/local/lib/python3.6/site-packages/redis/connection.py", line 1192, in get_connection
connection.connect()
File "/usr/local/lib/python3.6/site-packages/redis/connection.py", line 563, in connect
raise ConnectionError(self._error_message(e))
redis.exceptions.ConnectionError: Error 111 connecting to 127.0.0.1:6379. Connection refused.
@amedyukhina this is very strange - can you share your docker-compose.yml file?
Here it is:
cat /opt/clearml/docker-compose.yml
version: "3.6"
services:
  apiserver:
    command:
      - apiserver
    container_name: clearml-apiserver
    image: allegroai/clearml:latest
    restart: unless-stopped
    volumes:
      - /opt/clearml/logs:/var/log/clearml
      - /opt/clearml/config:/opt/clearml/config
      - /opt/clearml/data/fileserver:/mnt/fileserver
    depends_on:
      - redis
      - mongo
      - elasticsearch
      - fileserver
    environment:
      CLEARML_ELASTIC_SERVICE_HOST: elasticsearch
      CLEARML_ELASTIC_SERVICE_PORT: 9200
      CLEARML_MONGODB_SERVICE_HOST: mongo
      CLEARML_MONGODB_SERVICE_PORT: 27017
      CLEARML_REDIS_SERVICE_HOST: redis
      CLEARML_REDIS_SERVICE_PORT: 6379
      CLEARML_SERVER_DEPLOYMENT_TYPE: ${CLEARML_SERVER_DEPLOYMENT_TYPE:-linux}
      CLEARML__apiserver__pre_populate__enabled: "true"
      CLEARML__apiserver__pre_populate__zip_files: "/opt/clearml/db-pre-populate"
      CLEARML__apiserver__pre_populate__artifacts_path: "/mnt/fileserver"
    ports:
      - "8008:8008"
    networks:
      - backend
      - frontend
  elasticsearch:
    networks:
      - backend
    container_name: clearml-elastic
    environment:
      ES_JAVA_OPTS: -Xms2g -Xmx2g
      bootstrap.memory_lock: "true"
      cluster.name: clearml
      cluster.routing.allocation.node_initial_primaries_recoveries: "500"
      cluster.routing.allocation.disk.watermark.low: 500mb
      cluster.routing.allocation.disk.watermark.high: 500mb
      cluster.routing.allocation.disk.watermark.flood_stage: 500mb
      discovery.zen.minimum_master_nodes: "1"
      discovery.type: "single-node"
      http.compression_level: "7"
      node.ingest: "true"
      node.name: clearml
      reindex.remote.whitelist: '*.*'
      xpack.monitoring.enabled: "false"
      xpack.security.enabled: "false"
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    image: docker.elastic.co/elasticsearch/elasticsearch:7.6.2
    restart: unless-stopped
    volumes:
      - /opt/clearml/data/elastic_7:/usr/share/elasticsearch/data
      - /usr/share/elasticsearch/logs
  fileserver:
    networks:
      - backend
      - frontend
    command:
      - fileserver
    container_name: clearml-fileserver
    image: allegroai/clearml:latest
    restart: unless-stopped
    volumes:
      - /opt/clearml/logs:/var/log/clearml
      - /opt/clearml/data/fileserver:/mnt/fileserver
      - /opt/clearml/config:/opt/clearml/config
    ports:
      - "8081:8081"
  mongo:
    networks:
      - backend
    container_name: clearml-mongo
    image: mongo:3.6.5
    restart: unless-stopped
    command: --setParameter internalQueryExecMaxBlockingSortBytes=196100200
    volumes:
      - /opt/clearml/data/mongo/db:/data/db
      - /opt/clearml/data/mongo/configdb:/data/configdb
  redis:
    networks:
      - backend
    container_name: clearml-redis
    image: redis:5.0
    restart: unless-stopped
    volumes:
      - /opt/clearml/data/redis:/data
  webserver:
    command:
      - webserver
    container_name: clearml-webserver
    image: allegroai/clearml:latest
    restart: unless-stopped
    depends_on:
      - apiserver
    ports:
      - "8080:80"
    networks:
      - backend
      - frontend
  agent-services:
    networks:
      - backend
    container_name: clearml-agent-services
    image: allegroai/clearml-agent-services:latest
    restart: unless-stopped
    privileged: true
    environment:
      CLEARML_HOST_IP: ${CLEARML_HOST_IP}
      CLEARML_WEB_HOST: ${CLEARML_WEB_HOST:-}
      CLEARML_API_HOST: http://apiserver:8008
      CLEARML_FILES_HOST: ${CLEARML_FILES_HOST:-}
      CLEARML_API_ACCESS_KEY: ${CLEARML_API_ACCESS_KEY:-}
      CLEARML_API_SECRET_KEY: ${CLEARML_API_SECRET_KEY:-}
      CLEARML_AGENT_GIT_USER: ${CLEARML_AGENT_GIT_USER}
      CLEARML_AGENT_GIT_PASS: ${CLEARML_AGENT_GIT_PASS}
      CLEARML_AGENT_UPDATE_VERSION: ${CLEARML_AGENT_UPDATE_VERSION:->=0.17.0}
      CLEARML_AGENT_DEFAULT_BASE_DOCKER: "ubuntu:18.04"
      AWS_ACCESS_KEY_ID: ${AWS_ACCESS_KEY_ID:-}
      AWS_SECRET_ACCESS_KEY: ${AWS_SECRET_ACCESS_KEY:-}
      AWS_DEFAULT_REGION: ${AWS_DEFAULT_REGION:-}
      AZURE_STORAGE_ACCOUNT: ${AZURE_STORAGE_ACCOUNT:-}
      AZURE_STORAGE_KEY: ${AZURE_STORAGE_KEY:-}
      GOOGLE_APPLICATION_CREDENTIALS: ${GOOGLE_APPLICATION_CREDENTIALS:-}
      CLEARML_WORKER_ID: "clearml-services"
      CLEARML_AGENT_DOCKER_HOST_MOUNT: "/opt/clearml/agent:/root/.clearml"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /opt/clearml/agent:/root/.clearml
    depends_on:
      - apiserver
networks:
  backend:
    driver: bridge
  frontend:
    driver: bridge
Well, it seems like you are using the latest docker-compose.yml, but I think your docker images are from older versions (i.e. 0.17.0 and below).
The best thing to try is to pull the new docker images and start the server up again - try doing:
sudo docker-compose -f docker-compose.yml down
sudo docker-compose -f docker-compose.yml pull
sudo docker-compose -f docker-compose.yml up -d
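One clue in this thread that the images were older than the compose file: the failing containers run "/opt/trains/wrapper…" and the apiserver loads its config from /opt/trains/..., while the newer deployment posted later in the thread runs "/opt/clearml/wrap...". The sketch below turns that observation into a rough heuristic; it is an assumption drawn from these outputs, not an official version check, and the function name is hypothetical.

```python
def stale_clearml_containers(ps_output: str) -> list:
    """Heuristically flag allegroai/clearml containers whose command still lives under /opt/trains/.

    Expects "<name>|<image>|<command>" lines, e.g. from:
        sudo docker ps --no-trunc --format '{{.Names}}|{{.Image}}|{{.Command}}'
    """
    stale = []
    for line in ps_output.splitlines():
        parts = line.split("|", 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        name, image, command = (p.strip() for p in parts)
        if image.startswith("allegroai/clearml:") and "/opt/trains/" in command:
            stale.append(name)
    return stale


# Illustrative lines modeled on the (truncated) docker ps outputs in this thread.
sample = (
    'clearml-apiserver|allegroai/clearml:latest|"/opt/trains/wrapper..."\n'
    'clearml-redis|redis:5.0|"docker-entrypoint.s..."\n'
    'clearml-fileserver|allegroai/clearml:latest|"/opt/clearml/wrap..."\n'
)
print(stale_clearml_containers(sample))  # ['clearml-apiserver']
```

If anything is flagged, the down / pull / up sequence above is the suggested fix, since pull fetches the image the latest tag currently points to.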
It is working now. Thank you so much!
Hi @jkhenning, I have the same problem and my docker-compose.yml file is exactly the same as @amedyukhina's, but this solution is not working for me.
@LightManxx what errors are you getting?
@jkhenning I get a "connection refused" error on port 8008. This is the docker ps output:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
d3470f05547a allegroai/clearml:latest "/opt/clearml/wrap..." About an hour ago Up About an hour 8008/tcp, 8080-8081/tcp, 0.0.0.0:8080->80/tcp clearml-webserver
db3bc1c8fbe3 allegroai/clearml:latest "/opt/clearml/wrap..." About an hour ago Up 52 seconds 0.0.0.0:8008->8008/tcp, 8080-8081/tcp clearml-apiserver
4e3312726ebe docker.elastic.co/elasticsearch/elasticsearch:7.16.2 "/bin/tini -- /usr..." About an hour ago Restarting (1) 21 minutes ago clearml-elastic
5611d479324a redis:5.0 "docker-entrypoint..." About an hour ago Up About an hour 6379/tcp clearml-redis
a08e39fe7972 mongo:4.4.9 "docker-entrypoint..." About an hour ago Up About an hour 27017/tcp clearml-mongo
657ba11ee759 allegroai/clearml:latest "/opt/clearml/wrap..." About an hour ago Up About an hour 8008/tcp, 8080/tcp, 0.0.0.0:8081->8081/tcp clearml-fileserver
What should I do? Thank you very much! The problem turned out to be insufficient memory. I suggest that the ClearML deployment page state the required RAM and other hardware requirements.
@zylprivate Which deployment page were you following?
https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_linux_mac/ This page doesn't mention the device's RAM requirements, which I think should be added.
@zylprivate the docker status indicates the elastic service is restarting, so there's obviously something wrong. Can you do sudo docker logs clearml-elastic and share the result?
I had already found the reason for this problem: there wasn't enough memory (2GB RAM). So I suggest adding device requirements to the deployment page. Thanks for your reply.
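A 2GB machine cannot fit this stack: the compose file above sets ES_JAVA_OPTS: -Xms2g -Xmx2g, so the Elasticsearch heap alone is pinned at 2 GiB, before MongoDB, Redis, the apiserver and the OS are counted. As a rough pre-flight check on Linux, the sketch below reads MemAvailable from /proc/meminfo (reported in kB) and compares it to that heap plus some headroom. The 2 GiB headroom is an arbitrary assumption, not an official ClearML requirement, and the function names are hypothetical.

```python
ES_HEAP_GIB = 2.0  # from ES_JAVA_OPTS: -Xms2g -Xmx2g in the compose file


def mem_available_gib(meminfo_text: str) -> float:
    """Parse the MemAvailable line of /proc/meminfo (value in kB) and return GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            kib = int(line.split()[1])
            return kib / (1024 * 1024)
    raise ValueError("MemAvailable not found in meminfo text")


def enough_for_elastic(meminfo_text: str, headroom_gib: float = 2.0) -> bool:
    """True if available memory covers the Elasticsearch heap plus assumed headroom."""
    return mem_available_gib(meminfo_text) >= ES_HEAP_GIB + headroom_gib
```

On the server itself this could be used as: with open("/proc/meminfo") as f: print(enough_for_elastic(f.read())). A False result is a strong hint that clearml-elastic will keep restarting, as it did here.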
Thanks @zylprivate, will do!
Sometimes clearing the web browser's cache may help.
Hi, I followed the instructions on the install page and installed using Docker on Ubuntu. All steps worked and no errors were reported. I installed it today, so it is presumably a recent version, but I'm not sure how to tell the server version specifically.
If I access the profile page I see:
Unlike the ClearML-hosted web UI pages, there are no navigation options either, no matter how I access the web UI (for example, no user icon at the top right and no navigation bar down the left-hand side).
Any ideas?