hadyan-tvlk opened 3 years ago
Hi @hadyan-tvlk,
I assume you're using the docker-compose deployment?
> This event causes the ClearML Server to be unreachable
What exactly do you mean by the ClearML Server being unreachable? Can you access the WebApp at port 8080? If not, what error are you seeing?
It is unlikely that this is caused by agent-services, since the agent-services container is simply a client of the server and cannot affect the server itself or the WebApp.
Hi @jkhenning,
> I assume you're using the docker-compose deployment?
Yes, correct.
> What exactly do you mean by the ClearML Server being unreachable? Can you access the WebApp at port 8080? If not, what error are you seeing?
Yes, I just can't open the Web UI or perform tracking. Restarting the server brings everything back to normal.
The next time it happens, can you run sudo docker ps on the server machine and share the output?
It would also be helpful to see the Network section of your browser's Developer Tools when you try (and fail) to open the Web UI.
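As a possible aid for that check, here is a minimal Python sketch (not part of ClearML or Docker tooling) that lists containers stuck in a restart loop. It assumes the docker CLI is installed and uses docker ps --format with a tab between {{.Names}} and {{.Status}}; find_restarting and list_restarting_containers are hypothetical helper names.

```python
import subprocess


def find_restarting(ps_output: str) -> list[str]:
    """Return names of containers whose status starts with 'Restarting'.

    Expects one container per line as "<name>\t<status>", i.e. the output of:
        docker ps --format '{{.Names}}\t{{.Status}}'
    """
    restarting = []
    for line in ps_output.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        name, _, status = line.partition("\t")
        if status.startswith("Restarting"):
            restarting.append(name)
    return restarting


def list_restarting_containers() -> list[str]:
    """Run docker ps and return the restarting containers.

    Requires the docker CLI and permission to reach the daemon (may need sudo).
    """
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}\t{{.Status}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return find_restarting(result.stdout)
```

On a setup like the one described further down this thread, where the status column reads "Restarting (1) 46 seconds ago", only clearml-agent-services would be reported.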
I am having the same issue, with the agent-services container continually restarting.
I have installed ClearML Server on an Azure VM running Ubuntu 18.04. It's a completely fresh machine, and I can confirm that ports 8080, 8081 and 8008 are open on the VM.
The only change I made to the basic installation guide was to secure the web interface by creating an apiserver.conf file in /opt/clearml/config with the following content:
auth {
  # Fixed users login credentials
  # No other user will be able to login
  fixed_users {
    enabled: true
    pass_hashed: false
    users: [
      {
        username: "***********"
        password: "***********"
        name: "Ed Morris"
      },
      {
        username: "**************"
        password: "**************"
        name: "Chris Musselle"
      },
    ]
  }
}
Obviously, the actual usernames and passwords have been redacted.
The installation was performed with the docker-compose method, following the documentation, and I can access the web portal without any issue.
Running docker ps shows that clearml-agent-services is constantly restarting:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
e4d95f57b20f allegroai/clearml:latest "/opt/clearml/wrappe…" 7 minutes ago Up 7 minutes 8008/tcp, 8080-8081/tcp, 0.0.0.0:8080->80/tcp, :::8080->80/tcp clearml-webserver
30313b878a66 allegroai/clearml-agent-services:latest "/usr/agent/entrypoi…" 7 minutes ago Restarting (1) 46 seconds ago clearml-agent-services
7e247b87d335 allegroai/clearml:latest "/opt/clearml/wrappe…" 7 minutes ago Up 7 minutes 0.0.0.0:8008->8008/tcp, :::8008->8008/tcp, 8080-8081/tcp clearml-apiserver
912c386c705c docker.elastic.co/elasticsearch/elasticsearch:7.6.2 "/usr/local/bin/dock…" 7 minutes ago Up 7 minutes 9200/tcp, 9300/tcp clearml-elastic
6ccf7f03c607 redis:5.0 "docker-entrypoint.s…" 7 minutes ago Up 7 minutes 6379/tcp clearml-redis
d1a98ae6cd21 mongo:3.6.5 "docker-entrypoint.s…" 7 minutes ago Up 7 minutes 27017/tcp clearml-mongo
42bfc545f7e0 allegroai/clearml:latest "/opt/clearml/wrappe…" 7 minutes ago Up 7 minutes 8008/tcp, 8080/tcp, 0.0.0.0:8081->8081/tcp, :::8081->8081/tcp clearml-fileserver
Looking at the logs for this container, it complains that the provided credentials are invalid:
(base) edmorris@ecm-clearml-server-001:/opt/clearml/config$ docker logs 30313b878a66
http://13.81.201.17:8081 http://13.81.201.17:8080 http://apiserver:8008
WARNING: You are using pip version 20.3.3; however, version 21.1.1 is available.
You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.
clearml_agent: ERROR: Connection Error: it seems *api_server* is misconfigured. Is this the ClearML API server http://apiserver:8008 ?
http://13.81.201.17:8081 http://13.81.201.17:8080 http://apiserver:8008
WARNING: You are using pip version 20.3.3; however, version 21.1.1 is available.
You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.
clearml_agent: ERROR: Failed getting token (error 401 from http://apiserver:8008): Unauthorized (invalid credentials) (failed to locate provided credentials)
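When a container restarts in a loop, the same block of log lines repeats on every attempt, so tallying the distinct error messages makes it easier to see which failure keeps recurring. A minimal Python sketch, assuming the input is the text printed by docker logs for the agent-services container; summarize_agent_errors is a hypothetical helper, and the "clearml_agent: ERROR:" prefix is taken from the log above.

```python
from collections import Counter


def summarize_agent_errors(log_text: str) -> Counter:
    """Count each distinct 'clearml_agent: ERROR:' message in a docker logs dump."""
    marker = "clearml_agent: ERROR:"
    counts: Counter = Counter()
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith(marker):
            # Key on the message text after the prefix.
            counts[line[len(marker):].strip()] += 1
    return counts
```

In this case the recurring entry would be the 401 "Failed getting token" error, which points at the credentials rather than at networking.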
Hi @ecm200,
Thanks for providing such detailed info 👍
What you're seeing is the result of a change we made in ClearML Server 1.0.0, which is not yet covered in the documentation (we plan to release a revamped version of the documentation in the next few days, which will address this as well).
In short, starting from 1.0.0, when the server runs in fixed-users mode, the ClearML Agent Services instance that runs as part of the server requires explicitly provided credentials. This is because, for security reasons, a server running in fixed-users mode does not accept the hard-coded test credentials the agent uses by default (there is a related discussion in our Slack channel).
To provide the required credentials to agent-services, set the CLEARML_API_ACCESS_KEY and CLEARML_API_SECRET_KEY environment variables to appropriate credentials when starting the server with docker-compose. These can be a key/secret pair generated on your ClearML Server's profile page, or simply the username/password of one of the fixed users you defined. You can generate these credentials right now, since your server is up and running even though agent-services is not booting up.
Setting these is done in the same way as described in step #11 in Deploying ClearML Server: Linux and macOS / Deploying.
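The exact commands from step #11 are not reproduced here. As an illustration only, a Python sketch of the same idea: start docker-compose with CLEARML_API_ACCESS_KEY and CLEARML_API_SECRET_KEY present in its environment. It assumes the compose file is at /opt/clearml/docker-compose.yml and that the docker-compose binary is on PATH; build_compose_env, compose_up_command and start_server are hypothetical names.

```python
import os
import subprocess
from typing import Mapping, Optional

# Assumption: compose file location used by the standard deployment; adjust as needed.
COMPOSE_FILE = "/opt/clearml/docker-compose.yml"


def build_compose_env(access_key: str, secret_key: str,
                      base_env: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of base_env (default: os.environ) plus the agent-services credentials."""
    env = dict(os.environ if base_env is None else base_env)
    env["CLEARML_API_ACCESS_KEY"] = access_key
    env["CLEARML_API_SECRET_KEY"] = secret_key
    return env


def compose_up_command(compose_file: str = COMPOSE_FILE) -> list:
    """Equivalent of: docker-compose -f <compose_file> up -d"""
    return ["docker-compose", "-f", compose_file, "up", "-d"]


def start_server(access_key: str, secret_key: str) -> None:
    """Start the server with the credentials visible to docker-compose.

    May require sudo or membership in the docker group.
    """
    subprocess.run(compose_up_command(),
                   env=build_compose_env(access_key, secret_key),
                   check=True)
```

In a shell this corresponds to exporting the two variables and then running docker-compose -f /opt/clearml/docker-compose.yml up -d. Passing env= to subprocess.run scopes the keys to that one child process instead of the whole shell session.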
Thanks for the quick feedback @jkhenning.
So, just to be clear: if I go to the profile page in the ClearML WebUI and generate an App Credential, just like I did to connect the ClearML installation on my laptop to the server on the Azure VM, do I then supply the generated keys by exporting them in the relevant environment variables?
Exactly right 🙂
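Before restarting the server with newly generated keys, it may be worth confirming that the API server accepts them, since the agent's failure above was a 401 while requesting a token. A hedged Python sketch: it assumes a token is obtained by POSTing to the auth.login endpoint of the API server with HTTP basic auth (access key as username, secret key as password). That is an assumption about how ClearML clients authenticate, not something confirmed in this thread. basic_auth_header and credentials_accepted are hypothetical names.

```python
import base64
import urllib.error
import urllib.request


def basic_auth_header(access_key: str, secret_key: str) -> str:
    """Build an HTTP basic auth header value from an access/secret key pair."""
    raw = f"{access_key}:{secret_key}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def credentials_accepted(api_server: str, access_key: str, secret_key: str,
                         timeout: float = 10.0) -> bool:
    """Return True if the API server issues a token for these credentials.

    Assumption: <api_server>/auth.login accepts a POST with basic auth,
    and a 401 response means the credentials were rejected.
    """
    request = urllib.request.Request(
        api_server.rstrip("/") + "/auth.login",
        data=b"{}",
        headers={
            "Authorization": basic_auth_header(access_key, secret_key),
            "Content-Type": "application/json",
        },
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 401:
            return False  # the same rejection the agent-services log shows
        raise  # any other HTTP error is unexpected; surface it


# Hypothetical usage from the server host (placeholders, not real keys):
# credentials_accepted("http://localhost:8008", "<access key>", "<secret key>")
```

From the server host, the API server should be reachable at http://localhost:8008, since port 8008 is published in the docker ps output above; the http://apiserver:8008 address only resolves inside the compose network.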
@jkhenning Thanks so much.
This has resulted in a stable system.
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
9ef0d8cfc721 allegroai/clearml-agent-services:latest "/usr/agent/entrypoi…" About an hour ago Up About an hour clearml-agent-services
11a5d2041fb1 allegroai/clearml:latest "/opt/clearml/wrappe…" About an hour ago Up About an hour 8008/tcp, 8080-8081/tcp, 0.0.0.0:8080->80/tcp, :::8080->80/tcp clearml-webserver
f8fb56da4c77 allegroai/clearml:latest "/opt/clearml/wrappe…" About an hour ago Up About an hour 0.0.0.0:8008->8008/tcp, :::8008->8008/tcp, 8080-8081/tcp clearml-apiserver
fa3285bddd1a allegroai/clearml:latest "/opt/clearml/wrappe…" About an hour ago Up About an hour 8008/tcp, 8080/tcp, 0.0.0.0:8081->8081/tcp, :::8081->8081/tcp clearml-fileserver
f47a9774497f redis:5.0 "docker-entrypoint.s…" About an hour ago Up About an hour 6379/tcp clearml-redis
5aed2c482329 docker.elastic.co/elasticsearch/elasticsearch:7.6.2 "/usr/local/bin/dock…" About an hour ago Up About an hour 9200/tcp, 9300/tcp clearml-elastic
7ff1047655f6 mongo:3.6.5 "docker-entrypoint.s…" About an hour ago Up About an hour 27017/tcp clearml-mongo
@jkhenning
Regarding agent-services needing secret keys set in environment variables: what is the safest way of doing this on a routine basis?
While I'm testing and learning the deployment, the VM hosting the server won't be up 24 hours a day, so what is the easiest way to set these automatically, without having to set the environment variables and restart the server by hand each time?
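One possible approach, offered as an assumption rather than ClearML guidance: keep the two keys in an env file that only the owning user can read, and let docker-compose load it instead of exporting the variables by hand each time. docker-compose reads a file named .env for variable substitution (where it looks for that file has varied between docker-compose versions), and newer versions also accept --env-file; check the documentation for the installed version. write_env_file is a hypothetical helper.

```python
import os


def write_env_file(path: str, values: dict) -> None:
    """Write KEY=VALUE lines to path, readable and writable only by the owner."""
    # Create the file with mode 0o600 so the secrets are not group/world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        # os.open's mode only applies to newly created files (and is masked by
        # the umask), so enforce it explicitly in case the file already existed.
        os.chmod(path, 0o600)
        for key, value in values.items():
            fh.write(f"{key}={value}\n")


# Hypothetical usage; the path and values are placeholders.
# write_env_file("/opt/clearml/.env", {
#     "CLEARML_API_ACCESS_KEY": "<access key from the profile page>",
#     "CLEARML_API_SECRET_KEY": "<secret key from the profile page>",
# })
```

The keys still sit on disk in plain text; mode 0o600 only restricts which local users can read them.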
Also, an enhancement suggestion: on the profile screen that lists the current access keys, it would be really useful to add a column where users can enter their own description for each key, recording which service or machine uses it. When you have local machines, compute nodes, and services all requiring secret keys, it quickly becomes impossible to track which key is for what purpose unless separate records are kept. Letting users attach a recognizable tag to each key would really help, in my opinion.
Dear ClearML community,
I have an issue where, at some point (I'm not sure when), the ClearML Server keeps restarting clearml-agent-services, which never comes up, while the rest of the services are still up and running. This causes the ClearML Server to be unreachable. Any idea? Is this related to the infra config? Thanks in advance!