Open GPrev-Lab4i opened 1 year ago
Hi @GPrev-Lab4i,
Thanks for the detailed report. Your conclusions seem on point. The intended use is to define the server's URLs as external ones (i.e. not on the internal docker network), at least for the web service and fileserver, since these URLs are used when registering data and we expect them to be externally accessible names (so that when the information is presented on a remote machine, it will try to access the correct addresses).
You can, however, quite easily set up the newly spun docker container to use the backend network by configuring the services agent's CLEARML_AGENT_EXTRA_DOCKER_ARGS environment variable with the required docker options, for example by passing CLEARML_AGENT_EXTRA_DOCKER_ARGS=--network=backend
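As a rough sketch of what that suggestion could look like in docker/docker-compose.yml: the fragment below assumes the variable is set under the agent-services service's environment section, and that the network is literally named "backend". Docker Compose often prefixes network names with the project name, so the real name may differ and is worth checking with "docker network ls". All other keys of the service are omitted.

```yaml
# Sketch only: fragment of docker/docker-compose.yml, other keys omitted.
services:
  agent-services:
    networks:
      - backend
    environment:
      # Internal address, resolvable only from containers on the backend network.
      CLEARML_API_HOST: http://apiserver:8008
      # Extra docker options for the containers the agent spins up for tasks.
      # Assumption: the network name is exactly "backend"; adjust it if
      # Docker Compose created it under a prefixed name.
      CLEARML_AGENT_EXTRA_DOCKER_ARGS: "--network=backend"
```

With this in place the task container joins the same network as the apiserver container, so the internal hostname should resolve, at the cost described in the replies below: data may be registered with internal URLs.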
Hi @jkhenning and thank you for your answer. I did not know about CLEARML_AGENT_EXTRA_DOCKER_ARGS; I will keep it in mind, as it could be useful in other situations. If I understand correctly, the solution I found seems to be in line with the intended use. Do you think it would make sense to edit "docker/docker-compose.yml" with those changes?
That's a good question. Hard-coding it to the backend network means users may not be aware of the setting and will leave it that way, and might then have data reported using this internal URL (which will prevent them from accessing it externally). Still, something is better than nothing? 🙂
Maybe I didn't express it clearly, but I was thinking of the opposite: hard-coding it to use the external IP, i.e. CLEARML_API_HOST: http://${CLEARML_HOST_IP}:8008. That way, if I understand correctly, it should work for every use case.
Worker doesn't execute task in docker mode
Environment
ClearML server version : 1.9.2
OS : Debian 10
Steps to reproduce :
Observed behaviour :
clearml_agent: ERROR: Connection Error: it seems *api_server* is misconfigured. Is this the ClearML API server http://apiserver:8008/ ?
Expected behaviour :
Possible cause of the problem :
I think the agent-services container can see the apiserver container because they are on the same virtual network (backend). The created worker container is not connected to this network, and so it can't access the apiserver and execute the task.
Ideas on how to solve the problem
Solution A :
I was able to solve this problem by changing the value of "CLEARML_API_HOST" in the file docker-compose.yml, under "agent-services" -> "environment". By default, it is set to "http://apiserver:8008/", and I changed it to "http://${CLEARML_HOST_IP}:8008". That way, the worker container connects to the apiserver through the host, not relying on a virtual network.
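For illustration, the change described in Solution A would look roughly like the fragment below. It is a sketch under a few assumptions: CLEARML_HOST_IP is defined in the environment where docker-compose is run, port 8008 of the apiserver is published on the host, and all other keys of the service are omitted.

```yaml
# Sketch only: fragment of docker-compose.yml, other keys omitted.
services:
  agent-services:
    environment:
      # Default value, reachable only from containers on the backend network:
      # CLEARML_API_HOST: http://apiserver:8008/
      # Changed value, which reaches the apiserver through the host instead:
      CLEARML_API_HOST: http://${CLEARML_HOST_IP}:8008
```

The service needs to be recreated after the edit for the new value to take effect.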
Possible solution B :
Another idea would be to find a way to give the worker container access to the virtual network "backend", so that it could use it to connect to apiserver.
Possible solution C :
Another idea would be to configure the agent not to use another docker container, but to create workers within its own container. This might have performance implications though.
Example
Example task used for testing :
Log as seen from the ClearML web interface (no further relevant logs were found from going inside the agent and worker containers) :
Using cached zipp-3.6.0-py3-none-any.whl (5.3 kB)
Collecting typing-extensions>=3.6.4; python_version < "3.8"
Using cached typing_extensions-4.1.1-py3-none-any.whl (26 kB)
Installing collected packages: attrs, six, orderedmultidict, furl, certifi, urllib3, charset-normalizer, requests, pyparsing, psutil, pyjwt, PyYAML, distlib, zipp, importlib-resources, typing-extensions, importlib-metadata, filelock, platformdirs, virtualenv, pyrsistent, jsonschema, pathlib2, python-dateutil, clearml-agent
Attempting uninstall: six
Found existing installation: six 1.11.0
Uninstalling six-1.11.0:
Successfully uninstalled six-1.11.0
Successfully installed PyYAML-6.0 attrs-22.2.0 certifi-2022.12.7 charset-normalizer-2.0.12 clearml-agent-1.5.1 distlib-0.3.6 filelock-3.4.1 furl-2.1.3 importlib-metadata-4.8.3 importlib-resources-5.4.0 jsonschema-3.2.0 orderedmultidict-1.0.1 pathlib2-2.3.7.post1 platformdirs-2.4.0 psutil-5.9.4 pyjwt-2.4.0 pyparsing-3.0.9 pyrsistent-0.18.0 python-dateutil-2.8.2 requests-2.27.1 six-1.16.0 typing-extensions-4.1.1 urllib3-1.26.15 virtualenv-20.17.1 zipp-3.6.0
WARNING: You are using pip version 20.1.1; however, version 21.3.1 is available.
You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.