allegroai / clearml

ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
https://clear.ml/docs
Apache License 2.0
5.71k stars 657 forks

ClearML Agent with --docker: Unable to find image 'my_docker:latest' locally #1340

Open NnaYelsel opened 1 month ago

NnaYelsel commented 1 month ago

Hello everyone,

I'm trying to execute a task from an agent running on a server in Docker mode (more or less this: https://www.youtube.com/watch?v=MX3BrXnaULs).

I have a ClearML instance running on one server.

I installed Docker and clearml-agent on another server (Linux) and initialized the agent. Then I ran: clearml-agent daemon --queue default --docker my_docker --detached

I can see this agent in the workers and queues list of the web UI, but when I try to execute a basic task by putting it in the queue, the task fails with the following logs in the console:

[image: console log screenshot]

It's as if the Docker image does not exist.

Moreover, when I run docker ps -a, the container created by ClearML does not appear, although it is listed by the clearml-agent list command (is that normal?).

I also tried the clearml-agent build method, but it doesn't work either.

Did I misunderstand something about using clearml-agent with Docker?

Best regards.

NnaYelsel commented 1 month ago

Can someone tell me whether a clearml-agent daemon started in --docker mode should appear when running docker ps -a?

jkhenning commented 1 month ago

Hi @NnaYelsel,

The --docker command-line switch tells the agent to run tasks inside a Docker container, and the value passed to --docker is the image the agent specifies in its docker run command. Once the agent issues this command, it's up to the Docker service to locate the image, download it if required, and start the container as instructed by the agent.

Any Docker image you can use on your machine (i.e. any image you can pass to docker run on that machine) can be used by the agent.
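
A quick way to test this before starting the daemon is to ask Docker whether the image is already present on the agent's machine. The sketch below is only an illustration and is not part of ClearML; the helper names with_default_tag and image_is_local are hypothetical. It assumes two standard Docker behaviours: a reference without a tag or digest is treated as ":latest" (which is why the error in the issue title mentions 'my_docker:latest'), and docker image inspect exits with a non-zero status when the image is not found locally.

```python
import shutil
import subprocess


def with_default_tag(image: str) -> str:
    """Return the image reference, adding ':latest' when no tag or digest is given."""
    # Only the last path component can carry a tag; a colon earlier in the
    # reference (e.g. "localhost:5000/img") belongs to a registry port.
    last_component = image.rsplit("/", 1)[-1]
    if ":" in last_component or "@" in last_component:
        return image
    return f"{image}:latest"


def image_is_local(image: str) -> bool:
    """Hypothetical helper: True if `docker image inspect` succeeds for the image."""
    if shutil.which("docker") is None:
        raise RuntimeError("docker CLI not found on PATH")
    result = subprocess.run(
        ["docker", "image", "inspect", with_default_tag(image)],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,  # a non-zero exit code is an expected outcome here
    )
    # A non-zero exit code can also mean the Docker daemon is unreachable,
    # so False is a hint to investigate rather than proof the image is missing.
    return result.returncode == 0
```

If image_is_local("my_docker") returns False and the image cannot be pulled from a registry the host has access to, docker run has nothing to start, which matches the "Unable to find image" message reported above.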

NnaYelsel commented 1 month ago

Thank you for the answer. So I first need to build a Docker image for my task, and then pass it to clearml-agent daemon?

Can I build it with: clearml-agent build --id <TASK_ID> --docker --target new_docker

Thank you.

jkhenning commented 1 month ago

The Docker image does not need to be built for the specific task. The agent's build command lets you prebuild a complete image that can run the task independently, but when running with a daemon command you simply need some Docker image on which the task can be executed (i.e. one that contains Python and any other system packages you require).
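
To check whether a candidate image meets the "contains Python" requirement, one option is to run the interpreter's version command inside a throwaway container. This is a minimal sketch under stated assumptions, not something the agent does itself: the helper names python_check_command and image_has_python are hypothetical, the interpreter is assumed to be called python3, and an image with a custom entrypoint may need a different invocation.

```python
import subprocess
from typing import List


def python_check_command(image: str, python: str = "python3") -> List[str]:
    """Build a `docker run` command that prints the Python version inside the image."""
    # --rm removes the temporary container once the command exits.
    return ["docker", "run", "--rm", image, python, "--version"]


def image_has_python(image: str, python: str = "python3") -> bool:
    """Hypothetical helper: True if the interpreter runs successfully inside the image."""
    try:
        result = subprocess.run(
            python_check_command(image, python),
            capture_output=True,
            text=True,
            check=False,  # failure is reported through the return value
        )
    except FileNotFoundError:
        # The docker CLI itself is not installed on this machine.
        return False
    return result.returncode == 0
```

For example, image_has_python("python:3.11") would be expected to succeed on a host that can pull that public image, whereas an image without a python3 executable makes the docker run command exit with an error.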