Bug: Hyperparameter optimisation does not clone docker settings from template task

allegroai / clearml

Bug: Hyperparameter optimisation does not clone docker settings from template task #729

Open MoPl90 opened 2 years ago

MoPl90 commented 2 years ago

I followed the example for hyperparameter optimisation in the documentation. While some import statements are missing, I managed to set up a running script and optimise a template task. Pretty cool! :) However, it seems that the execution settings of that base task are not fully copied, in particular the docker settings.

Expected behaviour: Copy docker settings from template task to optimisation task

Actual behaviour: I see the ubuntu:18.04 image and no docker arguments on the execution page.

ainoam commented 2 years ago

Thanks for reporting, @MoPl90. We'll take a look at what seems to be going wrong.

erezalg commented 2 years ago

Hi @MoPl90,

First, as for the docs: they are not supposed to contain the full code. They link to the full code (which should work out of the box here). Hope this is clear!

As for the issue, I added this to my base task: task.set_base_docker(docker_image="ubuntu:18.04",docker_arguments='-e ENV=1',docker_setup_bash_script=['apt update'])

This sets the docker image, arguments and setup bash script (just dummy data).
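For reference, the base-task setup described above could be sketched as follows. The helper names `configure_base_docker` and `create_base_task`, and the project and task names, are placeholders and not part of the original comment; only the `set_base_docker` call and its dummy values come from it.

```python
def configure_base_docker(task):
    """Attach the dummy container settings from the comment above to a task."""
    settings = {
        "docker_image": "ubuntu:18.04",
        "docker_arguments": "-e ENV=1",
        "docker_setup_bash_script": ["apt update"],
    }
    # Task.set_base_docker records the image, arguments and setup script on the task
    task.set_base_docker(**settings)
    return settings


def create_base_task():
    """Hypothetical entry point; project and task names are placeholders."""
    # Imported lazily so the helper above can be exercised without a ClearML server
    from clearml import Task

    task = Task.init(project_name="examples", task_name="HPO base task")
    configure_base_docker(task)
    return task
```

Calling `create_base_task()` in the base experiment's script would register these settings on the template task before the optimiser clones it.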

Then, on the cloned experiments (i.e. the experiments that the HPO process spawns, not the HPO controller task called Automatic Hyper-Parameter Optimization), you do see this information.

So I think it should work.

If this doesn't work for you, can you please let me know what SDK and server version you're using?

MoPl90 commented 2 years ago

Hi, thanks for your reply. If I set the image and options explicitly via Task.set_base_docker it works.

However, if I don't specify this explicitly, the agent neither uses the template task's docker settings, nor the clearml.conf default settings.

erezalg commented 2 years ago

@MoPl90 What versions of the clearml Python package and the ClearML server are you using? Also, how are you specifying the docker image in the template task? BTW, is the clearml agent running in docker mode? Maybe a silly question, but worth asking :smile:

MoPl90 commented 2 years ago

We are running version 1.6.2 of the python package, and the agents are version 1.3.0 running in docker mode. The server is on versions: 1.1.1-135 • 1.1.1 • 2.14.

I have a docker image specified in the clearml.conf file (which is ignored), and I manually added the image to the template task in the UI (which is also ignored). The correct image is only used if I use Task.set_base_docker in the HPO script.

erezalg commented 2 years ago

@MoPl90, in the conf file, are you using agent.default_docker? If so, it won't register unless you're running the experiment using clearml agent. This is why it's not registered. As for adding the docker image to the template task from the UI, how are you doing it? Once a task is "completed" you can't add a docker image to the task.

MoPl90 commented 2 years ago

> In the conf file, are you using the agent.default_docker?

Yes, exactly. And it seems that this field is ignored by the agents, since the "image" field in the Execution panel of the UI is empty unless I specify the container via Task.set_base_docker.

> it won't register unless you're running the experiment using clearml agent

Not sure I understand. If I have a base experiment using a custom container (say I specified it via Task.set_base_docker), I would expect the HPO experiments to copy those settings (as is the case for all other execution arguments). At the moment I have to specify the container manually by calling Task.set_base_docker in the HPO experiment again.
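A minimal sketch of the manual workaround described above, under some assumptions: that `Task.get_base_docker()` returns the template's container as a single "image arguments" string (or nothing when unset), and that re-applying it with `Task.set_base_docker` on the HPO side is sufficient. The helper names `split_docker_cmd` and `copy_docker_settings` are hypothetical, and the setup bash script is not covered by this approach.

```python
def split_docker_cmd(docker_cmd):
    """Split a compound 'image arguments...' string into (image, arguments)."""
    if not docker_cmd:
        return None, None
    parts = docker_cmd.strip().split(None, 1)
    if not parts:
        return None, None
    image = parts[0]
    arguments = parts[1] if len(parts) > 1 else None
    return image, arguments


def copy_docker_settings(template_task, target_task):
    """Re-apply the template task's container image and arguments to another task.

    Both arguments are expected to behave like clearml Task objects.
    Returns True if settings were copied, False if the template has none.
    """
    image, arguments = split_docker_cmd(template_task.get_base_docker())
    if image is None:
        return False
    target_task.set_base_docker(docker_image=image, docker_arguments=arguments)
    return True
```

In an HPO script, the template could be fetched with `Task.get_task(task_id=...)` using the same ID that is passed to the optimiser as the base task, and the target would be whichever task currently receives the manual `Task.set_base_docker` call.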