allegroai / clearml

ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
https://clear.ml/docs
Apache License 2.0

Yolov8cls Hyperparameter Optimization #1113

Open pizzatakeaway opened 1 year ago

pizzatakeaway commented 1 year ago

Describe the bug

Hi! First of all, thank you for your work :) I am currently trying to run a `Task.TaskTypes.optimizer` task for a classification problem using Yolov8cls. The issue I'm facing is that only the experiments that failed are listed and plotted. Furthermore (and this is very likely related), the optimization task keeps running forever instead of being marked as completed after reaching `total_max_jobs`.

[screenshot: the HPO task's plots list only the failed experiments]

PS: The "same" setup worked before for yolov5 (maximizing the map_0.5 metric for image recognition)

To reproduce

from clearml import Task, TaskTypes
from clearml.automation import HyperParameterOptimizer
from clearml.automation.hpbandster import OptimizerBOHB

def job_complete_callback(job_id, objective_value, objective_iteration, job_parameters, top_performance_job_id):
    print('Job completed!', job_id, objective_value, objective_iteration, job_parameters)
    if job_id == top_performance_job_id:
        print('A better model was trained! Objective reached {}'.format(objective_value))

task = Task.init(
    project_name='Project',
    task_name='Hyperparameter Optimization',
    task_type=TaskTypes.optimizer,
    reuse_last_task_id=False
)

baseTaskYolov8clsTraining = Task.get_task(project_name="Project", task_name="Train Yolov8cls")
baseTaskYolov8clsTraining.set_parameters({...})

searchStrategy = OptimizerBOHB
hyperParameter = [...]

optimizer = HyperParameterOptimizer(
    base_task_id=baseTaskYolov8clsTraining.id,
    hyper_parameters=hyperParameter,
    objective_metric_title='metrics',
    objective_metric_series='top1_acc',
    objective_metric_sign='max',
    execution_queue='task_gpu',
    optimizer_class=searchStrategy,
    max_number_of_concurrent_tasks=1,
    save_top_k_tasks_only=5,
    compute_time_limit=None,
    total_max_jobs=20,
    min_iteration_per_job=1000,
    max_iteration_per_job=2000,
)

task.execute_remotely(exit_process=True)

optimizer.set_report_period(10) # report every 10 minutes

optimizer.start(job_complete_callback=job_complete_callback)
optimizer.wait()
optimizer.stop()

Expected behaviour 1

All experiment results are listed and plotted in the hyperparameter optimization task > Plots, not only the experiments that failed.

Expected behaviour 2

The hyperparameter optimization task is marked as completed after running total_max_jobs=20 experiments.

Environment

pizzatakeaway commented 1 year ago

@allegroai-git please-don't-forget-me ping ;)

AlexandruBurlacu commented 1 year ago

Hey @pizzatakeaway thanks for the ping :smile:

When running the HPO task you provided, are there any spawned tasks/trials that complete successfully? Also, what's the reason the other tasks/trials fail?

The HPO task running even after all experiments are done is a known issue; we'll fix it in an upcoming release.
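One way to answer the question above (how many trials completed vs. failed) is to tally the statuses of the child tasks the optimizer spawned. A hedged sketch, where the project and task names are taken from the repro code and are assumptions:

```python
# Hedged sketch (not from the thread): tally the statuses of the trial
# tasks spawned by an HPO run, to see how many completed vs. failed.
from collections import Counter

def status_counts(tasks):
    # Works with any objects exposing a `.status` attribute,
    # which clearml Task objects do.
    return Counter(str(t.status) for t in tasks)

# Against a live ClearML server, trials can be fetched by parent task id
# (project/task names below come from the repro and are assumptions):
#   from clearml import Task
#   hpo = Task.get_task(project_name='Project', task_name='Hyperparameter Optimization')
#   trials = Task.get_tasks(task_filter={'parent': hpo.id})
#   print(status_counts(trials))
```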

pizzatakeaway commented 12 months ago

Hi @AlexandruBurlacu! I really appreciate your efforts and am looking forward to your next release!

To your questions: