allegroai / clearml

ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
https://clear.ml/docs
Apache License 2.0

Yolov8cls Hyperparameter Optimization #1113

Open pizzatakeaway opened 1 year ago

pizzatakeaway commented 1 year ago

Describe the bug

Hi! First of all, thank you for your work :) I am currently trying to run a `Task.TaskTypes.optimizer` task for a classification problem using Yolov8cls. The issue I'm facing is that only the experiments that failed are listed and plotted. Furthermore (and this is very likely related), the optimization task keeps running forever instead of being marked as completed after reaching `total_max_jobs`.

[screenshot: the HPO task's plots list only the failed experiments]

PS: The "same" setup worked before for yolov5 (maximizing the map_0.5 metric for image recognition)

To reproduce

from clearml import Task, TaskTypes
from clearml.automation import HyperParameterOptimizer
from clearml.automation.hpbandster import OptimizerBOHB

def job_complete_callback(job_id, objective_value, objective_iteration, job_parameters, top_performance_job_id):
    print('Job completed!', job_id, objective_value, objective_iteration, job_parameters)
    if job_id == top_performance_job_id:
        print('A better model was trained! Objective reached {}'.format(objective_value))

task = Task.init(
    project_name='Project',
    task_name='Hyperparameter Optimization',
    task_type=TaskTypes.optimizer,
    reuse_last_task_id=False
)

baseTaskYolov8clsTraining = Task.get_task(project_name="Project", task_name="Train Yolov8cls")
baseTaskYolov8clsTraining.set_parameters({...})

searchStrategy = OptimizerBOHB
hyperParameter = [...]

optimizer = HyperParameterOptimizer(
    base_task_id=baseTaskYolov8clsTraining.id,
    hyper_parameters=hyperParameter,
    objective_metric_title='metrics',
    objective_metric_series='top1_acc',
    objective_metric_sign='max',
    execution_queue='task_gpu',
    optimizer_class=searchStrategy,
    max_number_of_concurrent_tasks=1,
    save_top_k_tasks_only=5,
    compute_time_limit=None,
    total_max_jobs=20,
    min_iteration_per_job=1000,
    max_iteration_per_job=2000,
)

task.execute_remotely(exit_process=True)

optimizer.set_report_period(10) # report every 10 minutes

optimizer.start(job_complete_callback=job_complete_callback)
optimizer.wait()
optimizer.stop()

Expected behaviour 1

All experiment results are listed and plotted in the hyperparameter optimization task > Plots, not only the experiments that failed.

Expected behaviour 2

The hyperparameter optimization task is marked as completed after running total_max_jobs=20 experiments.

Environment

pizzatakeaway commented 1 year ago

@allegroai-git please-don't-forget-me ping ;)

AlexandruBurlacu commented 1 year ago

Hey @pizzatakeaway thanks for the ping :smile:

When running the HPO task you provided, are there any spawned tasks/trials that complete successfully? Also, what's the reason the other tasks/trials fail?

The HPO task running even after all experiments are done is a known issue; we'll fix it in an upcoming release.
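One way to answer the question above (how many trials completed vs. failed) is to tally the statuses of the child tasks the optimizer spawned. A hedged sketch, where the project and task names are taken from the repro code and are assumptions:

```python
# Hedged sketch (not from the thread): tally the statuses of the trial
# tasks spawned by an HPO run, to see how many completed vs. failed.
from collections import Counter

def status_counts(tasks):
    # Works with any objects exposing a `.status` attribute,
    # which clearml Task objects do.
    return Counter(str(t.status) for t in tasks)

# Against a live ClearML server, trials can be fetched by parent task id
# (project/task names below come from the repro and are assumptions):
#   from clearml import Task
#   hpo = Task.get_task(project_name='Project', task_name='Hyperparameter Optimization')
#   trials = Task.get_tasks(task_filter={'parent': hpo.id})
#   print(status_counts(trials))
```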

pizzatakeaway commented 12 months ago

Hi @AlexandruBurlacu! I really appreciate your efforts and am looking forward to your next release!

To your questions: