allegroai / clearml

ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
https://clear.ml/docs
Apache License 2.0
5.68k stars 654 forks

HPO ParameterSet bug #1203

Open BalogneseCoder opened 9 months ago

BalogneseCoder commented 9 months ago

Describe the bug

In HPO (hyper-parameter optimization) I use both hyper-parameters and ParameterSets, so the list provided as hyper_parameters looks something like this:

    [<clearml.automation.parameters.DiscreteParameterRange at 0x7f4520fc7314>,
     <clearml.automation.parameters.DiscreteParameterRange at 0x7f4520fc7190>,
     <clearml.automation.parameters.ParameterSet at 0x7f4520fc5180>,
     <clearml.automation.parameters.ParameterSet at 0x7f4520fc66b0>]

All of these are Parameter objects as described in the docs. A ParameterSet has the following structure (I use the to_dict() method to show it):

    {'type': 'ParameterSet',
     'name': None,
     'values': [{'param1': <clearml.automation.parameters.DiscreteParameterRange at 0x7f4520fc72b0>,
                 'param2': <clearml.automation.parameters.DiscreteParameterRange at 0x7f4520fc59f0>},
                {'param1': <clearml.automation.parameters.DiscreteParameterRange at 0x7f4520fc5330>,
                 'param2': <clearml.automation.parameters.DiscreteParameterRange at 0x7f4520fc5e70>}]}

In the console I see:

    2024-02-12 20:17:51,239 - clearml - WARNING - Could not determine if Hydra is overriding a Config Group param1=<clearml.automation.parameters.DiscreteParameterRange object at 0x7f86043163b0>

So instead of the value, the task is provided with the object itself, and of course it fails after that.

To reproduce

Run HPO with a list containing any ParameterRange plus a ParameterSet with several combinations, each combination containing ParameterRanges.

By the way, to_list() shows values instead of ParameterRange objects: [{'param1': '1', 'param2': '1'}, {'param1': '0', 'param2': '0'}]

Expected behaviour

The optimization task should send the exact values, not an object, when starting a training task.

I suppose the to_dict() method is inherited from Parameter, and since the ParameterSet structure is more complex, it works incorrectly there, but I am not sure about this.
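To illustrate the suspected mechanism (this is a simplified sketch, not the actual clearml source): if a generic to_dict() inherited from a base Parameter class serializes attributes as-is, nested Parameter objects inside ParameterSet.values leak through unresolved, while an overridden to_list() resolves them to concrete values:

```python
class Parameter:
    def to_dict(self):
        # Naive serialization: returns attribute values unchanged,
        # including any nested Parameter objects.
        return {"type": type(self).__name__, **self.__dict__}


class DiscreteParameterRange(Parameter):
    def __init__(self, name, values):
        self.name = name
        self.values = values

    def to_list(self):
        return [{self.name: v} for v in self.values]


class ParameterSet(Parameter):
    def __init__(self, parameter_combinations):
        self.name = None
        # List of dicts mapping parameter name -> Parameter object.
        self.values = parameter_combinations

    def to_list(self):
        # Resolves the nested Parameter objects into plain values
        # (each range inside a combination holds a single value here,
        # as in the bug report above).
        resolved = []
        for combo in self.values:
            entry = {}
            for param in combo.values():
                for d in param.to_list():
                    entry.update(d)
            resolved.append(entry)
        return resolved


ps = ParameterSet([
    {"param1": DiscreteParameterRange("param1", [1]),
     "param2": DiscreteParameterRange("param2", [1])},
])
print(ps.to_list())  # plain values: [{'param1': 1, 'param2': 1}]
print(ps.to_dict())  # nested DiscreteParameterRange objects leak through
```

This reproduces exactly the asymmetry reported: to_list() yields clean values, while to_dict() hands back the dict with live Parameter objects inside 'values'.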

Environment

ainoam commented 9 months ago

@BalogneseCoder It would be appreciated if you could provide a code example demonstrating how your parameters are defined and passed to HPO.

BalogneseCoder commented 9 months ago

    param1 = DiscreteParameterRange(name='param1', values=[0, 1])
    param2 = DiscreteParameterRange(name='param2', values=[0, 1])
    param3 = DiscreteParameterRange(name='param3', values=[0, 1])
    paramset1 = ParameterSet([
        {'param4': DiscreteParameterRange(name='param4', values=[0]),
         'param5': DiscreteParameterRange(name='param5', values=[0])},
        {'param4': DiscreteParameterRange(name='param4', values=[1]),
         'param5': DiscreteParameterRange(name='param5', values=[1])},
    ])
    paramset2 = ParameterSet([
        {'param6': DiscreteParameterRange(name='param6', values=[10]),
         'param7': DiscreteParameterRange(name='param7', values=[10])},
        {'param6': DiscreteParameterRange(name='param6', values=[20]),
         'param7': DiscreteParameterRange(name='param7', values=[20])},
    ])

hparams = [param1, param2, param3, paramset1, paramset2]
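For clarity, the grid this parameter list should expand to can be sketched with plain Python using only the values (the enumeration logic here is illustrative, not ClearML's actual implementation): three binary ranges crossed with two combinations from each ParameterSet.

```python
from itertools import product

# Values taken from the parameter definitions above.
ranges = {"param1": [0, 1], "param2": [0, 1], "param3": [0, 1]}
set1 = [{"param4": 0, "param5": 0}, {"param4": 1, "param5": 1}]
set2 = [{"param6": 10, "param7": 10}, {"param6": 20, "param7": 20}]

# Enumerate every candidate task's parameter dict.
grid = []
for combo in product(*ranges.values()):
    base = dict(zip(ranges.keys(), combo))
    for s1 in set1:
        for s2 in set2:
            grid.append({**base, **s1, **s2})

print(len(grid))  # 2*2*2 ranges x 2 x 2 set combinations = 32 candidates
```

Each entry in `grid` is the flat dict of concrete values a child training task should receive, which is what the optimizer should be sending instead of the Parameter objects.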

BalogneseCoder commented 9 months ago

    optimizer = HyperParameterOptimizer(
        base_task_id='696d6e7a4d154d0286ba5df27f9de52d',
        hyper_parameters=hparams,
        objective_metric_title='test_title',
        objective_metric_series='test_series',
        objective_metric_sign='max',
        # setting the optimizer
        optimizer_class=GridSearch,
        # project for child tasks
        spawn_project='paramset_test_project/children',
        # configuring optimization parameters
        execution_queue='someQueue',
        save_top_k_tasks_only=2,
        max_number_of_concurrent_tasks=2,
        optimization_time_limit=1440,
        total_max_jobs=2,
    )

eugen-ajechiloae-clearml commented 8 months ago

Hi @BalogneseCoder !

2024-02-12 20:17:51,239 - clearml - WARNING - Could not determine if Hydra is overriding a Config Group param1=<clearml.automation.parameters.DiscreteParameterRange object at 0x7f86043163b0> so instead of value the task is provided with object itself, for sure it fails after that.

This looks like a Hydra error. We issue this warning when something goes wrong while checking whether an override corresponds to a config group. How could this happen? Do you have some simple runnable code that contains both the HPO code and the task you are trying to optimize, to help us reproduce this?