allegroai / clearml

ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
https://clear.ml/docs
Apache License 2.0
5.55k stars, 643 forks

HyperParameterOptimizer fails because base task hyperparameter types are not copied over #975

Open reiffd7 opened 1 year ago

reiffd7 commented 1 year ago

## Describe the bug

When I run `HyperParameterOptimizer` from `clearml.automation`, every child "experiment" task fails because the hyperparameter types are not copied over correctly: every hyperparameter is converted to a string.

Base task:

GENERAL

| Parameter | Value | Type |
| -- | -- | -- |
| batch_size | 100 | Int |

Optimization (child) task:

GENERAL

| Parameter | Value | Type |
| -- | -- | -- |
| batch_size | 100 | String |
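To see why a string-typed value makes the child task fail, it helps to compare how the same value behaves as an int and as a string in plain Python. This is a minimal illustration, assuming the training code uses `batch_size` in arithmetic or as a count; the variable names are hypothetical and do not come from ClearML.

```python
base_value = 100     # type recorded on the base task: Int
child_value = "100"  # type received by the child task: String

# Multiplication doubles the int but repeats the string.
print(base_value * 2)   # 200
print(child_value * 2)  # 100100

# APIs that need an integer reject the string outright.
try:
    batches = list(range(child_value))
except TypeError as exc:
    print(f"TypeError: {exc}")  # TypeError: 'str' object cannot be interpreted as an integer
```

The second case is the worse one: it does not raise, so a string value can silently produce wrong results instead of an immediate failure.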
## To reproduce

I followed the tutorial at https://github.com/allegroai/clearml/blob/master/examples/optimization/hyper-parameter-optimization/hyper_parameter_optimizer.py, replacing only the base task ID with the ID of one of my own tasks.

## Expected behaviour

I would expect the hyperparameter types to be copied over from the base task to each child task of the hyperparameter search. I know that `task.get_parameters()` has a `cast` parameter that returns values with their correct types, but there does not seem to be any way to enable it when running a hyperparameter search.

I was able to fix the issue locally by editing `clearml/automation/job.py`:

* Line 410: `hyper_params = task.get_parameters(cast=True) if params_override is None else params_override`
* Line 541: `task_params = base_temp_task.get_parameters(cast=True, backwards_compatibility=False)`

However, I would like to be able to pass `cast=True` to `HyperParameterOptimizer` directly.

## Environment

* Server type (self hosted \ app.clear.ml): app.clear.ml
* ClearML SDK Version: 1.10.3
* Python Version: 3.7.16
* OS (Windows \ Linux \ Macos): Linux
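The reporter's patch relies on `get_parameters(cast=True)`, which returns each parameter with its recorded type instead of as a string. The sketch below illustrates that idea in plain Python, assuming each parameter is available as a `(string value, type name)` pair with type names such as `int` or `Int`. `cast_params` and its converter table are hypothetical illustrations, not ClearML's implementation.

```python
from typing import Any, Callable, Dict, Tuple


def _to_bool(text: str) -> bool:
    # bool("False") is True in Python, so booleans need explicit parsing.
    lowered = text.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise ValueError(f"cannot interpret {text!r} as a bool")


# Hypothetical table mapping recorded type names to converters (not ClearML's code).
_CONVERTERS: Dict[str, Callable[[str], Any]] = {
    "int": int,
    "float": float,
    "bool": _to_bool,
    "str": str,
}


def cast_params(params: Dict[str, Tuple[str, str]]) -> Dict[str, Any]:
    """Convert each (string value, type name) pair back to a typed value.

    Type names are matched case-insensitively; unknown types are left as strings.
    """
    result: Dict[str, Any] = {}
    for name, (raw_value, type_name) in params.items():
        converter = _CONVERTERS.get(type_name.lower())
        result[name] = converter(raw_value) if converter is not None else raw_value
    return result
```

For example, `cast_params({"batch_size": ("100", "Int")})` returns `{"batch_size": 100}`, the typed value the base task recorded. Leaving unknown types as strings keeps the function from guessing when no converter is registered.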
phineasng commented 1 year ago

Any update on this, @reiffd7? Or did you by any chance find a less invasive workaround?

AlexandruBurlacu commented 1 year ago

Hello @reiffd7 and @phineasng, sorry for taking such a long time. I believe we fixed this issue (incorrect casting of configuration values) in more recent ClearML versions. Please try installing the latest version, 1.11.0, and let us know whether it solves the bug.

wxdrizzle commented 4 months ago

Hi @AlexandruBurlacu, it seems this problem still exists in recent versions. I opened a new issue (#1238) for it several days ago, and today I found that this issue describes exactly what I am seeing.

eugen-ajechiloae-clearml commented 4 months ago

Hi @wxdrizzle @reiffd7! We have prepared a fix for this issue and will release it soon.

wxdrizzle commented 4 months ago

Happy to hear this, thank you! I'll post an update once I can try the new release.

eugen-ajechiloae-clearml commented 3 months ago

Hi @reiffd7 @phineasng @wxdrizzle! We have released a release candidate (RC) that addresses this issue, and an official version will follow soon. Could you please try it out? To install it, run `pip install clearml==1.16.0rc0`.

wxdrizzle commented 3 months ago

Hi @eugen-ajechiloae-clearml, I tried it and can confirm that it solves the problem in my case. I'll close my issue. Thank you so much!

pollfly commented 3 months ago

Hey @reiffd7! Just letting you know that this issue has been resolved in the recently released v1.16.0. Let us know if there are any issues :)