ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
Describe the bug
I would like to combine Hydra multi-runs with ClearML remote execution, i.e. configuring a multi-run task with Hydra:
This can create long-running tasks that, by default, are executed sequentially on my local machine, but I would like to benefit from the parallelization of our agent/worker setup on ClearML. Therefore, in my script I have added:
task.execute_remotely("default")
My problem is that with execute_remotely and exit_process=True (the default), the entire multi-run is killed at the first job.
One workaround could be to call execute_remotely("default", clone=True, exit_process=False) and then manually terminate execution. To me, this seems like a poor fix for what should be supported behaviour.
Ideally, exit_process would not use sys.exit, which kills the whole process, but something that terminates only the single Hydra task. I have started a discussion on the Hydra GitHub about what that signal could be.
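To illustrate why exit_process=True ends the whole sweep, and what a job-scoped alternative could look like, here is a stdlib-only simulation. It assumes the jobs run sequentially in one Python process (as with Hydra's default basic launcher) and that exit_process=True ends the process via sys.exit, as described above. `StopLocalJob`, `make_job` and `run_sweep` are hypothetical names, not part of ClearML or Hydra, and enqueuing is only simulated by recording the override.

```python
import sys


class StopLocalJob(Exception):
    """Hypothetical job-scoped signal: ends only the current job's local run."""


def make_job(enqueued, mode):
    """Build a stand-in for a Hydra task function (hypothetical helper).

    "Enqueuing" is simulated by recording the override; no ClearML call is made.
    mode="exit" mimics exit_process=True (sys.exit); mode="signal" raises
    StopLocalJob instead.
    """
    def job(override):
        enqueued.append(override)  # pretend this job was sent to the queue
        if mode == "exit":
            sys.exit(0)  # raises SystemExit, which is not an Exception subclass
        raise StopLocalJob(override)
    return job


def run_sweep(job, overrides):
    """Run jobs one after another in a single process, like a sequential launcher."""
    # StopLocalJob is caught per job; SystemExit is not, so it escapes the loop
    # and aborts every remaining job in the sweep.
    for override in overrides:
        try:
            job(override)
        except StopLocalJob:
            continue  # only this job stops; the sweep moves on


overrides = ["lr=0.1", "lr=0.01", "lr=0.001"]

exited = []
try:
    run_sweep(make_job(exited, mode="exit"), overrides)
except SystemExit:
    pass  # caught only so this demo keeps running
print(exited)  # ['lr=0.1']

signalled = []
run_sweep(make_job(signalled, mode="signal"), overrides)
print(signalled)  # ['lr=0.1', 'lr=0.01', 'lr=0.001']
```

With a real launcher, such a signal would have to be handled by Hydra itself (or by a wrapper around the task function). In the execute_remotely("default", clone=True, exit_process=False) workaround, returning from the task function right after the call plays the same role, at the cost of doing it by hand in every script.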
Environment
Related Discussion
https://clearml.slack.com/archives/CTK20V944/p1675415221867419