allegroai / clearml

ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
https://clear.ml/docs
Apache License 2.0
5.61k stars 651 forks source link

Add support for muitl-variable replacement for PipelineController.add_step() parameter_override's. #1089

Open natephysics opened 1 year ago

natephysics commented 1 year ago

Proposal Summary

Tested with pipelines from tasks.

When adding a step to a pipeline it's possible to override parameters. When overriding parameters, it's possible to do variable replacement of the parameters for parameters from the prior stages of the pipeline, e.g. parameter_override={'Args/input_file': '${stage3.id}' }. However, this feature doesn't appear to currently support overriding multiple parameters in the same argument, e.g. parameter_override={'Args/input_file': '[${stage3_task_A.id}, '${stage3_task_B.id}', '${stage3_task_C.id}']}. There are a lot of potential use cases for such a feature.

It is possible to do something like this using a dedicated argument for each variable being replaced. But that method would then create a huge number of potential variables that would need to be discovered and iterated over when it would be much simpler to have a single argument with a list of values.

This should be a fairly non-intrusive change to the codebase. I'd happily create a PR for it if someone would be willing to direct me to the relevant parts of the code (I've looked with the help of someone on Slack but it wasn't obvious).

Motivation

I have a pipeline that has a variable number of parent stages that will feed into one child. It would be much easier to return a list and iterate over the list rather than discover each variable.

Related Discussion

https://clearml.slack.com/archives/CTK20V944/p1689609314325429

AlexandruBurlacu commented 1 year ago

Hey @natephysics, we always appreciate any contributions, and I'll be glad to help you! That being said, I believe to support your suggestion, you'd need to check the _parse_step_ref method at clearml/automation/controller.py:2779 and extend it's functionality to handle multiple references per parameter

natephysics commented 1 year ago

Sorry, it took me a while to put it together. I was on vacation.

https://github.com/allegroai/clearml/pull/1099

pollfly commented 1 year ago

Hey @natephysics! v1.13.0 is now out, supporting recursive list, dict, and tuple ref parsing for PipelineController.add step() parameter overrides

natephysics commented 1 year ago

Hey @natephysics! v1.13.0 is now out, supporting recursive list, dict, and tuple ref parsing for PipelineController.add step() parameter overrides

Awesome! I'm glad it was included.