ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
I'm working on a PR that adds new functionality to the task scheduler by adding a new parameter task_id_function to TaskScheduler.add_task() that takes a callable which has an expected return of a task_id. This task_id_function function is run at runtime (when the task scheduler would normally execute the scheduled task) and uses the task_id returned by the function + the other parameters from .add_task() as the scheduled task.
Motivation
Why is this useful: there's a host of reasons but the biggest one: it gives users much more control over the tasks that are run by the task scheduler. Currently, as far as I can tell, if I wanted to run the most recent task (at runtime) from a given project with a specific tag, it's not possible to do with the task scheduler. I can use the schedule_function parameter and create a function that finds and runs the task but then I lose the core advantages of .add_task(), no way to specify queues, task_parameters, and task_overrides. Naturally, I could wrap all of that into the function called by task_parameters but then I'm basically just writing my own scheduler at that point. This will also let you do some preprocessing before returning the task_id, for example, if you wanted to clean up old tasks.
Proposal Summary
I'm working on a PR that adds new functionality to the task scheduler by adding a new parameter
task_id_function
toTaskScheduler.add_task()
that takes a callable which has an expected return of atask_id
. Thistask_id_function
function is run at runtime (when the task scheduler would normally execute the scheduled task) and uses thetask_id
returned by the function + the other parameters from.add_task()
as the scheduled task.Motivation
Why is this useful: there's a host of reasons but the biggest one: it gives users much more control over the tasks that are run by the task scheduler. Currently, as far as I can tell, if I wanted to run the most recent task (at runtime) from a given project with a specific tag, it's not possible to do with the task scheduler. I can use the
schedule_function
parameter and create a function that finds and runs the task but then I lose the core advantages of.add_task()
, no way to specify queues, task_parameters, and task_overrides. Naturally, I could wrap all of that into the function called bytask_parameters
but then I'm basically just writing my own scheduler at that point. This will also let you do some preprocessing before returning the task_id, for example, if you wanted to clean up old tasks.Related Discussion
https://clearml.slack.com/archives/CTK20V944/p1708447659999379?thread_ts=1708445057.172119&cid=CTK20V944