Open Make42 opened 2 years ago
This heavily depends on what features from Kedro you want to use and which features from ClearML, but in general ClearML tools are pretty modular.
E.g. you could keep your feature store as a clearml-data versioned dataset and then use the clearml SDK to access it from within a Kedro node/pipeline.
or
You could track the version history of each Kedro node by simply tracking the node's code using the clearml experiment manager.
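To make the first option a bit more concrete, here is a minimal, untested-against-a-real-server sketch of a node function that reads a clearml-data dataset. `Dataset.get` and `get_local_copy` are part of the clearml SDK; the project and dataset names, the `list_feature_files` function, and the `fetch` parameter are all placeholders I made up for illustration (the `fetch` seam only exists so the function can be tried without a ClearML server).

```python
from pathlib import Path
from typing import Callable, List, Optional


def _fetch_from_clearml(project: str, name: str) -> str:
    """Download (or reuse the cached copy of) a clearml-data dataset."""
    # Imported lazily so this module loads even where clearml is not installed.
    from clearml import Dataset

    dataset = Dataset.get(dataset_project=project, dataset_name=name)
    # Returns the path of a local, read-only copy of the dataset files.
    return dataset.get_local_copy()


def list_feature_files(
    project: str = "my_project",      # placeholder project name
    name: str = "feature_store",      # placeholder dataset name
    fetch: Optional[Callable[[str, str], str]] = None,  # hypothetical test seam
) -> List[str]:
    """Plain Python function that could be wrapped as a Kedro node.

    Returns the dataset's files as sorted, POSIX-style relative paths.
    """
    fetch = fetch or _fetch_from_clearml
    root = Path(fetch(project, name))
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )
```

A real node would load the files instead of just listing them, but the access pattern stays the same.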
Heck, you could probably even run Kedro nodes as clearml tasks that are then remotely executed by clearml agents, simply by adding
from clearml import Task
task = Task.init(project_name="my_project", task_name="my_task")
task.execute_remotely(queue_name="default")
to the python function that will be turned into a kedro node. When running the kedro pipeline, it will run the underlying python function, which in turn will register itself as a clearml task to be added to a clearml queue and executed by a clearml agent.
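Spelled out as a whole function, that could look roughly like the sketch below. `Task.init` and `execute_remotely(queue_name=...)` are real clearml calls; `train_model`, its return value, and the `task_factory` parameter are hypothetical and only there so the function can be exercised without a ClearML server. One caveat, as far as I understand the SDK: when called locally, `execute_remotely` enqueues the task and by default ends the local process, so the rest of the Kedro run would stop at this node. On the agent the call is skipped and the function body runs normally.

```python
from typing import Any, Callable, Dict, Optional


def train_model(
    params: Dict[str, Any],
    task_factory: Optional[Callable[..., Any]] = None,  # hypothetical test seam
) -> Dict[str, Any]:
    """Function intended to be wrapped as a Kedro node."""
    if task_factory is None:
        # Imported lazily so the module loads without clearml installed.
        from clearml import Task

        task_factory = Task.init

    # Register this function call as a ClearML task.
    task = task_factory(project_name="my_project", task_name="my_task")

    # Locally: put the task on the queue (and, by default, exit this process).
    # On a clearml agent: this call does nothing and execution continues.
    task.execute_remotely(queue_name="default")

    # Placeholder for the actual node logic.
    return {"n_params": len(params)}
```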
Point being, both tools are open source pip packages. clearml in particular does not force you to change any code or structure your code in any particular way, so you should easily be able to add clearml features where you want them, just by using the pip package.
That said, I have not tested any of these things! Based on my knowledge of clearml they should be possible, but I'm not very familiar with the inner workings of a Kedro pipeline, so take this with a grain of salt :)
It depends on what kind of features you are looking into.
Remote execution of a kedro pipeline shouldn't be a problem. Experiment tracking should also work pretty well: you can register these Tasks with before_pipeline_xxxx hooks, etc. Kedro is a CLI-first library, and most users start with kedro run to run their pipeline, but you can easily run the equivalent pipeline through the Python API by creating a session. This is also well supported, since it is the recommended way to run a pipeline in a notebook environment.
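For reference, running a pipeline through a session looks roughly like this. `bootstrap_project` and `KedroSession.create` are Kedro's own API (the exact `create` arguments have changed a little between Kedro versions, so `project_path` is passed by keyword here); the wrapper function `run_kedro_project` is just a name I picked.

```python
from pathlib import Path
from typing import Any, Dict, Union


def run_kedro_project(project_path: Union[str, Path]) -> Dict[str, Any]:
    """Run the default pipeline of the Kedro project at `project_path`.

    Roughly the Python equivalent of calling `kedro run` in that folder.
    """
    # Imported lazily so this module loads even where kedro is not installed.
    from kedro.framework.session import KedroSession
    from kedro.framework.startup import bootstrap_project

    project_path = Path(project_path).resolve()

    # Reads the project metadata and configures the project's settings.
    bootstrap_project(project_path)

    with KedroSession.create(project_path=project_path) as session:
        # Runs the default pipeline and returns whatever session.run() gives back.
        return session.run()
```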
I would start with the CLI and use hooks for the clearml features that you need, and only use a session if it's necessary.
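A hook-based setup could be sketched like this. `hook_impl`, `before_pipeline_run` and `after_pipeline_run` come from Kedro, and `Task.init` / `task.close()` from clearml; the class name `ClearMLHooks`, the project name, and the `task_factory` argument are my own placeholders (the factory is only there so the hooks can be tried without a ClearML server). The fallback decorator is just so the snippet also loads where Kedro is not installed.

```python
from typing import Any, Callable, Dict, Optional

try:
    from kedro.framework.hooks import hook_impl
except ImportError:
    # Fallback so the sketch can be imported without kedro installed.
    def hook_impl(func):
        return func


class ClearMLHooks:
    """Opens a ClearML task before a pipeline run and closes it afterwards."""

    def __init__(
        self,
        project_name: str = "my_project",                    # placeholder
        task_factory: Optional[Callable[..., Any]] = None,   # hypothetical test seam
    ) -> None:
        self.project_name = project_name
        self._task_factory = task_factory
        self.task = None

    @hook_impl
    def before_pipeline_run(self, run_params: Dict[str, Any]) -> None:
        factory = self._task_factory
        if factory is None:
            from clearml import Task

            factory = Task.init
        # Name the task after the pipeline; fall back to Kedro's default name.
        task_name = run_params.get("pipeline_name") or "__default__"
        self.task = factory(project_name=self.project_name, task_name=task_name)

    @hook_impl
    def after_pipeline_run(self, run_params: Dict[str, Any]) -> None:
        if self.task is not None:
            self.task.close()
            self.task = None
```

The hooks would then be registered in the project's `settings.py` with `HOOKS = (ClearMLHooks(),)`, and a plain `kedro run` picks them up.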
Data versioning may be the trickier part. I am not too familiar with how clearml does this, and kedro also comes with its own data versioning feature. It may make sense to not enable this in Kedro and simply delegate it to clearml.
We are currently product-hunting for our MLOps infrastructure, and ClearML, Kedro and MLRun are on our short list. We are considering combining ClearML with Kedro. They are similar in purpose but have different features if one looks at the details. E.g. Kedro has hooks for tasks that implement cross-cutting concerns. At reddit
has been written. So, I would like to know how this would be done on both levels: