Galileo-Galilei / kedro-mlflow

A kedro-plugin for integration of mlflow capabilities inside kedro projects (especially machine learning model versioning and packaging)
https://kedro-mlflow.readthedocs.io/
Apache License 2.0

KedroPipelineModel requires unnecessary pipeline input dependencies to be executed #273

Closed Debbby57 closed 2 years ago

Debbby57 commented 2 years ago

Hi @Galileo-Galilei

Description

The KedroPipelineModel has an initial_catalog property that causes some problems. This initial_catalog can contain Kedro datasets that do not need to be logged when the model is trained. Because of this property, I can no longer load my model and have to train it again.

To explain: when I trained my model, I used a home-made Kedro plugin to load a specific dataset (which has no impact on my model). Afterwards, I updated this plugin independently of my ML project. Today I want to load my model, but I can't, because the load function uses the old Kedro catalog, which refers to the old plugin version that is no longer in my environment.
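A minimal sketch of this failure mode, assuming the logged model is serialized with a pickle-like mechanism that stores classes by import path. The module name `my_old_plugin`, the class `CustomDataSet`, and the `model_state` dictionary are hypothetical stand-ins, not the real plugin or the real KedroPipelineModel layout.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the home-made plugin available at training time.
plugin = types.ModuleType("my_old_plugin")


class CustomDataSet:
    """Hypothetical dataset class provided by the plugin."""

    def __init__(self, path):
        self.path = path


# Register the class under the plugin module so pickle stores it by that path.
CustomDataSet.__module__ = "my_old_plugin"
CustomDataSet.__qualname__ = "CustomDataSet"
plugin.CustomDataSet = CustomDataSet
sys.modules["my_old_plugin"] = plugin

# Assumed shape: the model state embeds a catalog entry it never needs.
model_state = {
    "initial_catalog": {"unused_input": CustomDataSet("data.csv")},
    "weights": [0.1, 0.2],
}
payload = pickle.dumps(model_state)

# Later, the plugin is no longer importable in the environment.
del sys.modules["my_old_plugin"]

try:
    pickle.loads(payload)
    load_error = None
except ModuleNotFoundError as exc:
    # Loading fails even though the weights themselves do not need the plugin.
    load_error = exc

print(type(load_error).__name__)
```

The point of the sketch is that a single unused catalog entry is enough to make the whole serialized object unloadable once its class can no longer be imported.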

Context

It would be great if we could update the Kedro catalog (only the datasets, not the model artifacts, of course!) without having to retrain our models.

Possible Implementation

Log to MLflow only what is necessary.
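One way to picture this, as a rough sketch only: before logging, keep just the catalog entries that the inference pipeline actually reads, and drop the input that is supplied at prediction time. The helper `select_required_datasets`, the plain-dict catalog, and all dataset names below are hypothetical illustrations, not kedro-mlflow's API or its actual fix.

```python
def select_required_datasets(catalog_entries, inference_inputs, model_input_name):
    """Return only the entries the inference pipeline needs to be persisted.

    catalog_entries: dict mapping dataset name -> dataset object (hypothetical)
    inference_inputs: names of the datasets the inference pipeline reads
    model_input_name: name of the data passed in at predict time, not persisted
    """
    needed = set(inference_inputs) - {model_input_name}
    missing = needed - set(catalog_entries)
    if missing:
        raise KeyError(f"Missing required entries: {sorted(missing)}")
    return {name: catalog_entries[name] for name in sorted(needed)}


# Hypothetical catalog: only two entries are real model artifacts.
full_catalog = {
    "raw_data": "dataset loaded through the home-made plugin",
    "instances": "data to predict on",
    "encoder": "fitted encoder artifact",
    "regressor": "fitted model artifact",
}

to_log = select_required_datasets(
    full_catalog,
    inference_inputs=["instances", "encoder", "regressor"],
    model_input_name="instances",
)
print(sorted(to_log))
```

With this filtering, the plugin-backed `raw_data` entry never reaches the logged model, so updating the plugin afterwards would not affect loading.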

I hope my issue is clear.

thank you

Galileo-Galilei commented 2 years ago

Hi, I can reproduce the issue, thank you very much for the feedback. To clarify, what happens here is the following:

This extra dependency is not useful as you mention. I will remove it in a patch release soon.

Galileo-Galilei commented 2 years ago

For anyone having the same issue, note that you can now export a pipeline as an MLflow model with the kedro mlflow modelify command.