Closed Galileo-Galilei closed 3 years ago
In the current version (kedro-mlflow==0.2.1
), the command kedro mlflow init
modifies the src/<python-package>/run.py
of the template of the project and adds to Hooks
:
MlflowPipelineHook
with 2 arguments, model_name
and conda_env
MlflowNodeHook
with 1 argument, flatten_dict_params
I feel that auto-registration would be much beter than messing up wih the project template because:
Hooks
are mandatory for the plugin to be used so they must be registered whatever happenskedro mlflow init
command cannot modify the template if the user has made custom modifications in it, and therefore hook registration must be manual in this casekedro mlflow init
command makes a check to see if the template has been modified, but the template vary between kedro
versions, and it is hard to keep up (+we need to store different template versions).As a consequence auto-registration would made maintenance much easier, increase compatibility with different kedro versions and improve user experience. I wish we integrate it. However:
kedro mlflow init
command to update the src/<python-package>/run.py
but remove the message which ask to update the template if kedro
version is above 0.16.4
. Whatever, this message should be reworked because of #14. @kaemo @akruszewski do you think we should enforce retrocompatibility this way (at the price of extra code) or not ? We could just pin kedro>=0.16.4
in the requirement for kedro-mlflow above 0.3.0 for instance.Hooks
parameters because the user wil no longer access them. In my opinion:
model_name
and conda_env
are only used if a PipelineML
is registered. They have nothing to do in the Hook and they should be moved to pipeline_ml
function. This would be more consistent to declare everything in the same place + it would enable to have several pipeline_ml in the same project (which is realistic). flatten_dict_params
is used to flatten parameters which are Dict
because mlflow has a maximum length for parameters and big parameters dictionnary may be too long (e.g hyperparameters={param1: x, param2: y} is logged as hyperparameters.param1
and hyperparameters.param2
). I don't know what is the default behaviour to enforce: flattening can deal with all situations but modifies slightly the names used which may be confusing for the user. Apart from the default beahaviour, we should give a way to modify it to the user. Maybe we can add an entry in the mlflow.yml
file? @kaemo @akruszewski what do you think?We've used automatic hook discovery feature in our (currently) inner source kedro-pandera plugin and it works well, although there's a bug that breaks kedro-viz which I submitted today so we should wait until it's resolved. When it's fixed we should definitely pursue automatic hook discovery route as it simplifies plugin setup massively.
In terms of retrocompatibility, I think that kedro-mlflow is at such an early stage at this point that I don't think it's needed. In my opinion, we should pin kedro version to the minimum required in 0.3.0 and make it clear in CHANGELOG that new version requires some migration activities for current users.
As for the PipelineML
I agree that those parameters should go into pipeline_ml
and multiple pipeline_ml
's are definitely a realistic scenario which we should consider. flatten_dict_params
in mlflow.yml
sounds good too.
Thanks for the information regarding the kedro-viz bug, we'll wait until it is resolved.
Agreed with everything, I start separated PR.
https://github.com/quantumblacklabs/kedro-viz/issues/260 is now fixed, We can add this in the release.
This is a new feature discussed in quantumblacklabs/kedro#435 and recently added to master through https://github.com/quantumblacklabs/kedro/commit/49834dd8fe34721d940634ac5a9f93b90bb7b394.
This seems to be a better way to register the basic hooks to avoid messing up with the template. Some design decisions should be made: