Add auto-registration for plugin hooks

Galileo-Galilei commented 4 years ago

This is a new feature discussed in quantumblacklabs/kedro#435 and recently added to master through https://github.com/quantumblacklabs/kedro/commit/49834dd8fe34721d940634ac5a9f93b90bb7b394.

This seems to be a better way to register the basic hooks to avoid messing up with the template. Some design decisions should be made:

Deal with retrocompatibility
Define where the arguments should be registered, likely in the mlflow.yml

Galileo-Galilei commented 4 years ago

In the current version (kedro-mlflow==0.2.1), the command kedro mlflow init modifies the src/<python-package>/run.py of the template of the project and adds to Hooks:

MlflowPipelineHook with 2 arguments, model_name and conda_env
MlflowNodeHook with 1 argument, flatten_dict_params

I feel that auto-registration would be much beter than messing up wih the project template because:

these Hooks are mandatory for the plugin to be used so they must be registered whatever happens
the kedro mlflow init command cannot modify the template if the user has made custom modifications in it, and therefore hook registration must be manual in this case
the kedro mlflow init command makes a check to see if the template has been modified, but the template vary between kedro versions, and it is hard to keep up (+we need to store different template versions).

As a consequence auto-registration would made maintenance much easier, increase compatibility with different kedro versions and improve user experience. I wish we integrate it. However:

we want to keep retrocompatibility with all kedro versions until 0.16. Basically, it means that we would keep the kedro mlflow init command to update the src/<python-package>/run.py but remove the message which ask to update the template if kedro version is above 0.16.4. Whatever, this message should be reworked because of #14. @kaemo @akruszewski do you think we should enforce retrocompatibility this way (at the price of extra code) or not ? We could just pin kedro>=0.16.4 in the requirement for kedro-mlflow above 0.3.0 for instance.
we need to decide what to do with the Hooks parameters because the user wil no longer access them. In my opinion:
- model_name and conda_env are only used if a PipelineML is registered. They have nothing to do in the Hook and they should be moved to pipeline_ml function. This would be more consistent to declare everything in the same place + it would enable to have several pipeline_ml in the same project (which is realistic).
- flatten_dict_params is used to flatten parameters which are Dict because mlflow has a maximum length for parameters and big parameters dictionnary may be too long (e.g hyperparameters={param1: x, param2: y} is logged as hyperparameters.param1 and hyperparameters.param2). I don't know what is the default behaviour to enforce: flattening can deal with all situations but modifies slightly the names used which may be confusing for the user. Apart from the default beahaviour, we should give a way to modify it to the user. Maybe we can add an entry in the mlflow.yml file? @kaemo @akruszewski what do you think?

turn1a commented 4 years ago

We've used automatic hook discovery feature in our (currently) inner source kedro-pandera plugin and it works well, although there's a bug that breaks kedro-viz which I submitted today so we should wait until it's resolved. When it's fixed we should definitely pursue automatic hook discovery route as it simplifies plugin setup massively.

In terms of retrocompatibility, I think that kedro-mlflow is at such an early stage at this point that I don't think it's needed. In my opinion, we should pin kedro version to the minimum required in 0.3.0 and make it clear in CHANGELOG that new version requires some migration activities for current users.

As for the PipelineML I agree that those parameters should go into pipeline_ml and multiple pipeline_ml's are definitely a realistic scenario which we should consider. flatten_dict_params in mlflow.yml sounds good too.

Galileo-Galilei commented 4 years ago

Thanks for the information regarding the kedro-viz bug, we'll wait until it is resolved.

Agreed with everything, I start separated PR.

Galileo-Galilei commented 3 years ago

https://github.com/quantumblacklabs/kedro-viz/issues/260 is now fixed, We can add this in the release.

Galileo-Galilei / kedro-mlflow

Add auto-registration for plugin hooks #29