Closed turn1a closed 3 years ago
I though about it at first, but I had some troubles with it because the run_params are not properly recorded in a class, so wecannot access to their defaults value and access only the invoked ones. Kedro
has a @fixme in the code for hooks implementation...
If we want to keep only passed arguments, we need to:
kedro-mlflow
to compare them to passed values (that is what I do to generate the kedro command https://github.com/Galileo-Galilei/kedro-mlflow/blob/28e60e035b3f3f91e6cd9f9aabdec181a5b289ca/kedro_mlflow/framework/hooks/pipeline_hook.py#L130-L154, but I will improve it a part of #11). This would make the plugin sensitive to any changes in the default values (which should not happen very often though) and when some new args are added from times to times (see 0.16.4 and hooks), making retro compatibility difficult. I may be wrong and you may have another and better solution.Actually, I know that we could overcome this issue by inspecting the signature of the kedro run command to get the default params, but I am not a huge fan of this solution...
I suggest that we open an issue to Kedro
directly to to address the issue of run_params
management first before we make any change to kedro-mlflow
. Are you ok with this?
Since this issue is not a critical change, I don't think we should tackle this right now (there are many other issues more important IMHO), do you agree?
What I mean is the following change:
diff --git a/kedro_mlflow/framework/hooks/pipeline_hook.py b/kedro_mlflow/framework/hooks/pipeline_hook.py
index 01d5522..534b80e 100644
--- a/kedro_mlflow/framework/hooks/pipeline_hook.py
+++ b/kedro_mlflow/framework/hooks/pipeline_hook.py
@@ -67,7 +67,7 @@ class MlflowPipelineHook:
run_name=run_name,
nested=mlflow_conf.run_opts["nested"],
)
- mlflow.set_tags(run_params)
+ mlflow.set_tags({k: v for k, v in run_params.items() if v})
# add manually git sha for consistency with the journal
# TODO : this does not take into account not committed files, so it
# does not ensure reproducibility. Define what to do.
In this way when you run for example kedro run --pipeline train
you end up with the following tags:
Is there any reason we couldn't do it like that? I might be missing something so please correct me if I'm wrong.
Ok for this! Avoiding to store None
value totally makes sense. I was thinking of something more difficult with an attempt to define whether a value was the default one or not (for instance you may want to avoid logging the environment above because "local" is the default value).
Currently, all possible CLI arguments and options are logged as tags even if they have no values (they weren't specified). I think this is unnecessary and we should only set the tags for the passed arguments and options. This would make tracking more readable. @Galileo-Galilei what do you think?