Galileo-Galilei / kedro-mlflow

A kedro-plugin for integration of mlflow capabilities inside kedro projects (especially machine learning model versioning and packaging)
https://kedro-mlflow.readthedocs.io/
Apache License 2.0
195 stars 29 forks source link

Handling param limit exceeded error #69

Closed crypdick closed 3 years ago

crypdick commented 3 years ago

I have a node which takes an S3 artifact URL as an input. This broke my pipeline because of an mlflow.exceptions.MlflowException: Param value 's3://very/long/string' had length 593, which exceeded length limit of 250.

IMO there should at least be the option to truncate long param strings.

turn1a commented 3 years ago

@crypdick This is a limitation imposed by MLflow and I think we shouldn't change that behaviour.

Galileo-Galilei commented 3 years ago

I slightly disagree with @kaemo here: I agree we should obviously align with mlflow and we will not support a specific trick to enable logging above this limit, but we still have to manage this situation in the plugin. Indeed, if someone still use a parameter above this limit I don't think the best solution is "do not use the plugin at all".

I can see 2 situations (in my personal experience) where too long parameters are used:

Potential solution

A possible solution I would support would be adding a long_parameters_strategy in the mlflow.yml, in the section:

hooks:
    node:
        long_parameters_strategy

This could take the following values:

And then:

  1. Add the same key long_parameters_strategy in the KedroMlflowConfig class: https://github.com/Galileo-Galilei/kedro-mlflow/blob/94bae3df9a054c85dfc0bf13de8db876363de475/kedro_mlflow/framework/context/config.py#L20 and modify accordingly the methods calls
  2. modify the before_pipeline_run method of the MlflowNodeHook to replace this line and add the corresponding logical tests to address all possible situations.

https://github.com/Galileo-Galilei/kedro-mlflow/blob/94bae3df9a054c85dfc0bf13de8db876363de475/kedro_mlflow/framework/hooks/node_hook.py#L53

Some points to have in mind:

I don't have time right now and it is not in my top priorities for the plugin, but I will definitely address this in the coming months (llikely by the end of november). If you are in hurry @crypdick, feel free to open a PR 😉. I am open to discussion about the best way to address this, so do not hesitate to suggest alternative possibilities.

crypdick commented 3 years ago

@Galileo-Galilei ty for the ideas! Indeed, I had to disable the MlflowNodeHook, which was a bummer.

In the interest of time, I'm going to port the full URIs into a YAMLDataSet, and keep just the pointers in the parameters.yml.

turn1a commented 3 years ago

@crypdick @Galileo-Galilei I might have assesed this issue a bit too early. I agree with the possible solution you have outlined.

Galileo-Galilei commented 3 years ago

Hello @crypdick, I am struggling a little with this issue. I have implemented a PR for this, and while I am adding tests I can't reproduce the issue. Actually the following code works without any issue:

import mlflow
with mlflow.start_run():
    mlflow.log_param("fake_param", "this is a very long string"*1000)

This logs and render in the UI. I suspect your issue might be related to remote logging to a server. I created a server on my localhost, and it stil works (logs without error).

I have some additionals questions:

I'll try to make somme additional tests on my side.

crypdick commented 3 years ago

@Galileo-Galilei I'm super sorry but I am under an avalanche of work right now and don't have any spare bandwidth