Handling param limit exceeded error

crypdick commented 3 years ago

I have a node which takes an S3 artifact URL as an input. This broke my pipeline because of an mlflow.exceptions.MlflowException: Param value 's3://very/long/string' had length 593, which exceeded length limit of 250.

IMO there should at least be the option to truncate long param strings.

turn1a commented 3 years ago

@crypdick This is a limitation imposed by MLflow and I think we shouldn't change that behaviour.

Galileo-Galilei commented 3 years ago

I slightly disagree with @kaemo here: I agree we should obviously align with mlflow and we will not support a specific trick to enable logging above this limit, but we still have to manage this situation in the plugin. Indeed, if someone still use a parameter above this limit I don't think the best solution is "do not use the plugin at all".

I can see 2 situations (in my personal experience) where too long parameters are used:

when you pass a dict which contains a lot of keys as a parameter (typically, all the hyperparameters of a ml model like xgboost). => This situation is already managed with the flatten_dict_params argument of MlflowNodeHook
when you pass a path to a file / an url to a node (your situation here), which can be very long. This situation is seen as a bad practice in Kedro, because it means you want to perform some I/O operations and they should be addressed in the DataCatalog (if you want to implement your own logic and no Kedro dataset suits you, they recommend to create your own dataset for clear separation with compute / better portability. However, I acknowledge that I sometimes encountered situations where I preferred doing this operation inside a node because the logic was very specific and not reusable elsewhere.

Potential solution

A possible solution I would support would be adding a long_parameters_strategy in the mlflow.yml, in the section:

hooks:
    node:
        long_parameters_strategy

This could take the following values:

fail: raise an error in the pipeline like current situation.
truncate: truncate the string by logging param[0:250]
tag: log the string with set_tag instead of log_param if it is above the limit, as it seems to be the recommended way in mlflow: mlflow/mlflow#1976

And then:

Add the same key long_parameters_strategy in the KedroMlflowConfig class: https://github.com/Galileo-Galilei/kedro-mlflow/blob/94bae3df9a054c85dfc0bf13de8db876363de475/kedro_mlflow/framework/context/config.py#L20 and modify accordingly the methods calls
modify the before_pipeline_run method of the MlflowNodeHook to replace this line and add the corresponding logical tests to address all possible situations.

https://github.com/Galileo-Galilei/kedro-mlflow/blob/94bae3df9a054c85dfc0bf13de8db876363de475/kedro_mlflow/framework/hooks/node_hook.py#L53

Some points to have in mind:

a logger should put a warning in the log because it prevents reproducibility
the logical test on length must use MAX_PARAM_VAL_LENGTH (and not hardcoded 250 value to ensure we will adapt in the future.

I don't have time right now and it is not in my top priorities for the plugin, but I will definitely address this in the coming months (llikely by the end of november). If you are in hurry @crypdick, feel free to open a PR 😉. I am open to discussion about the best way to address this, so do not hesitate to suggest alternative possibilities.

crypdick commented 3 years ago

@Galileo-Galilei ty for the ideas! Indeed, I had to disable the MlflowNodeHook, which was a bummer.

In the interest of time, I'm going to port the full URIs into a YAMLDataSet, and keep just the pointers in the parameters.yml.

turn1a commented 3 years ago

@crypdick @Galileo-Galilei I might have assesed this issue a bit too early. I agree with the possible solution you have outlined.

Galileo-Galilei commented 3 years ago

Hello @crypdick, I am struggling a little with this issue. I have implemented a PR for this, and while I am adding tests I can't reproduce the issue. Actually the following code works without any issue:

import mlflow
with mlflow.start_run():
    mlflow.log_param("fake_param", "this is a very long string"*1000)

This logs and render in the UI. I suspect your issue might be related to remote logging to a server. I created a server on my localhost, and it stil works (logs without error).

I have some additionals questions:

What version of mlflow do you have?
Can you provide a minimal reproducible example? Does above code works for you?
Can you check if it works locally (wih a mlruns folder) and remotely (on your S3 tracking)?

I'll try to make somme additional tests on my side.

crypdick commented 3 years ago

@Galileo-Galilei I'm super sorry but I am under an avalanche of work right now and don't have any spare bandwidth

Galileo-Galilei / kedro-mlflow

Handling param limit exceeded error #69

Potential solution