Galileo-Galilei / kedro-mlflow

A kedro-plugin for integration of mlflow capabilities inside kedro projects (especially machine learning model versioning and packaging)
https://kedro-mlflow.readthedocs.io/
Apache License 2.0

Can't publish MlflowMetricsHistoryDataset to Remote tracking server #582

Closed. cariveroco closed this issue 1 month ago

cariveroco commented 2 months ago

Description

A Kedro pipeline run cannot publish datasets of type kedro_mlflow.io.metrics.MlflowMetricsHistoryDataset to a remote MLflow tracking server.

Context

A kedro pipeline that successfully publishes a kedro_mlflow.io.metrics.MlflowMetricsHistoryDataset to a local MLflow tracking server throws an error when trying to publish to a remote server. Previously, the same pipeline could publish the metrics to both local and remote servers, when the metrics were still configured with type kedro_mlflow.io.metrics.MlflowMetricsDataSet in kedro-mlflow v0.11.10.

Based on the errors thrown, this may be related to a previously reported bug, where the suspected cause is that the get_all_metrics method is implemented for FileStore (local tracking server) but not for RestStore (remote tracking server).

Steps to Reproduce

  1. Create a kedro project with a pipeline that produces a metrics object that is configured to be of type kedro_mlflow.io.metrics.MlflowMetricsHistoryDataset.
  2. Point the mlflow.yml file to a remote Mlflow tracking server.
  3. Run the pipeline.

Expected Result

The pipeline execution is completed successfully, and objects configured to be of type kedro_mlflow.io.metrics.MlflowMetricsHistoryDataset are successfully published to the remote Mlflow tracking server.

Actual Result

The pipeline execution completes successfully, but the run then raises an error in the after_pipeline_run hook and fails to publish the kedro_mlflow.io.metrics.MlflowMetricsHistoryDataset to the remote MLflow tracking server. The error does not occur when running the same code (in exactly the same environment) with a local tracking server.


[08/20/24 08:36:43] INFO     Completed 42 out of 42 tasks                                 sequential_runner.py:90
                    INFO     Pipeline execution completed successfully.                             runner.py:119
Traceback (most recent call last):
  File "/layers/dap-buildpacks_pip-install/site-packages/virtual-env/lib/python3.10/site-packages/kedro/io/core.py", line 291, in exists
    return self._exists()
  File "/layers/dap-buildpacks_pip-install/site-packages/virtual-env/lib/python3.10/site-packages/kedro_mlflow/io/metrics/mlflow_metrics_history_dataset.py", line 122, in _exists
    all_metrics = client._tracking_client.store.get_all_metrics(
AttributeError: 'RestStore' object has no attribute 'get_all_metrics'
 
The above exception was the direct cause of the following exception:
 
Traceback (most recent call last):
  File "/layers/dap-buildpacks_pip-install/site-packages/virtual-env/bin/kedro", line 8, in <module>
    sys.exit(main())
  File "/layers/dap-buildpacks_pip-install/site-packages/virtual-env/lib/python3.10/site-packages/kedro/framework/cli/cli.py", line 233, in main
    cli_collection()
  File "/layers/dap-buildpacks_pip-install/site-packages/virtual-env/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/layers/dap-buildpacks_pip-install/site-packages/virtual-env/lib/python3.10/site-packages/kedro/framework/cli/cli.py", line 130, in main
    super().main(
  File "/layers/dap-buildpacks_pip-install/site-packages/virtual-env/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/layers/dap-buildpacks_pip-install/site-packages/virtual-env/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/layers/dap-buildpacks_pip-install/site-packages/virtual-env/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/layers/dap-buildpacks_pip-install/site-packages/virtual-env/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/layers/dap-buildpacks_pip-install/site-packages/virtual-env/lib/python3.10/site-packages/kedro/framework/cli/project.py", line 225, in run
    session.run(
  File "/layers/dap-buildpacks_pip-install/site-packages/virtual-env/lib/python3.10/site-packages/kedro/framework/session/session.py", line 408, in run
    hook_manager.hook.after_pipeline_run(
  File "/layers/dap-buildpacks_pip-install/site-packages/virtual-env/lib/python3.10/site-packages/pluggy/_hooks.py", line 513, in __call__
    return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
  File "/layers/dap-buildpacks_pip-install/site-packages/virtual-env/lib/python3.10/site-packages/pluggy/_manager.py", line 120, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
  File "/layers/dap-buildpacks_pip-install/site-packages/virtual-env/lib/python3.10/site-packages/pluggy/_manager.py", line 480, in traced_hookexec
    return outcome.get_result()
  File "/layers/dap-buildpacks_pip-install/site-packages/virtual-env/lib/python3.10/site-packages/pluggy/_result.py", line 100, in get_result
    raise exc.with_traceback(exc.__traceback__)
  File "/layers/dap-buildpacks_pip-install/site-packages/virtual-env/lib/python3.10/site-packages/pluggy/_result.py", line 62, in from_call
    result = func()
  File "/layers/dap-buildpacks_pip-install/site-packages/virtual-env/lib/python3.10/site-packages/pluggy/_manager.py", line 477, in <lambda>
    lambda: oldcall(hook_name, hook_impls, caller_kwargs, firstresult)
  File "/layers/dap-buildpacks_pip-install/site-packages/virtual-env/lib/python3.10/site-packages/pluggy/_callers.py", line 139, in _multicall
    raise exception.with_traceback(exception.__traceback__)
  File "/layers/dap-buildpacks_pip-install/site-packages/virtual-env/lib/python3.10/site-packages/pluggy/_callers.py", line 103, in _multicall
    res = hook_impl.function(*args)
  File "/layers/dap-buildpacks_pip-install/site-packages/virtual-env/lib/python3.10/site-packages/kedro_mlflow/framework/hooks/mlflow_hook.py", line 365, in after_pipeline_run
    catalog.exists(dataset)
  File "/layers/dap-buildpacks_pip-install/site-packages/virtual-env/lib/python3.10/site-packages/kedro/io/data_catalog.py", line 575, in exists
    return dataset.exists()
  File "/layers/dap-buildpacks_pip-install/site-packages/virtual-env/lib/python3.10/site-packages/kedro/io/core.py", line 296, in exists
    raise DatasetError(message) from exc
kedro.io.core.DatasetError: Failed during exists check for data set MlflowMetricsHistoryDataset(prefix=).
'RestStore' object has no attribute 'get_all_metrics'

Your Environment

Does the bug also happen with the last version on master?

The bug did not previously occur with the following setup:

Galileo-Galilei commented 2 months ago

Indeed, thanks for raising this issue. I guess the right way is to use get_metric_history, which seems to be implemented in all stores: https://github.com/search?q=repo%3Amlflow%2Fmlflow+get_metric_history&type=code
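
To illustrate the suggestion above, here is a minimal sketch of an existence check that relies only on the public get_metric_history(run_id, key) client method instead of reaching into the private store. It is not the code of the actual fix. The names metric_exists, TrackingClient, InMemoryClient and the local Metric class are hypothetical stand-ins introduced so the example runs without a tracking server, and the sketch assumes that an unknown key yields an empty history rather than an error.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass
class Metric:
    """Hypothetical stand-in for an MLflow metric entry (key, value, timestamp, step)."""

    key: str
    value: float
    timestamp: int
    step: int


class TrackingClient(Protocol):
    """The only client method this sketch depends on."""

    def get_metric_history(self, run_id: str, key: str) -> List[Metric]: ...


class InMemoryClient:
    """Hypothetical in-memory client so the example runs without a server."""

    def __init__(self, history: Dict[str, Dict[str, List[Metric]]]) -> None:
        self._history = history

    def get_metric_history(self, run_id: str, key: str) -> List[Metric]:
        # Assumption: an unknown run or key gives an empty list, not an exception.
        return list(self._history.get(run_id, {}).get(key, []))


def metric_exists(client: TrackingClient, run_id: str, key: str) -> bool:
    """Return True if at least one value was logged for `key` in run `run_id`."""
    return len(client.get_metric_history(run_id, key)) > 0


client = InMemoryClient(
    {
        "run-1": {
            "accuracy": [
                Metric("accuracy", 0.90, timestamp=0, step=0),
                Metric("accuracy", 0.93, timestamp=1, step=1),
            ]
        }
    }
)
found = metric_exists(client, "run-1", "accuracy")
missing = metric_exists(client, "run-1", "loss")
```

Against a real server, the same function could presumably be called with an mlflow.tracking.MlflowClient instance, since that class exposes get_metric_history(run_id, key) publicly. How each store behaves for a key that was never logged should be verified before relying on it.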

mck-star-yar commented 1 month ago

Just faced the same issue. Pinning mlflow to an earlier version doesn't work because of compatibility issues with Python 3.10.

Galileo-Galilei commented 1 month ago

I'll try to take a look at it next week. This is a bug and it should be corrected quickly. Thank you for your patience.

Galileo-Galilei commented 1 month ago

Hi, @cariveroco @mck-star-yar can you test pip install git+https://github.com/Galileo-Galilei/kedro-mlflow.git@582-metrics_history-dataset-to-server and tell me if it fixes the issue?

cariveroco commented 1 month ago

Hi @Galileo-Galilei, it's working now on my end. Thank you very much!