Galileo-Galilei / kedro-mlflow

A kedro-plugin for integration of mlflow capabilities inside kedro projects (especially machine learning model versioning and packaging)
https://kedro-mlflow.readthedocs.io/
Apache License 2.0

Using MLflow Skinny instead of MLflow as the required dependency. #486

Open rxm7706 opened 8 months ago

rxm7706 commented 8 months ago

Description

With the introduction of the MLflow AI Gateway, MLflow has grown quite large in its number of dependencies. To manage the growth of the MLflow ecosystem, MLflow Skinny was introduced.

MLflow Skinny is a lightweight MLflow package without SQL storage, server, UI, or data science dependencies. MLflow Skinny supports:

    Tracking operations (logging / loading / searching params, metrics, tags + logging / loading artifacts)
    Model registration, search, artifact loading, and transitions
    Execution of GitHub projects within notebook & against a remote target.

conda install mlflow pulls in over 100 more packages than conda install mlflow-skinny.

Context

Currently, an open CVE on pyarrow (https://nvd.nist.gov/vuln/detail/CVE-2023-47248) is flagged against kedro-mlflow, because kedro-mlflow depends on MLflow and one of MLflow's additional dependencies uses pyarrow.

Additional dependencies can be installed to leverage the full feature set of MLflow. For example:

    To use the mlflow.sklearn component of MLflow Models, install scikit-learn, numpy and pandas.
    To use SQL-based metadata storage, install sqlalchemy, alembic, and sqlparse.
    To use serving-based features, install flask and pandas.
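
As a rough illustration of the list above, the following sketch checks which of those optional packages can be imported in the current environment. The group names (sklearn_flavor, sql_storage, serving) and the helper missing_optional_packages are hypothetical names chosen for this example; only the package lists come from the text above.

```python
from importlib.util import find_spec

# Import names for the optional packages listed above.
# Note: scikit-learn is imported as "sklearn".
# The group labels are hypothetical, for illustration only.
OPTIONAL_GROUPS = {
    "sklearn_flavor": ["sklearn", "numpy", "pandas"],
    "sql_storage": ["sqlalchemy", "alembic", "sqlparse"],
    "serving": ["flask", "pandas"],
}


def missing_optional_packages(groups=OPTIONAL_GROUPS):
    """Return {group: [modules that cannot be imported]} for each group."""
    return {
        group: [name for name in modules if find_spec(name) is None]
        for group, modules in groups.items()
    }


if __name__ == "__main__":
    for group, missing in missing_optional_packages().items():
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{group}: {status}")
```

An empty list for a group means every package that group needs is importable.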

Possible Implementation

If it's possible for kedro-mlflow to use mlflow-skinny, it might be a good idea to change the dependency from mlflow to mlflow-skinny and let users manage their dependencies with more granularity.
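
If users end up choosing between the two distributions themselves, it could help to see which one is actually installed. This is only a sketch using the standard library; the helper name installed_mlflow_distributions is hypothetical and not part of kedro-mlflow or MLflow.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_mlflow_distributions(names=("mlflow", "mlflow-skinny")):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_mlflow_distributions().items():
        print(f"{name}: {ver or 'not installed'}")
```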

Possible Alternatives

Do nothing and leave things the way they are, but kedro-mlflow will become more bloated as the full mlflow package grows.

Galileo-Galilei commented 8 months ago

This is a duplicate of https://github.com/Galileo-Galilei/kedro-mlflow/issues/344. It is quite old, so I will reassess it to see if we can make it work!

rxm7706 commented 7 months ago

Sorry, I didn't see that issue earlier. I see your response explaining the effort it would take to make this change. Thank you & regards.

Galileo-Galilei commented 6 months ago

Some good & bad news after testing:

Galileo-Galilei commented 5 months ago

Decision: I think the only way to make it work is to publish a separate kedro-mlflow-skinny package on PyPI, whose only difference from the standard kedro-mlflow would be that the mlflow dependency is replaced by mlflow-skinny.
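
One possible way to derive the second package's dependency list from the first is sketched below. This is an assumption about how the swap could be automated, not the maintainer's build process. The helper to_skinny_requirements is a hypothetical name, and the version specifiers in the usage example are made up.

```python
import re

# Matches a requirement whose project name is exactly "mlflow"
# (not "mlflow-skinny" or "mlflowx"), followed by an optional specifier.
_MLFLOW_REQ = re.compile(r"^\s*mlflow(?![A-Za-z0-9._-])(.*)$", re.IGNORECASE)


def to_skinny_requirements(requirements):
    """Return a copy of `requirements` with `mlflow` replaced by `mlflow-skinny`."""
    swapped = []
    for req in requirements:
        match = _MLFLOW_REQ.match(req)
        # Keep the original extras/version specifier, swap only the name.
        swapped.append("mlflow-skinny" + match.group(1) if match else req)
    return swapped


if __name__ == "__main__":
    # Hypothetical requirement strings, for illustration only.
    print(to_skinny_requirements(["kedro>=0.18", "mlflow>=2.0,<3", "pydantic"]))
```

Every other requirement is left untouched, which matches the idea that the two packages differ only in this one dependency.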

TODO:

PRs are welcome!