Galileo-Galilei / kedro-mlflow-tutorial

A tutorial on how to use the kedro-mlflow plugin (https://github.com/Galileo-Galilei/kedro-mlflow) to synchronize training and inference and serve Kedro pipelines

mlflow models serve -m crashes #6

Closed. Chouffe closed this issue 2 years ago.

Chouffe commented 2 years ago

Following the tutorial, I am trying to serve the model with MLflow:

mlflow models serve -m "runs:/ecac2d248e3b44719f9f0b662317b2c2/kedro_mlflow_tutorial"

I run into this error:

ModuleNotFoundError: No module named 'mlflow'
[2021-09-09 13:18:14 +0200] [171877] [INFO] Worker exiting (pid: 171877)
[2021-09-09 13:18:14 +0200] [171871] [INFO] Shutting down: Master
[2021-09-09 13:18:14 +0200] [171871] [INFO] Reason: Worker failed to boot.
Traceback (most recent call last):
  File "/home/chouffe/anaconda3/envs/kedro_mlflow_tutorial/bin/mlflow", line 11, in <module>
    sys.exit(cli())
  File "/home/chouffe/anaconda3/envs/kedro_mlflow_tutorial/lib/python3.7/site-packages/click/core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "/home/chouffe/anaconda3/envs/kedro_mlflow_tutorial/lib/python3.7/site-packages/click/core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "/home/chouffe/anaconda3/envs/kedro_mlflow_tutorial/lib/python3.7/site-packages/click/core.py", line 1668, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/chouffe/anaconda3/envs/kedro_mlflow_tutorial/lib/python3.7/site-packages/click/core.py", line 1668, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/chouffe/anaconda3/envs/kedro_mlflow_tutorial/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/chouffe/anaconda3/envs/kedro_mlflow_tutorial/lib/python3.7/site-packages/click/core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "/home/chouffe/anaconda3/envs/kedro_mlflow_tutorial/lib/python3.7/site-packages/mlflow/models/cli.py", line 56, in serve
    ).serve(model_uri=model_uri, port=port, host=host)
  File "/home/chouffe/anaconda3/envs/kedro_mlflow_tutorial/lib/python3.7/site-packages/mlflow/pyfunc/backend.py", line 92, in serve
    conda_env_path, command, self._install_mlflow, command_env=command_env
  File "/home/chouffe/anaconda3/envs/kedro_mlflow_tutorial/lib/python3.7/site-packages/mlflow/pyfunc/backend.py", line 173, in _execute_in_conda_env
    "Command '{0}' returned non zero return code. Return code = {1}".format(command, rc)
Exception: Command 'source /home/chouffe/anaconda3/bin/../etc/profile.d/conda.sh && conda activate mlflow-3ba24628c72d459b1b6beb8ed68ea4d497b882ff 1>&2 && gunicorn --timeout=60 -b 127.0.0.1:5000 -w 1 ${GUNICORN_CMD_ARGS} -- mlflow.pyfunc.scoring_server.wsgi:app' returned non zero return code. Return code = 3

I am not sure what is wrong with my setup as I clearly have mlflow installed in my conda environment.

Galileo-Galilei commented 2 years ago

Can you tell me if this is solved once you have tried what is described in #8?

Chouffe commented 2 years ago

I investigated the issue a little bit more, and here are my findings. The command I run to serve the mlflow model is the following:

mlflow models serve -m "runs:/79e6825b454e43dbbb8e9cc5fc8fdcf7/kedro_mlflow_tutorial"

The run_id is obtained from the mlflow UI. The command starts by throwing errors such as the one I mentioned in this ticket. I was able to fix it by doing the following:

# MLflow seems to create a conda environment under the hood to serve the model
# The first step is to activate the conda environment used by MLflow
source /home/chouffe/anaconda3/bin/../etc/profile.d/conda.sh && conda activate mlflow-3ba24628c72d459b1b6beb8ed68ea4d497b882ff

# Then, one needs to reinstall all dependencies in this conda environment
pip install -e src/.

Now all the dependencies are installed properly in the MLflow-generated conda environment, and serving the model should work.

What is wrong with the current kedro project setup? How can one tell MLflow to install the requirements.txt dependencies when serving the model?

Edit: I checked the mlflow UI again, and this is the conda.yml file I found stored as an artifact:

pip:
- kedro_mlflow_tutorial==0.1
python: 3.7

Galileo-Galilei commented 2 years ago

First of all, I think we need to clarify what an "mlflow model" is (either the native ones or a custom one, as we are using here). An mlflow model is a folder with the following structure:

You can see a picture of this folder in the tutorial.
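
For reference, the typical layout of such a folder for a custom pyfunc model like this one is roughly the following (artifact contents depend on your pipeline):

kedro_mlflow_tutorial/
├── MLmodel              # model metadata: flavor and loader entry point
├── conda.yaml           # the environment specification
├── python_model.pkl     # the pickled inference pipeline object
└── artifacts/           # one entry per artifact declared at save time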

With this context in mind, here is what is going on when you call the mlflow models serve command (a minimal sketch of the equivalent Python calls follows the list):

  1. MLflow creates the environment where your code will run, and installs all the necessary packages (i.e. the ones specified in conda.yaml) inside it.
  2. MLflow activates the environment and loads the python_model.pkl file, i.e. it loads in memory the instance of your inference pipeline object.
  3. It calls the load_context method of this object, which in our case puts all the artifacts as MemoryDataSets inside Kedro's DataCatalog.
  4. It calls the predict method of this object (in our case, this runs the Kedro pipeline).
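
For reference, here is a minimal Python sketch of what steps 2 to 4 amount to (the run id is the one from earlier in this thread; the input file path is hypothetical):

import mlflow.pyfunc
import pandas as pd

# Steps 2-3: fetch the model folder, unpickle python_model.pkl,
# and let it load its artifacts into Kedro's DataCatalog
model = mlflow.pyfunc.load_model(
    "runs:/79e6825b454e43dbbb8e9cc5fc8fdcf7/kedro_mlflow_tutorial"
)

# Step 4: predict runs the Kedro inference pipeline on the input data
data = pd.read_csv("path/to/your_input.csv")  # hypothetical input file
predictions = model.predict(data)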

During step 4 (i.e. while running the pipeline), it likely imports some dependencies, either external to your project (if your node has import pandas, you obviously need to have pandas installed) or internal to your project (if you have a from my_awesome_project.pipelines.nodes import my_awesome_function statement, you need to have your own Kedro project installed as a Python package). This is very intuitive: if you were not using mlflow and sent a "my_pipeline.pkl" object to a coworker, you would need to give him/her both your code with your functions AND the requirements of your project. There is no reason to expect that mlflow will automagically work without this information.
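
To make the internal-import case concrete, a node module like the following can only be unpickled and run at serving time if the project package itself is installed in the serving environment (module path and function are hypothetical):

# Hypothetical file: src/kedro_mlflow_tutorial/pipelines/inference/nodes.py
import pandas as pd  # external dependency: must be resolvable from conda.yaml

def predict_labels(model, data: pd.DataFrame) -> pd.DataFrame:
    # Unpickling the inference pipeline re-imports this module, so the
    # kedro_mlflow_tutorial package must be importable in the serving env
    return pd.DataFrame({"label": model.predict(data)})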

kedro-mlflow tries to automate the creation of all the needed elements when it creates a custom model with your inference pipeline.

The only thing it cannot resolve easily is the set of dependencies needed for your project. Performing a pip freeze of your current environment is highly discouraged, because some packages rely on external tools and need to be installed with conda (e.g. tesseract), and some packages are OS-dependent (e.g. pywin32). You need to specify this environment manually. Furthermore, a pip freeze will not help you distribute the code of the nodes of your project (if <my-kedro-project> is not available on PyPI, mlflow will not be able to install it).
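
For illustration, here is a hedged sketch of such a manual specification, using mlflow's conda_env dictionary format (exactly where you pass it depends on your kedro-mlflow version; at the time of this thread pipeline_ml_factory exposed a conda_env argument, but check the signature of your version):

# An explicit environment specification in mlflow's conda_env format.
# List here everything your inference pipeline imports at runtime.
conda_env = {
    "name": "kedro_mlflow_tutorial",
    "channels": ["defaults"],
    "dependencies": [
        "python=3.7",
        "pip",
        {
            "pip": [
                "mlflow",  # the serving env itself needs mlflow importable
                "kedro_mlflow_tutorial==0.1",
            ]
        },
    ],
}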

Making your project a Python package helps to solve both problems at the same time: when you pip install src/ of your Kedro project, you install your project as a package (which makes its functions importable) and you install its dependencies declared in setup.py. This is why I recommend this solution.
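
As a reference point, here is a minimal src/setup.py close to what the default Kedro template generates (names are those of this tutorial; adapt them to your project):

from setuptools import find_packages, setup

# Read the pinned dependencies from requirements.txt so that
# pip install src/. installs them together with the package
with open("requirements.txt") as f:
    requires = [
        line.strip() for line in f if line.strip() and not line.startswith("#")
    ]

setup(
    name="kedro_mlflow_tutorial",
    version="0.1",
    packages=find_packages(exclude=["tests"]),
    install_requires=requires,
)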

When deploying the project, you have 2 solutions: