Galileo-Galilei / kedro-mlflow-tutorial

A tutorial on how to use the kedro-mlflow plugin (https://github.com/Galileo-Galilei/kedro-mlflow) to synchronize training and inference and serve Kedro pipelines

mlflow models serve -m crashes #6

Closed. Chouffe closed this issue 2 years ago.

Chouffe commented 2 years ago

Following the tutorial, I am trying to serve the model with MLflow:

mlflow models serve -m "runs:/ecac2d248e3b44719f9f0b662317b2c2/kedro_mlflow_tutorial"

I run into this error:

ModuleNotFoundError: No module named 'mlflow'
[2021-09-09 13:18:14 +0200] [171877] [INFO] Worker exiting (pid: 171877)
[2021-09-09 13:18:14 +0200] [171871] [INFO] Shutting down: Master
[2021-09-09 13:18:14 +0200] [171871] [INFO] Reason: Worker failed to boot.
Traceback (most recent call last):
  File "/home/chouffe/anaconda3/envs/kedro_mlflow_tutorial/bin/mlflow", line 11, in <module>
    sys.exit(cli())
  File "/home/chouffe/anaconda3/envs/kedro_mlflow_tutorial/lib/python3.7/site-packages/click/core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "/home/chouffe/anaconda3/envs/kedro_mlflow_tutorial/lib/python3.7/site-packages/click/core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "/home/chouffe/anaconda3/envs/kedro_mlflow_tutorial/lib/python3.7/site-packages/click/core.py", line 1668, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/chouffe/anaconda3/envs/kedro_mlflow_tutorial/lib/python3.7/site-packages/click/core.py", line 1668, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/chouffe/anaconda3/envs/kedro_mlflow_tutorial/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/chouffe/anaconda3/envs/kedro_mlflow_tutorial/lib/python3.7/site-packages/click/core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "/home/chouffe/anaconda3/envs/kedro_mlflow_tutorial/lib/python3.7/site-packages/mlflow/models/cli.py", line 56, in serve
    ).serve(model_uri=model_uri, port=port, host=host)
  File "/home/chouffe/anaconda3/envs/kedro_mlflow_tutorial/lib/python3.7/site-packages/mlflow/pyfunc/backend.py", line 92, in serve
    conda_env_path, command, self._install_mlflow, command_env=command_env
  File "/home/chouffe/anaconda3/envs/kedro_mlflow_tutorial/lib/python3.7/site-packages/mlflow/pyfunc/backend.py", line 173, in _execute_in_conda_env
    "Command '{0}' returned non zero return code. Return code = {1}".format(command, rc)
Exception: Command 'source /home/chouffe/anaconda3/bin/../etc/profile.d/conda.sh && conda activate mlflow-3ba24628c72d459b1b6beb8ed68ea4d497b882ff 1>&2 && gunicorn --timeout=60 -b 127.0.0.1:5000 -w 1 ${GUNICORN_CMD_ARGS} -- mlflow.pyfunc.scoring_server.wsgi:app' returned non zero return code. Return code = 3

I am not sure what is wrong with my setup as I clearly have mlflow installed in my conda environment.

Galileo-Galilei commented 2 years ago

Can you tell me if this is solved once you have tried what is described in #8?

Chouffe commented 2 years ago

I investigated the issue a little bit more, and here are my findings. The command I run to serve the mlflow model is the following:

mlflow models serve -m "runs:/79e6825b454e43dbbb8e9cc5fc8fdcf7/kedro_mlflow_tutorial"

The run_id is obtained from the mlflow UI. The command starts by throwing errors such as the one I mentioned in this ticket. I was able to fix it by doing the following:

# MLflow seems to create a conda environment under the hood to serve the model
# The first step is to activate the conda environment used by MLflow
source /home/chouffe/anaconda3/bin/../etc/profile.d/conda.sh && conda activate mlflow-3ba24628c72d459b1b6beb8ed68ea4d497b882ff

# Then, one needs to reinstall all dependencies in this conda environment
pip install -e src/.

Now all the dependencies are installed properly in the MLflow-generated conda environment, and serving the model should work.

What is wrong with the current kedro project setup? How can one tell MLflow to install the requirements.txt dependencies when serving the model?

Edit: I checked the mlflow UI again, and this is the conda.yml file I found stored as an artifact:

pip:
- kedro_mlflow_tutorial==0.1
python: 3.7

Galileo-Galilei commented 2 years ago

First of all, I think we need to clarify what an "mlflow model" is (either the native ones or a custom one, as we are using here). An mlflow model is a folder with the following structure:

You can see a picture of this folder in the tutorial.
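
For reference, the typical layout of such a folder for a custom pyfunc model like this one is roughly the following (artifact contents depend on your pipeline):

kedro_mlflow_tutorial/
├── MLmodel              # model metadata: flavor and loader entry point
├── conda.yaml           # the environment specification
├── python_model.pkl     # the pickled inference pipeline object
└── artifacts/           # one entry per artifact declared at save time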

With this context in mind, here is what is going on when you call the mlflow models serve command (a minimal sketch of the equivalent Python calls follows the list):

  1. MLflow creates the environment where your code will run, and installs all the necessary packages (i.e. the ones specified in conda.yaml) inside it.
  2. MLflow activates the environment and loads the python_model.pkl file, i.e. it loads in memory the instance of your inference pipeline object.
  3. It calls the load_context method of this object, which in our case puts all the artifacts as MemoryDataSets inside Kedro's DataCatalog.
  4. It calls the predict method of this object (in our case, this runs the Kedro pipeline).
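
For reference, here is a minimal Python sketch of what steps 2 to 4 amount to (the run id is the one from earlier in this thread; the input file path is hypothetical):

import mlflow.pyfunc
import pandas as pd

# Steps 2-3: fetch the model folder, unpickle python_model.pkl,
# and let it load its artifacts into Kedro's DataCatalog
model = mlflow.pyfunc.load_model(
    "runs:/79e6825b454e43dbbb8e9cc5fc8fdcf7/kedro_mlflow_tutorial"
)

# Step 4: predict runs the Kedro inference pipeline on the input data
data = pd.read_csv("path/to/your_input.csv")  # hypothetical input file
predictions = model.predict(data)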

During step 4 (i.e. while running the pipeline), it likely imports some dependencies, either external to your project (if your node has import pandas, you obviously need to have pandas installed) or internal to your project (if you have a from my_awesome_project.pipelines.nodes import my_awesome_function statement, you need to have your own Kedro project installed as a Python package). This is very intuitive: if you were not using mlflow and sent a "my_pipeline.pkl" object to a coworker, you would need to give him/her both your code with your functions AND the requirements of your project. There is no reason to expect that mlflow will automagically work without this information.
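
To make the internal-import case concrete, a node module like the following can only be unpickled and run at serving time if the project package itself is installed in the serving environment (module path and function are hypothetical):

# Hypothetical file: src/kedro_mlflow_tutorial/pipelines/inference/nodes.py
import pandas as pd  # external dependency: must be resolvable from conda.yaml

def predict_labels(model, data: pd.DataFrame) -> pd.DataFrame:
    # Unpickling the inference pipeline re-imports this module, so the
    # kedro_mlflow_tutorial package must be importable in the serving env
    return pd.DataFrame({"label": model.predict(data)})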

kedro-mlflow tries to automate the creation of all the needed elements when it creates a custom model with your inference pipeline.

The only thing it cannot resolve easily is the set of dependencies needed for your project. Performing a pip freeze of your current environment is highly discouraged, because some packages rely on external tools and need to be installed with conda (e.g. tesseract), and some packages are OS-dependent (e.g. pywin32). You need to specify this environment manually. Furthermore, a pip freeze will not help you distribute the code of the nodes of your project (if <my-kedro-project> is not available on PyPI, mlflow will not be able to install it).
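
For illustration, here is a hedged sketch of such a manual specification, using mlflow's conda_env dictionary format (exactly where you pass it depends on your kedro-mlflow version; at the time of this thread pipeline_ml_factory exposed a conda_env argument, but check the signature of your version):

# An explicit environment specification in mlflow's conda_env format.
# List here everything your inference pipeline imports at runtime.
conda_env = {
    "name": "kedro_mlflow_tutorial",
    "channels": ["defaults"],
    "dependencies": [
        "python=3.7",
        "pip",
        {
            "pip": [
                "mlflow",  # the serving env itself needs mlflow importable
                "kedro_mlflow_tutorial==0.1",
            ]
        },
    ],
}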

Making your project a Python package helps to solve both problems at the same time: when you pip install src/ of your Kedro project, you install your project as a package (which makes its functions importable) and you install its dependencies declared in setup.py. This is why I recommend this solution.
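
As a reference point, here is a minimal src/setup.py close to what the default Kedro template generates (names are those of this tutorial; adapt them to your project):

from setuptools import find_packages, setup

# Read the pinned dependencies from requirements.txt so that
# pip install src/. installs them together with the package
with open("requirements.txt") as f:
    requires = [
        line.strip() for line in f if line.strip() and not line.startswith("#")
    ]

setup(
    name="kedro_mlflow_tutorial",
    version="0.1",
    packages=find_packages(exclude=["tests"]),
    install_requires=requires,
)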

When deploying the project, you have 2 solutions: