Can you tell me whether it is solved once you have tried what is described in #8?
I investigated the issue a little bit more and here are my findings. The command I run to serve the mlflow model is the following:

```bash
mlflow models serve -m "runs:/79e6825b454e43dbbb8e9cc5fc8fdcf7/kedro_mlflow_tutorial"
```

The `run_id` is obtained from the mlflow ui.
It starts by throwing errors such as the one I mentioned in the ticket. I was able to fix it by doing the following:

```bash
# MLflow seems to create a conda environment under the hood to serve the model.
# The first step is to activate the conda environment used by MLflow:
source /home/chouffe/anaconda3/bin/../etc/profile.d/conda.sh && conda activate mlflow-3ba24628c72d459b1b6beb8ed68ea4d497b882ff

# Then, one needs to reinstall all the dependencies in this conda environment:
pip install -e src/.
```
Now all the dependencies are installed properly in the MLflow-generated conda environment and serving the model should work.
What is wrong with the current kedro project setup? How can one tell MLflow to install the requirements.txt dependencies when serving the model?
Edit: I checked the mlflow UI again and this is the `conda.yaml` file I found stored as an artifact:

```yaml
pip:
- kedro_mlflow_tutorial==0.1
python: 3.7
```
First of all, I think we need to clarify what a "mlflow model" is (either a native one or a custom one, as we are using here). A mlflow model is a folder with the following structure:

- an `artifacts` folder: in the case of the `KedroPipelineModel` of kedro-mlflow, all the inputs of your pipeline (except the "instances") are stored here, especially to perform pre/post processing
- a `python_model.pkl` file which contains a serialized instance of a `mlflow.pyfunc.PythonModel` subclass, here an instance of `KedroPipelineModel`. In our case, this contains your inference pipeline + the catalog + a kedro runner (i.e. everything you need to run your pipeline).
- a `conda.yaml` file which contains the environment your code should be executed in (as you noticed)
- an `MLmodel` file, which is a technical mlflow file which specifies how to load the `python_model.pkl` object (what mlflow calls the "flavor" of this object)

You can see a picture of this folder in the tutorial.
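To make the mapping concrete, here is a minimal, hypothetical sketch of how such a folder is produced with plain mlflow (kedro-mlflow builds the equivalent of this call for you; `IdentityModel` and the `encoder.pkl` artifact are made up for illustration):

```python
import pickle
from pathlib import Path

import mlflow.pyfunc


class IdentityModel(mlflow.pyfunc.PythonModel):
    """Stands in for kedro-mlflow's KedroPipelineModel."""

    def predict(self, context, model_input):
        return model_input


# A dummy file standing in for a real pre/post-processing input.
Path("encoder.pkl").write_bytes(pickle.dumps({"vocab": []}))

mlflow.pyfunc.save_model(
    path="my_model",                       # the model folder itself
    python_model=IdentityModel(),          # -> python_model.pkl; its "flavor" is recorded in MLmodel
    artifacts={"encoder": "encoder.pkl"},  # -> copied under the artifacts/ folder
    # conda_env=...                        # -> conda.yaml (mlflow writes a default one if omitted)
)
```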
With this context in mind, here is what is going on when you call the `mlflow models serve` command:

1. it creates (or reuses) the conda environment described in the `conda.yaml` file
2. it unpickles the `python_model.pkl` file, i.e. it loads in memory the instance of your inference pipeline object
3. it calls the `load_context` method of this object, which in our case puts all the artifacts as `MemoryDataSet`s inside Kedro's `DataCatalog`
4. it calls the `predict` method of this object (in our case, it runs the Kedro pipeline)

During step 4 (i.e. while running the pipeline) it likely imports some dependencies, either external to your project (if your node has an `import pandas` statement, you obviously need to have pandas installed) or internal to your project (if you have a `from my_awesome_project.pipelines.nodes import my_awesome_function` import, you will need to have your own Kedro project installed as a python package). This is very intuitive: if you were not using mlflow and sent a "my_pipeline.pkl" object to a coworker, you would need to give him/her both your code with your functions AND the requirements of your project. There is no reason to expect that mlflow will automagically be able to work without this information.
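For reference, steps 3 and 4 correspond to the two hooks of the `mlflow.pyfunc.PythonModel` contract. The sketch below illustrates that contract; it is not kedro-mlflow's actual `KedroPipelineModel` code:

```python
import pickle
from pathlib import Path

import mlflow.pyfunc


class PipelineModelSketch(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        # Step 3: mlflow has already downloaded the artifacts locally;
        # context.artifacts maps artifact names to local file paths.
        # KedroPipelineModel loads them into its DataCatalog as MemoryDataSets.
        self.artifacts = {
            name: pickle.loads(Path(path).read_bytes())
            for name, path in context.artifacts.items()
        }

    def predict(self, context, model_input):
        # Step 4: run the actual inference; for KedroPipelineModel this
        # means running the Kedro inference pipeline with its runner.
        return model_input  # placeholder: a real model transforms the input
```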
kedro-mlflow tries to automate the creation of all the needed elements when it creates a custom model with your inference pipeline: the artifacts, the `python_model.pkl`, the `MLmodel` file and the `conda.yaml` described above. The only thing it cannot resolve easily is the set of dependencies needed for your project. Performing a `pip freeze` of your current environment is highly discouraged because some packages rely on external tools and need to be installed with conda (e.g. `tesseract`), and some packages are os-dependent (e.g. `pywin32`). You need to specify this environment manually. Furthermore, a frozen environment will not help you to distribute the code of the nodes of your project: if `<my-kedro-project>` is not available on PyPI, mlflow will not be able to install it.
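As a hypothetical example of such a manual specification, the environment can be passed to mlflow as a plain dict (via the `conda_env` argument of `mlflow.pyfunc.save_model`, as in the earlier sketch). This mirrors the `conda.yaml` shown above; the commented `tesseract` line illustrates a conda-level dependency that a `pip freeze` would miss:

```python
# A hand-curated environment, instead of a `pip freeze` dump.
conda_env = {
    "name": "inference_env",
    "channels": ["defaults"],
    "dependencies": [
        "python=3.7",
        # conda-level dependencies (external tools) would go here, e.g.:
        # "tesseract",
        {"pip": ["kedro_mlflow_tutorial==0.1"]},  # your project as a package
    ],
}
```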
Making your project a python package helps to solve both problems at the same time: when you `pip install src/.` your kedro project, you install your project as a package (which makes its functions importable) and you install its dependencies declared in `setup.py`. This is why I recommend this solution.
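For reference, a minimal `src/setup.py` along these lines (the Kedro project template generates something similar; the package name matches this tutorial, the rest is a sketch):

```python
from pathlib import Path

from setuptools import find_packages, setup

# Declare the same dependencies you use in development by re-using
# requirements.txt, so that `pip install src/.` also installs them.
requirements = [
    line.strip()
    for line in Path("requirements.txt").read_text().splitlines()
    if line.strip() and not line.startswith("#")
]

setup(
    name="kedro_mlflow_tutorial",
    version="0.1",
    packages=find_packages(exclude=["tests"]),
    install_requires=requirements,
)
```

This is also the answer to the `requirements.txt` question above: the requirements end up in the serving environment because `conda.yaml` pip-installs your package, and your package declares them.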
When deploying the project, you have 2 solutions:

1. Install your project yourself in the environment where you serve the model, and run `mlflow models serve --no-conda -m "..."`; the `--no-conda` flag tells mlflow to ignore the `conda.yaml` stored inside the model and to use the currently active environment instead.
2. Let mlflow create the conda environment (i.e. without the `--no-conda` flag, as you are currently doing). When running `pip install`, mlflow will look for your package on PyPI (or whatever your repository manager is; it may be different in an enterprise setup). Obviously, if you have not deployed your kedro project to PyPI, it will fail and raise an error. This second option is recommended in an enterprise setup where you have an internal PyPI and a CI which deploys your project before you try to serve it.
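To illustrate option 1 without the CLI: loading the model in your current, already provisioned environment is the programmatic equivalent of serving with `--no-conda`. The input dataframe below is hypothetical; its shape depends on your pipeline:

```python
import mlflow.pyfunc
import pandas as pd

# Loads the model into the *current* interpreter: your project and its
# dependencies must already be pip-installed here (option 1).
model = mlflow.pyfunc.load_model(
    "runs:/79e6825b454e43dbbb8e9cc5fc8fdcf7/kedro_mlflow_tutorial"
)

data = pd.DataFrame({"text": ["an instance to score"]})  # hypothetical input
print(model.predict(data))
```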
Following the tutorial and trying to serve the model with MLflow, I run into this error:

I am not sure what is wrong with my setup, as I clearly have mlflow installed in my conda environment.