NVIDIA-Merlin / Merlin

NVIDIA Merlin is an open source library providing end-to-end GPU-accelerated recommender systems, from feature engineering and preprocessing to training deep learning models and running inference in production.
Apache License 2.0

Use MLflow Experiments with Merlin containers[QST] #1073

Closed hkristof03 closed 6 months ago

hkristof03 commented 9 months ago

❓ Questions & Help

Hi everyone,

I am using the pre-built TensorFlow container to train ranking models. I am trying to use MLflow to log hyperparameters, results, and artifacts, but I am not able to install it in the container. I saw it mentioned, for example, here, but I haven't found a usage example.

Details

Dockerfile:

FROM nvcr.io/nvidia/merlin/merlin-tensorflow:23.09

WORKDIR /workspace

RUN pip install mlflow

CMD ["jupyter-lab", "--allow-root", "--ip='0.0.0.0'", "--NotebookApp.token=''", "--no-browser"]

Error:

#0 9.122     Found existing installation: blinker 1.4
#0 9.123 ERROR: Cannot uninstall 'blinker'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.

I tried to install older MLflow versions, but all attempts resulted in the error above. Is there a way to install MLflow in these pre-built containers and use it with Merlin? If it is already supported in some way, could you link the code, documentation, or a notebook?

hkristof03 commented 8 months ago

@rnyak could you comment on this issue, please, or could you recommend a solution?

EvenOldridge commented 7 months ago

Hi @hkristof03, unfortunately we don't have the bandwidth right now to support dependency interactions. Is pip trying to uninstall blinker because the installed version conflicts with what MLflow requires? Are you able to update it manually to a newer version that is compatible with MLflow?
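One way to answer the version question is to inspect what is installed inside the container. The following is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, and it assumes the distutils-installed blinker exposes metadata that `importlib.metadata` can read, which may not hold for every install.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if no metadata is found."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Report the packages discussed in this thread.
for name in ("blinker", "mlflow"):
    print(name, installed_version(name))
```

Running this before and after the install attempt shows whether blinker is still at 1.4, the version named in the error.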

nv-alaiacano commented 7 months ago

Try doing a (slightly unsafe) forced upgrade of blinker without first uninstalling it:

FROM nvcr.io/nvidia/merlin/merlin-tensorflow:23.09

WORKDIR /workspace

RUN pip install --ignore-installed blinker
RUN pip install mlflow

CMD ["jupyter-lab", "--allow-root", "--ip='0.0.0.0'", "--NotebookApp.token=''", "--no-browser"]

I was able to build this container and launch JupyterLab.

Inside JupyterLab I opened a terminal and started the MLflow server:

$ mlflow server

Then I created an experiment and a run, and logged a parameter successfully:

import mlflow

mlflow.set_tracking_uri("http://localhost:5000")

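# set_experiment creates the experiment if it does not exist and makes it
# the active one, so the run below is recorded under it.
mlflow.set_experiment("my experiment")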

with mlflow.start_run() as run:
    mlflow.log_param("log_paramsey", "value")

rnyak commented 6 months ago

@hkristof03 I am closing this issue. Please see the solution idea from @nv-alaiacano above. If you need to reopen this ticket, please do so.

pklemenkov commented 6 months ago

This solution doesn't work as soon as you start using TensorFlow, because installation of that kind breaks pandas.

  File "/usr/local/lib/python3.10/dist-packages/merlin/models/tf/__init__.py", line 20, in <module>
    from merlin.dataloader.tf_utils import configure_tensorflow
  File "/usr/local/lib/python3.10/dist-packages/merlin/dataloader/tf_utils.py", line 24, in <module>
    from merlin.core.dispatch import HAS_GPU
  File "/usr/local/lib/python3.10/dist-packages/merlin/core/dispatch.py", line 21, in <module>
    import dask.dataframe as dd
  File "/usr/local/lib/python3.10/dist-packages/dask/dataframe/__init__.py", line 4, in <module>
    from dask.dataframe import backends, dispatch, rolling
  File "/usr/local/lib/python3.10/dist-packages/dask/dataframe/backends.py", line 22, in <module>
    from dask.dataframe.core import DataFrame, Index, Scalar, Series, _Frame
  File "/usr/local/lib/python3.10/dist-packages/dask/dataframe/core.py", line 35, in <module>
    from dask.dataframe import methods
  File "/usr/local/lib/python3.10/dist-packages/dask/dataframe/methods.py", line 22, in <module>
    from dask.dataframe.utils import is_dataframe_like, is_index_like, is_series_like
  File "/usr/local/lib/python3.10/dist-packages/dask/dataframe/utils.py", line 19, in <module>
    from dask.dataframe import (  # noqa: F401 register pandas extension types
  File "/usr/local/lib/python3.10/dist-packages/dask/dataframe/_dtypes.py", line 4, in <module>
    from dask.dataframe.extensions import make_array_nonempty, make_scalar
  File "/usr/local/lib/python3.10/dist-packages/dask/dataframe/extensions.py", line 6, in <module>
    from dask.dataframe.accessor import (
  File "/usr/local/lib/python3.10/dist-packages/dask/dataframe/accessor.py", line 190, in <module>
    class StringAccessor(Accessor):
  File "/usr/local/lib/python3.10/dist-packages/dask/dataframe/accessor.py", line 276, in StringAccessor
    pd.core.strings.StringMethods,
AttributeError: module 'pandas.core.strings' has no attribute 'StringMethods'
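The traceback suggests that the pandas now in the environment no longer exposes what the bundled dask expects, presumably because installing mlflow changed the pandas version (an assumption; the thread does not confirm it). A rough way to check is to save `pip freeze` output before and after the install and diff the two. The sketch below does that for `name==version` lines only; the helper names and the sample version numbers are hypothetical and not taken from the container.

```python
def parse_freeze(text):
    """Map package name -> version for 'name==version' lines of pip freeze output."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if "==" in line and not line.startswith("#"):
            name, _, ver = line.partition("==")
            pins[name.lower()] = ver
    return pins


def changed_packages(before, after):
    """Return {name: (old, new)} for packages that are new or changed version."""
    old, new = parse_freeze(before), parse_freeze(after)
    return {
        name: (old.get(name), ver)
        for name, ver in new.items()
        if old.get(name) != ver
    }


# Hypothetical freeze snippets for illustration only.
before = "pandas==1.0.0\nblinker==1.4\n"
after = "pandas==2.0.0\nblinker==1.4\nmlflow==9.9.9\n"
print(changed_packages(before, after))
```

If pandas shows up in the result, pinning it back to the version listed in the "before" snapshot is one thing to try; whether mlflow accepts that version would still need to be checked.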