This package provides tools to export and import MLflow objects (runs, experiments, or registered models) from one MLflow tracking server (Databricks workspace) to another. See the Databricks MLflow Object Relationships slide deck.
| Source tracking server | Destination tracking server | Note |
|---|---|---|
| Open source | Open source | common |
| Open source | Databricks | less common |
| Databricks | Databricks | common |
| Databricks | Open source | rare |
`notebook-formats`
- If exporting a Databricks run, the run's notebook revision can be saved in the specified formats (comma-delimited argument). Each format is saved in the `notebooks` folder under the run's artifact root directory as `notebook.{format}`. Supported formats are SOURCE, HTML, JUPYTER and DBC. See the Databricks Export Format documentation.
`use-src-user-id`
- Set the destination user ID to the source user ID. The source user ID is ignored when importing into Databricks since the user is automatically picked up from your Databricks access token.
`export-metadata-tags`
- Creates metadata tags (starting with `mlflow_export_import.metadata`) that contain export information. These include the source `mlflow` tags in addition to other information. This is useful for provenance and auditing purposes in regulated industries.
| Name | Value |
|---|---|
| mlflow_export_import.metadata.timestamp | 1551037752 |
| mlflow_export_import.metadata.timestamp_nice | 2019-02-24 19:49:12 |
| mlflow_export_import.metadata.experiment_id | 2 |
| mlflow_export_import.metadata.experiment-name | sklearn_wine |
| mlflow_export_import.metadata.run-id | 50fa90e751eb4b3f9ba9cef0efe8ea30 |
| mlflow_export_import.metadata.tracking_uri | http://localhost:5000 |
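The `timestamp_nice` value appears to be the epoch-second `timestamp` rendered as a UTC datetime. The following is an illustrative stdlib-only sketch (the tag values mirror the table above) of decoding that timestamp and selecting just the provenance tags from a run's tag dictionary:

```python
import datetime

# Example tags as they might appear on an imported run (values are
# illustrative, taken from the table above).
tags = {
    "mlflow_export_import.metadata.timestamp": "1551037752",
    "mlflow_export_import.metadata.experiment-name": "sklearn_wine",
    "mlflow.user": "someone",
}

# Render the epoch-second timestamp as a UTC datetime.
epoch = int(tags["mlflow_export_import.metadata.timestamp"])
nice = datetime.datetime.fromtimestamp(
    epoch, tz=datetime.timezone.utc
).strftime("%Y-%m-%d %H:%M:%S")
print(nice)  # 2019-02-24 19:49:12

# Select only the export/import provenance tags.
metadata = {
    k: v for k, v in tags.items()
    if k.startswith("mlflow_export_import.metadata")
}
print(sorted(metadata))
```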
Supports Python 3.7.6 or above.

First create a virtual environment.
```
python -m venv mlflow-export-import
source mlflow-export-import/bin/activate
```
There are two different ways to install the package.

Install directly from GitHub:
```
pip install git+https://github.com/amesar/mlflow-export-import/#egg=mlflow-export-import
```
Or clone the repository and install from source:
```
git clone https://github.com/amesar/mlflow-export-import
cd mlflow-export-import
pip install -e .
```
In a Databricks notebook there are also two different ways to install the package.

Install it as a notebook-scoped library with `%pip`:
```
%pip install git+https://github.com/amesar/mlflow-export-import/#egg=mlflow-export-import
```
Or build the wheel artifact, upload it to DBFS, and then install it on your cluster:
```
python setup.py bdist_wheel
databricks fs cp dist/mlflow_export_import-1.0.0-py3-none-any.whl {MY_DBFS_PATH}
```
To run the tools externally (from your laptop) against a Databricks tracking server (workspace), set the following environment variables.
```
export MLFLOW_TRACKING_URI=databricks
export DATABRICKS_HOST=https://mycompany.cloud.databricks.com
export DATABRICKS_TOKEN=MY_TOKEN
```
For full details see Access the MLflow tracking server from outside Databricks.
The main tool scripts can be executed either as standard Python scripts or as console scripts.

Console scripts (such as `export-run`, `import-run`, etc.) are provided as a convenience. For a list of scripts see setup.py.

This allows you to use:
```
export-experiment --help
```
instead of:
```
python -u -m mlflow_export_import.experiment.export_experiment --help
```
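These console scripts are generated by setuptools from entry-point declarations in setup.py. The sketch below shows how one such declaration maps the shell command to the module that `python -m` runs; the target function name `main` is an assumption for illustration:

```python
# A console script is a thin wrapper that setuptools generates from an
# entry-point declaration like this one (the ":main" function name is
# an assumption for illustration).
spec = "export-experiment = mlflow_export_import.experiment.export_experiment:main"

# Split "name = module:function" into its parts.
name, target = (s.strip() for s in spec.split(" = "))
module, func = target.split(":")

print(name)    # the shell command
print(module)  # the module that `python -m` would run
```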