Netflix / metaflow-service

Metadata tracking and UI service for Metaflow!
http://www.metaflow.org
Apache License 2.0

Metaflow Service

Metadata service implementation for Metaflow.

It provides a thin wrapper around a database and keeps track of the metadata associated with Metaflow entities such as Flows, Runs, Steps, Tasks, and Artifacts.

For more information, see Metaflow's admin docs.

Getting Started

The service requires the following environment variables to be set:

MF_METADATA_DB_HOST
MF_METADATA_DB_PORT
MF_METADATA_DB_USER
MF_METADATA_DB_PSWD
MF_METADATA_DB_NAME

Optionally, you can also override the host and port the service runs on.
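A minimal sketch of how a wrapper script might assemble a PostgreSQL connection string from the five MF_METADATA_DB_* variables used in the docker run example further down. The helper name build_dsn and the libpq-style key=value output are assumptions for illustration, not the service's own code, and the optional host/port overrides are not covered.

```python
import os
from typing import Mapping, Optional

# Variable names taken from the docker run example in this README.
REQUIRED_VARS = (
    "MF_METADATA_DB_HOST",
    "MF_METADATA_DB_PORT",
    "MF_METADATA_DB_USER",
    "MF_METADATA_DB_PSWD",
    "MF_METADATA_DB_NAME",
)


def build_dsn(env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: build a libpq key=value DSN from MF_METADATA_DB_* vars."""
    env = os.environ if env is None else env
    # Fail early and list every missing variable, not just the first one.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return (
        f"host={env['MF_METADATA_DB_HOST']} "
        f"port={env['MF_METADATA_DB_PORT']} "
        f"user={env['MF_METADATA_DB_USER']} "
        f"password={env['MF_METADATA_DB_PSWD']} "
        f"dbname={env['MF_METADATA_DB_NAME']}"
    )
```

Values containing spaces or quotes would need libpq quoting; this sketch assumes simple values like those in the docker run example.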

Create triggers to broadcast any database changes via pg_notify on channel NOTIFY:
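To illustrate the mechanism, here is a hypothetical generator for trigger DDL that publishes each inserted or updated row as JSON on the NOTIFY channel using PostgreSQL's built-in pg_notify(). The function name notify_row_change, the trigger names, and the payload shape are assumptions, not the service's actual schema; table names are supplied by the caller, and deletes are not covered.

```python
import re
from typing import Iterable, List

# Only plain identifiers are accepted, to avoid injecting SQL via table names.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

# Hypothetical trigger function: sends table name, operation and row data
# as a JSON payload on the channel named NOTIFY.
NOTIFY_FUNCTION_SQL = """
CREATE OR REPLACE FUNCTION notify_row_change() RETURNS trigger AS $$
BEGIN
  PERFORM pg_notify(
    'NOTIFY',
    json_build_object(
      'table', TG_TABLE_NAME,
      'operation', TG_OP,
      'data', row_to_json(NEW)
    )::text
  );
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;
""".strip()


def trigger_statements(tables: Iterable[str]) -> List[str]:
    """Return DDL attaching notify_row_change() to each table on INSERT or UPDATE."""
    statements = [NOTIFY_FUNCTION_SQL]
    for table in tables:
        if not _IDENTIFIER.match(table):
            raise ValueError(f"unsafe table name: {table!r}")
        # Drop first so the statements can be re-run idempotently.
        statements.append(f"DROP TRIGGER IF EXISTS notify_{table} ON {table};")
        statements.append(
            f"CREATE TRIGGER notify_{table} AFTER INSERT OR UPDATE ON {table} "
            "FOR EACH ROW EXECUTE PROCEDURE notify_row_change();"
        )
    return statements
```

A listener would subscribe with LISTEN "NOTIFY"; (quoted, because unquoted identifiers are folded to lower case). Note that pg_notify payloads must be shorter than 8000 bytes in the default configuration, so very large rows would need a smaller payload.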

pip3 install ./
python3 -m services.metadata_service.server

Swagger UI: http://localhost:8080/api/doc

Using docker-compose

The easiest way to run this project is with docker-compose. There are two options:

Running docker-compose.yml:

docker-compose up -d

Running docker-compose.development.yml (recommended during development):

docker-compose -f docker-compose.development.yml up

To access the container, run:

docker exec -it metadata_service /bin/bash

Within the container, curl the service directly:

curl localhost:8080/ping
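The same health check can be scripted, for example as a readiness probe. This standard-library sketch assumes the service listens on localhost:8080 and that /ping answers with HTTP 200 when healthy; the function name service_is_up is hypothetical.

```python
import urllib.request


def service_is_up(base_url: str = "http://localhost:8080", timeout: float = 2.0) -> bool:
    """Return True if GET {base_url}/ping answers with HTTP 200, else False."""
    url = base_url.rstrip("/") + "/ping"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError (non-2xx), refused connections and timeouts
        # are all OSError subclasses.
        return False
```

Usage: call service_is_up() in a loop with a short sleep until it returns True before starting flows.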

Using published image on DockerHub

The latest release of the image is available on Docker Hub:

docker pull netflixoss/metaflow_metadata_service

Be sure to set the proper environment variables when running the image:

docker run -e MF_METADATA_DB_HOST='<instance_name>.us-east-1.rds.amazonaws.com' \
-e MF_METADATA_DB_PORT=5432 \
-e MF_METADATA_DB_USER='postgres' \
-e MF_METADATA_DB_PSWD='postgres' \
-e MF_METADATA_DB_NAME='metaflow' \
-it -p 8082:8082 -p 8080:8080 netflixoss/metaflow_metadata_service
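When scripting deployments, it can be less error-prone to build the docker run arguments from a mapping than by hand. A hypothetical sketch: it publishes ports 8080 and 8082 as in the command above, but runs detached (-d) rather than interactively (-it) so it can be launched from a script.

```python
from typing import Iterable, List, Mapping

IMAGE = "netflixoss/metaflow_metadata_service"


def docker_run_args(env: Mapping[str, str],
                    ports: Iterable[int] = (8080, 8082),
                    image: str = IMAGE) -> List[str]:
    """Build a `docker run` argument list: one -e per variable, one -p per port."""
    args = ["docker", "run", "-d"]
    for key, value in env.items():
        args += ["-e", f"{key}={value}"]
    for port in ports:
        args += ["-p", f"{port}:{port}"]
    args.append(image)
    return args
```

Passing the list to subprocess.run(docker_run_args(env), check=True) avoids shell quoting issues, for instance with special characters in MF_METADATA_DB_PSWD.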

Running tests

Tests are run using Tox and pytest.

Run the following command to execute the tests in a Dockerized environment:

docker-compose -f docker-compose.test.yml up -V --abort-on-container-exit

The above command ensures that a PostgreSQL database is available.

Usage without Docker:

The test suite requires a PostgreSQL database, along with the following environment variables for connecting the tested services to the DB.

# Run all tests
tox

# Run unit tests only
tox -e unit

# Run integration tests only
tox -e integration

# Run both unit & integration tests in parallel
tox -e unit,integration -p

Executing flows against a local Metadata service

With the metadata service up and running at http://localhost:8080, you can use it as the metadata service when executing flows locally with the Metaflow client:

METAFLOW_SERVICE_URL=http://localhost:8080 METAFLOW_DEFAULT_METADATA="service" python3 basicflow.py run
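The same two variables can be set from Python when launching flows programmatically, for example from a test harness. A sketch with hypothetical helper names; basicflow.py is the flow file from the command above.

```python
import os
import subprocess
import sys
from typing import Dict, Mapping, Optional


def metaflow_service_env(service_url: str = "http://localhost:8080",
                         base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy an environment and point the Metaflow client at a metadata service."""
    env = dict(os.environ if base is None else base)
    env["METAFLOW_SERVICE_URL"] = service_url
    env["METAFLOW_DEFAULT_METADATA"] = "service"
    return env


def run_flow(script: str = "basicflow.py") -> int:
    """Run `python <script> run` against the local service and return its exit code."""
    result = subprocess.run([sys.executable, script, "run"], env=metaflow_service_env())
    return result.returncode
```

Copying the current environment rather than starting from an empty one keeps PATH and any credentials the flow may need.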

Alternatively, you can configure a default profile with the service URL for the Metaflow client to use. See Configuring Metaflow for instructions.
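A sketch of writing such a profile. It assumes the Metaflow client reads JSON settings from ~/.metaflowconfig/config.json, or config_<profile>.json for a named profile selected with METAFLOW_PROFILE; confirm these locations against Configuring Metaflow. The helper name is hypothetical, and it merges into an existing file instead of overwriting it.

```python
import json
from pathlib import Path
from typing import Optional


def write_metaflow_profile(service_url: str = "http://localhost:8080",
                           config_dir: Optional[Path] = None,
                           profile: Optional[str] = None) -> Path:
    """Write or update a Metaflow client config that points at a metadata service."""
    # Assumed default location: ~/.metaflowconfig/config.json (or config_<profile>.json).
    config_dir = Path.home() / ".metaflowconfig" if config_dir is None else Path(config_dir)
    config_dir.mkdir(parents=True, exist_ok=True)
    name = "config.json" if profile is None else f"config_{profile}.json"
    path = config_dir / name
    # Keep any settings already present in the file.
    settings = json.loads(path.read_text()) if path.exists() else {}
    settings.update({
        "METAFLOW_SERVICE_URL": service_url,
        "METAFLOW_DEFAULT_METADATA": "service",
    })
    path.write_text(json.dumps(settings, indent=2))
    return path
```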

Migration Service

The Migration service is a tool that helps users manage the underlying DB migrations and launch the most recent compatible version of the metadata service.

Note that the two services can be run independently, and a Dockerfile is supplied for each. However, the default Dockerfile combines the two services.

Also note that at runtime, the migration service and the metadata service are completely disjoint and do not communicate with each other.

Migrating to the latest db schema

Note: you may need to do a rolling restart to get the latest version of the image if you don't already have it.

You can manage the migration either via the provided API or with the CLI utility migration_tools.py.

Under the Hood: What is going on in the Docker Container

Within the published metaflow_metadata_service image, the migration service is packaged along with the latest version of the metadata service that is compatible with each version of the DB schema. This means that multiple versions of the metadata service come bundled with the image, each installed in a different virtualenv.

When the container spins up, the migration service is launched first and determines which virtualenv to activate based on the DB's schema version. This determines which version of the metadata service will run.
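The selection step can be pictured as a lookup from schema version to virtualenv, followed by launching the server module from that environment. This is a hypothetical sketch: the version keys and /opt/venvs paths are invented for illustration; only the module path services.metadata_service.server comes from this README.

```python
from typing import Dict, List

# Hypothetical mapping: DB schema version -> virtualenv containing the newest
# metadata service release compatible with that schema.
VIRTUALENVS: Dict[str, str] = {
    "1": "/opt/venvs/metadata_v1",
    "2": "/opt/venvs/metadata_v2",
    "3": "/opt/venvs/metadata_v3",
}


def select_virtualenv(schema_version: str, venvs: Dict[str, str] = VIRTUALENVS) -> str:
    """Return the virtualenv to activate for the DB's current schema version."""
    try:
        return venvs[schema_version]
    except KeyError:
        raise RuntimeError(f"no bundled metadata service for schema {schema_version!r}") from None


def launch_command(schema_version: str) -> List[str]:
    """Command that starts the metadata service from the selected virtualenv."""
    venv = select_virtualenv(schema_version)
    return [f"{venv}/bin/python", "-m", "services.metadata_service.server"]
```

Failing loudly on an unknown schema version is a deliberate choice here: starting an incompatible service version against the DB would be worse than not starting at all.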

Release

See the release docs.

Get in Touch

There are several ways to get in touch with us: