Metadata service implementation for Metaflow.
This provides a thin wrapper around a database and keeps track of metadata associated with Metaflow entities such as Flows, Runs, Steps, Tasks, and Artifacts.
For more information, see Metaflow's admin docs.
The service requires the following environment variables to be set:
Optionally, you can also override the host and port the service runs on:
Create triggers to broadcast any database changes via pg_notify on channel NOTIFY:
DB_TRIGGER_CREATE
[metadata_service defaults to 0]
[ui_backend_service defaults to 1]

pip3 install ./
python3 -m services.metadata_service.server
Swagger UI: http://localhost:8080/api/doc
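As a rough illustration of the DB_TRIGGER_CREATE defaults listed above, the following Python sketch resolves whether triggers would be created for a given service. The helper name `should_create_triggers` and the choice to treat the value "1" as enabled are assumptions made for illustration; the services themselves may parse the variable differently.

```python
import os

# Defaults as listed above: metadata_service -> 0, ui_backend_service -> 1.
DEFAULT_DB_TRIGGER_CREATE = {
    "metadata_service": "0",
    "ui_backend_service": "1",
}


def should_create_triggers(service_name, env=None):
    """Return True if DB_TRIGGER_CREATE resolves to "1" for this service.

    Hypothetical helper: an explicit DB_TRIGGER_CREATE value wins,
    otherwise the per-service default applies.
    """
    env = os.environ if env is None else env
    default = DEFAULT_DB_TRIGGER_CREATE.get(service_name, "0")
    return env.get("DB_TRIGGER_CREATE", default) == "1"
```

Calling `should_create_triggers("ui_backend_service", {})` uses the per-service default, while passing `{"DB_TRIGGER_CREATE": "1"}` overrides it for either service.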
The easiest way to run this project is to use docker-compose, and there are two options:
docker-compose.yml (see the docker build section on how to pre-build the Docker images)
docker-compose.development.yml (uses the ./services folder inside the container)
Running docker-compose.yml:
docker-compose up -d
Running docker-compose.development.yml (recommended during development):
docker-compose -f docker-compose.development.yml up
The services are then available on ports :8080 (the metadata service), :8082, and :8083.
To access the container, run:
docker exec -it metadata_service /bin/bash
Within the container, curl the service directly:
curl localhost:8080/ping
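The same check can be scripted. The sketch below is a minimal Python equivalent of the curl call, using only the standard library; it assumes that a healthy service answers /ping with HTTP 200, and the function names are illustrative rather than part of the project.

```python
import urllib.error
import urllib.request


def ping_url(base_url):
    """Build the /ping URL from a base URL such as http://localhost:8080."""
    return base_url.rstrip("/") + "/ping"


def is_service_up(base_url="http://localhost:8080", timeout=2.0):
    """Return True if GET <base_url>/ping answers with HTTP 200 (assumed)."""
    try:
        with urllib.request.urlopen(ping_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

For example, `is_service_up()` checks the default address used in this document, and `is_service_up("http://localhost:8080", timeout=0.5)` fails faster when the container is not running.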
The latest release of the image is available on Docker Hub:
docker pull netflixoss/metaflow_metadata_service
Be sure to set the proper environment variables when running the image:
docker run -e MF_METADATA_DB_HOST='<instance_name>.us-east-1.rds.amazonaws.com' \
  -e MF_METADATA_DB_PORT=5432 \
  -e MF_METADATA_DB_USER='postgres' \
  -e MF_METADATA_DB_PSWD='postgres' \
  -e MF_METADATA_DB_NAME='metaflow' \
  -it -p 8082:8082 -p 8080:8080 metaflow_metadata_service
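Before starting the container it can help to verify that the database variables are actually set. The following Python sketch checks for the five variable names used in the docker run command above and builds the matching `-e` arguments; the helper names are hypothetical, and treating an empty value as missing is an assumption.

```python
import os

# Variable names taken from the docker run example above.
DB_VARS = (
    "MF_METADATA_DB_HOST",
    "MF_METADATA_DB_PORT",
    "MF_METADATA_DB_USER",
    "MF_METADATA_DB_PSWD",
    "MF_METADATA_DB_NAME",
)


def missing_db_vars(env=None):
    """Return the database variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in DB_VARS if not env.get(name)]


def docker_env_flags(env=None):
    """Build `-e NAME=value` arguments for every database variable that is set."""
    env = os.environ if env is None else env
    flags = []
    for name in DB_VARS:
        if env.get(name):
            flags += ["-e", f"{name}={env[name]}"]
    return flags
```

A wrapper script could refuse to start the container when `missing_db_vars()` returns a non-empty list, and otherwise pass `docker_env_flags()` to `docker run` as an argument list.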
Tests are run using Tox and pytest.
Run the following command to execute the tests in a Dockerized environment:
docker-compose -f docker-compose.test.yml up -V --abort-on-container-exit
The above command makes sure a PostgreSQL database is available.
Usage without Docker:
The test suite requires a PostgreSQL database, along with the following environment variables for connecting the tested services to the DB.
# Run all tests
tox
# Run unit tests only
tox -e unit
# Run integration tests only
tox -e integration
# Run both unit & integration tests in parallel
tox -e unit,integration -p
With the metadata service up and running at http://localhost:8080, you can use it as the service when executing Flows locally with the Metaflow client:
METAFLOW_SERVICE_URL=http://localhost:8080 METAFLOW_DEFAULT_METADATA="service" python3 basicflow.py run
Alternatively, you can configure a default profile with the service URL for the Metaflow client to use. See Configuring metaflow for instructions.
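When flows are launched from a script rather than a shell, the same two variables can be passed through the child process environment. This is a minimal Python sketch of that idea; `service_env` and `run_flow` are illustrative names, and basicflow.py stands for whatever flow file you run.

```python
import os
import subprocess
import sys


def service_env(service_url="http://localhost:8080", base_env=None):
    """Return a copy of the environment with the two Metaflow settings added."""
    env = dict(os.environ if base_env is None else base_env)
    env["METAFLOW_SERVICE_URL"] = service_url
    env["METAFLOW_DEFAULT_METADATA"] = "service"
    return env


def run_flow(flow_file, service_url="http://localhost:8080"):
    """Run `<python> <flow_file> run` against the given metadata service.

    Returns the exit code of the child process.
    """
    completed = subprocess.run(
        [sys.executable, flow_file, "run"],
        env=service_env(service_url),
        check=False,
    )
    return completed.returncode
```

Calling `run_flow("basicflow.py")` mirrors the shell command above. The caller's own environment is left untouched because `service_env` works on a copy.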
The migration service is a tool to help users manage underlying DB migrations and launch the most recent compatible version of the metadata service.
Note that it is possible to run the two services independently, and a Dockerfile is supplied for each service. However, the default Dockerfile combines the two services.
Also note that at runtime the migration service and the metadata service are completely disjoint and do not communicate with each other.
Note: you may need to do a rolling restart to get the latest version of the image if you don't have it already.
You can manage the migration either via the provided API or with the utility CLI provided in migration_tools.py.
Check the status of the DB schema:
API: /db_schema_status
CLI: python3 migration_tools.py db-status
If there are migrations to be run, is_up_to_date should be false and a list of migrations to be applied will be shown under unapplied_migrations.
Run the upgrade:
API: /upgrade
CLI: python3 migration_tools.py upgrade
Check the status again:
API: /db_schema_status
CLI: python3 migration_tools.py db-status
is_up_to_date should be set to True and migration_in_progress should be set to False.
The CLI also provides:
python3 migration_tools.py metadata-service-version
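The status checks above can be automated. The sketch below assumes that /db_schema_status returns a JSON object carrying the fields named in this section (is_up_to_date, migration_in_progress, unapplied_migrations); the exact response shape is an assumption, the function names are illustrative, and the migration service base URL is supplied by the caller.

```python
import json
import urllib.request


def fetch_status(migration_base_url, timeout=5.0):
    """GET <migration_base_url>/db_schema_status and decode the JSON body."""
    url = migration_base_url.rstrip("/") + "/db_schema_status"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def next_action(status):
    """Suggest the next step from a status payload (assumed to be a dict).

    "wait"    -> a migration is currently in progress
    "none"    -> the schema is up to date
    "upgrade" -> there are migrations still to be applied
    """
    if status.get("migration_in_progress"):
        return "wait"
    if status.get("is_up_to_date"):
        return "none"
    return "upgrade"


def pending_migrations(status):
    """Return the list under unapplied_migrations, or an empty list."""
    return list(status.get("unapplied_migrations") or [])
```

A deployment script might call `next_action(fetch_status(base_url))` before and after the upgrade, and only proceed to the rolling restart once the result is "none".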
Within the published metaflow_metadata_service image, the migration service is packaged along with the latest version of the metadata service compatible with every version of the DB. This means that multiple versions of the metadata service come bundled with the image, each installed under a different virtualenv.
When the container spins up, the migration service is launched first and determines what virtualenv to activate depending on the schema version of the DB. This will determine which version of the metadata service will run.
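The selection step described above can be pictured as a lookup from schema version to virtualenv. The Python sketch below is purely illustrative: the version strings, directory paths, and function name are invented for the example and do not reflect the actual layout of the image.

```python
# Hypothetical mapping: DB schema version -> virtualenv directory.
# Both the version strings and the paths are invented for illustration.
VENV_BY_SCHEMA_VERSION = {
    "1": "/opt/venvs/metadata_service_1",
    "2": "/opt/venvs/metadata_service_2",
}


def select_virtualenv(schema_version, mapping=None):
    """Return the virtualenv registered for the given schema version.

    Raises LookupError when no bundled service matches the schema.
    """
    mapping = VENV_BY_SCHEMA_VERSION if mapping is None else mapping
    try:
        return mapping[schema_version]
    except KeyError:
        raise LookupError(
            f"no metadata service bundled for schema version {schema_version!r}"
        ) from None
```

Keeping one entry per schema version reflects the idea that each DB version has exactly one bundled service version that is launched for it.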
See the release docs.
There are several ways to get in touch with us: