Datatamer / tamr-client

Programmatically interact with Tamr
https://tamr-client.readthedocs.io
Apache License 2.0

Tutorial: Continuous Mastering #449

Closed pcattori closed 4 years ago

pcattori commented 4 years ago

🙋 feature request

We need a tutorial that shows how to keep a Mastering project up to date as new data and labels arrive.

🔦 Context

Mastering projects are extremely common, but the workflow is complex. We need a guide to show users how to manage their existing mastering projects programmatically.

Tasks

skalish commented 4 years ago

Here is some starter code that such a tutorial would probably build on:

from getpass import getpass

import tamr_client as tc

username = input("Tamr Username:")
password = getpass("Tamr Password:")
auth = tc.UsernamePasswordAuth(username, password)

session = tc.session.from_auth(auth)
instance = tc.Instance(host="localhost", port=9100)

project_id = "1"  # replace with your project ID
# Assumption: the lookup needs the session and instance created above.
project = tc.project.from_resource_id(session, instance, project_id)

def check(op: tc.Operation) -> tc.Operation:
    """Return the operation unchanged if it succeeded; otherwise raise."""
    if not tc.operation.succeeded(op):
        raise RuntimeError("Operation failed.")
    return op

# Run each stage of the mastering pipeline in order; check() stops at the first failure.
check(tc.mastering.update_unified_dataset(session, project))

check(tc.mastering.generate_pairs(session, project))

check(tc.mastering.apply_feedback(session, project))

check(tc.mastering.update_pair_results(session, project))

check(tc.mastering.update_high_impact_pairs(session, project))

check(tc.mastering.update_cluster_results(session, project))

check(tc.mastering.publish_clusters(session, project))
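
One limitation of the `check` helper above is that its error does not say which step failed. A minimal sketch of one way to address that follows: it runs named steps in order and reports the failing step by name. `FakeOperation`, `succeeded`, and `run_steps` are hypothetical names used only for illustration so the example runs without a Tamr instance; they are not part of tamr-client.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class FakeOperation:
    """Hypothetical stand-in for tc.Operation (illustration only)."""

    state: str  # "SUCCEEDED" or "FAILED" in this sketch


def succeeded(op: FakeOperation) -> bool:
    """Hypothetical stand-in for tc.operation.succeeded."""
    return op.state == "SUCCEEDED"


# A step is a readable name plus a zero-argument callable that runs it.
Step = Tuple[str, Callable[[], FakeOperation]]


def run_steps(steps: Sequence[Step]) -> List[str]:
    """Run steps in order; raise on the first failure, naming the step."""
    completed: List[str] = []
    for name, run in steps:
        op = run()
        if not succeeded(op):
            raise RuntimeError(
                f"Step '{name}' failed; completed steps: {completed}"
            )
        completed.append(name)
    return completed


def ok() -> FakeOperation:
    """Simulate a step that succeeds."""
    return FakeOperation(state="SUCCEEDED")


# Step names mirror the calls in the snippet above; each callable is a stub.
demo_steps: List[Step] = [
    ("update_unified_dataset", ok),
    ("generate_pairs", ok),
    ("apply_feedback", ok),
    ("update_pair_results", ok),
    ("update_high_impact_pairs", ok),
    ("update_cluster_results", ok),
    ("publish_clusters", ok),
]

print(run_steps(demo_steps))
```

With the real client, each stub would presumably be replaced by a callable such as `lambda: tc.mastering.generate_pairs(session, project)`, and `succeeded` by `tc.operation.succeeded`. Deferring each call behind a callable keeps later steps from starting once an earlier one has failed.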

pcattori commented 4 years ago

Existing docs: https://docs.tamr.com/tamr-tutorials/docs/overview-mastering