Datatamer / tamr-client

Programmatically interact with Tamr
https://tamr-client.readthedocs.io
Apache License 2.0

TC: Attribute mappings #479

Closed skalish closed 3 years ago

skalish commented 3 years ago

💬 RFC

Attribute mappings are descriptions of how attributes of source datasets are connected to attributes of a project’s unified dataset. A single mapping links one source attribute to one unified attribute. Currently, there are three versioned API endpoints that need to be added to tamr_client:

An Attribute Mapping, as returned in the body of the response to these calls, is defined as a JSON dict:

{
    "id": "unify://unified-data/v1/projects/1/attributeMappings/421-396",
    "relativeId": "projects/1/attributeMappings/421-396",
    "inputAttributeId": "unify://unified-data/v1/datasets/29/attributes/city",
    "relativeInputAttributeId": "datasets/29/attributes/city",
    "inputDatasetName": "Quarterly_Spend.csv",
    "inputAttributeName": "city",
    "unifiedAttributeId": "unify://unified-data/v1/datasets/30/attributes/city",
    "relativeUnifiedAttributeId": "datasets/30/attributes/city",
    "unifiedDatasetName": "sm_project - Unified Dataset",
    "unifiedAttributeName": "city"
}

Because the id fields and attribute names are largely redundant, this can be summarized more concisely as:

{
    "relativeId": "projects/1/attributeMappings/421-396",
    "relativeInputAttributeId": "datasets/29/attributes/city",
    "relativeUnifiedAttributeId": "datasets/30/attributes/city"
}
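To make the redundancy concrete, here is a minimal sketch that reduces a full response body to the concise form and checks that the dropped fields can be recovered from the kept ones. The helper names (`summarize`, `attribute_name`) and the constant `FULL_ID_PREFIX` are hypothetical, and the prefix is inferred from the single example above rather than from any documented guarantee.

```python
# Prefix observed in the example response above; assumed, not documented.
FULL_ID_PREFIX = "unify://unified-data/v1/"

# The three fields kept in the concise form.
CONCISE_KEYS = (
    "relativeId",
    "relativeInputAttributeId",
    "relativeUnifiedAttributeId",
)


def summarize(mapping: dict) -> dict:
    """Keep only the relative IDs of an attribute mapping (hypothetical helper)."""
    return {key: mapping[key] for key in CONCISE_KEYS}


def attribute_name(relative_attribute_id: str) -> str:
    """Recover an attribute name from the last path segment of its relative ID."""
    return relative_attribute_id.rsplit("/", 1)[-1]


full_mapping = {
    "id": "unify://unified-data/v1/projects/1/attributeMappings/421-396",
    "relativeId": "projects/1/attributeMappings/421-396",
    "inputAttributeId": "unify://unified-data/v1/datasets/29/attributes/city",
    "relativeInputAttributeId": "datasets/29/attributes/city",
    "inputDatasetName": "Quarterly_Spend.csv",
    "inputAttributeName": "city",
    "unifiedAttributeId": "unify://unified-data/v1/datasets/30/attributes/city",
    "relativeUnifiedAttributeId": "datasets/30/attributes/city",
    "unifiedDatasetName": "sm_project - Unified Dataset",
    "unifiedAttributeName": "city",
}

concise = summarize(full_mapping)
```

In this example the full IDs are the relative IDs with the prefix prepended, and the attribute names are the last path segment of the relative attribute IDs. The dataset names are the one thing the relative IDs do not encode, so recovering them would need a separate lookup.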

💁 Possible Solution

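from dataclasses import dataclass
from typing import List

# URL, Session, Project, and Attribute are assumed to be the existing tamr_client types.


@dataclass(frozen=True)
class AttributeMapping:
    """A Tamr Attribute Mapping.

    See https://docs.tamr.com/new/reference/retrieve-projects-mappings

    Args:
        url: URL of the attribute mapping itself
        input_attribute_url: URL of the mapped input (source) attribute
        unified_attribute_url: URL of the mapped unified attribute
    """

    url: URL
    input_attribute_url: URL
    unified_attribute_url: URL


def create(
    session: Session,
    project: Project,
    *,
    input_attribute: Attribute,
    unified_attribute: Attribute,
) -> AttributeMapping:
    """Map `input_attribute` to `unified_attribute` in `project`."""
    ...


def get_all(session: Session, project: Project) -> List[AttributeMapping]:
    """Get all attribute mappings of `project`."""
    ...


def delete(session: Session, attribute_mapping: AttributeMapping) -> None:
    """Delete `attribute_mapping`."""
    ...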
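As an illustration of how the proposed dataclass lines up with the concise JSON form, the sketch below builds an `AttributeMapping` from a response dict. It is not the tamr_client implementation: the `URL` class here is a simplified stand-in, `mapping_from_json` is a hypothetical helper, and the origin `https://tamr.example.com` is a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URL:
    """Simplified stand-in for a URL type: a server origin plus a relative path."""

    origin: str
    path: str


@dataclass(frozen=True)
class AttributeMapping:
    """Same three fields as the proposal above, using the stand-in URL type."""

    url: URL
    input_attribute_url: URL
    unified_attribute_url: URL


def mapping_from_json(origin: str, data: dict) -> AttributeMapping:
    """Build an AttributeMapping from a response dict (hypothetical helper).

    Only the three relative IDs are read; the other fields are redundant.
    """
    return AttributeMapping(
        url=URL(origin, data["relativeId"]),
        input_attribute_url=URL(origin, data["relativeInputAttributeId"]),
        unified_attribute_url=URL(origin, data["relativeUnifiedAttributeId"]),
    )


example = mapping_from_json(
    "https://tamr.example.com",  # placeholder origin
    {
        "relativeId": "projects/1/attributeMappings/421-396",
        "relativeInputAttributeId": "datasets/29/attributes/city",
        "relativeUnifiedAttributeId": "datasets/30/attributes/city",
    },
)
```

Because both dataclasses are frozen, two mappings built from the same response compare equal, which is what makes a membership check such as `new_attr_mapping in all_attr_mappings` meaningful.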

💻 Examples

import tamr_client as tc

my_session = Session(...)  # an authenticated Tamr session
my_project = Project(...)  # my Tamr project
my_input_attr = Attribute(...)  # an existing attribute of an input dataset of `my_project`
my_unified_attr = Attribute(...)  # an existing attribute of the unified dataset of `my_project`

# Create a new attribute mapping
new_attr_mapping = tc.schema_mapping.attribute_mapping.create(
    my_session,
    my_project,
    input_attribute=my_input_attr,
    unified_attribute=my_unified_attr,
)

# Get all attribute mappings of my project
all_attr_mappings = tc.schema_mapping.attribute_mapping.get_all(my_session, my_project)  # this is a list
print(new_attr_mapping in all_attr_mappings)  # my new mapping should be in there

# Delete my attribute mapping
tc.schema_mapping.attribute_mapping.delete(my_session, new_attr_mapping)

# Extension: I want to completely unmap `my_input_attr`
for mapping in tc.schema_mapping.attribute_mapping.get_all(my_session, my_project):
    if mapping.input_attribute_url == my_input_attr.url:
        tc.schema_mapping.attribute_mapping.delete(my_session, mapping)

skalish commented 3 years ago

The actual modeling of AttributeMapping raises a couple of questions. Its own URL specifies the project it belongs to; beyond that, its useful properties are the input and unified attributes being mapped. The least expensive way to make these attributes part of the AttributeMapping object is to store them as URLs. Another option is to actually construct them as Attribute objects, but this would require making two API calls per mapping, which could be unreasonable for the get_all function.
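The trade-off can be sketched with a fake fetcher that counts calls. Nothing below touches a real Tamr instance: `fake_fetch` is a hypothetical stand-in for an attribute lookup, and the mappings are plain tuples of relative attribute IDs. With URLs stored on the mapping, filtering costs no calls at all, and attributes are resolved (with caching) only for the mappings that are actually needed.

```python
from functools import lru_cache

calls = []  # records every simulated API call


def fake_fetch(attribute_url: str) -> dict:
    """Hypothetical stand-in for one API call that fetches an attribute."""
    calls.append(attribute_url)
    return {"name": attribute_url.rsplit("/", 1)[-1]}


# Cache lookups so a repeated attribute URL is fetched only once.
cached_fetch = lru_cache(maxsize=None)(fake_fetch)

# (input_attribute_url, unified_attribute_url) pairs, as URL-only mappings.
mappings = [
    ("datasets/29/attributes/city", "datasets/30/attributes/city"),
    ("datasets/29/attributes/state", "datasets/30/attributes/state"),
    ("datasets/31/attributes/town", "datasets/30/attributes/city"),
]

# Filtering by URL needs no API calls.
city_mappings = [m for m in mappings if m[1] == "datasets/30/attributes/city"]
calls_after_filter = len(calls)

# Resolve attributes only for the mappings of interest.
resolved = [(cached_fetch(i), cached_fetch(u)) for i, u in city_mappings]
calls_after_resolve = len(calls)
```

Eagerly constructing both attributes for every mapping would cost two calls per mapping, six in this example. Resolving on demand touches only the URLs of the selected mappings, and the shared unified attribute is fetched once.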

skalish commented 3 years ago

Turns out that the POST /v1/projects/{project}/attributeMappings API call will create the target unified attribute if it doesn't already exist, so "bootstrapping" is built in.

github-actions[bot] commented 3 years ago

:tada: This issue has been resolved in version 1.3.0 :tada:

The release is available on:

Your semantic-release bot :package::rocket: