Datatamer / tamr-client

Programmatically interact with Tamr
https://tamr-client.readthedocs.io
Apache License 2.0

Add support for schema mapping unified attributes #111

Closed mdonovan-tamr closed 5 years ago

mdonovan-tamr commented 5 years ago

🙋 feature request

🤔 Expected Behavior

😯 Current Behavior

💁 Possible Solution

from tamr_unify_client.models.dataset.collection import DatasetCollection

def generate_mappings(project):
    # To create a mapping, we need the relative IDs of the source and unified attributes
    # relativeInputAttributeId - datasets/3/attributes/ap_liability_account
    # relativeUnifiedAttributeId - datasets/5/attributes/ap_liability_account

    # We need an un-aliased dataset to generate correct attribute IDs
    unified_dataset = project.client.datasets.by_relative_id(project.unified_dataset().relative_id)

    # We will be doing lots of lookup-by-name, so build a map for rapid lookup.
    # All we need from this lookup is the relative_id, so just cache that
    targets = {
        attr.name: attr.relative_id
        for attr in unified_dataset.attributes
    }
    mappings = []
    input_datasets = DatasetCollection(project.client, api_path=project.relative_id + "/inputDatasets")
    for source_dataset in input_datasets:
        for attr in source_dataset.attributes:
            # mapping_for is not defined in this snippet; it is assumed to return the name of
            # the unified attribute for a source attribute, or a falsy value to skip it
            mapping = mapping_for(attr)
            if not mapping:
                continue
            mappings.append({
                "relativeInputAttributeId": attr.relative_id,
                "relativeUnifiedAttributeId": targets[mapping]
            })
    return mappings

def configure_schema_mapping(project):
    schema_mappings = generate_mappings(project)
    # POST all mappings in one request; .successful() is expected to raise on an HTTP error
    project.client.post(project.relative_id + "/attributeMappings", json=schema_mappings).successful()
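The mapping logic in the snippet can be tried without a Tamr instance. The following is a minimal sketch, not the client's API: `Attribute`, `Dataset`, `build_mappings`, and `same_name_or_skip` are hypothetical stand-ins invented for illustration, and the same-name matching rule is only one assumed way to fill in `mapping_for`.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Attribute:
    # Hypothetical stand-in for a Tamr attribute: only the fields the logic reads.
    name: str
    relative_id: str


@dataclass
class Dataset:
    # Hypothetical stand-in for a dataset with its attributes already loaded.
    relative_id: str
    attributes: List[Attribute] = field(default_factory=list)


def build_mappings(
    unified: Dataset,
    sources: List[Dataset],
    mapping_for: Callable[[Attribute], Optional[str]],
) -> List[Dict[str, str]]:
    # Cache unified attribute name -> relative ID for fast lookup.
    targets = {attr.name: attr.relative_id for attr in unified.attributes}
    mappings = []
    for source in sources:
        for attr in source.attributes:
            target_name = mapping_for(attr)
            if not target_name:
                continue  # no mapping wanted for this source attribute
            mappings.append({
                "relativeInputAttributeId": attr.relative_id,
                "relativeUnifiedAttributeId": targets[target_name],
            })
    return mappings


def same_name_or_skip(unified: Dataset) -> Callable[[Attribute], Optional[str]]:
    # Assumed rule: map a source attribute only if the unified dataset
    # has an attribute with exactly the same name.
    names = {attr.name for attr in unified.attributes}
    return lambda attr: attr.name if attr.name in names else None


unified = Dataset("datasets/5", [
    Attribute("ap_liability_account", "datasets/5/attributes/ap_liability_account"),
])
source = Dataset("datasets/3", [
    Attribute("ap_liability_account", "datasets/3/attributes/ap_liability_account"),
    Attribute("internal_notes", "datasets/3/attributes/internal_notes"),
])
mappings = build_mappings(unified, [source], same_name_or_skip(unified))
```

Here `internal_notes` has no same-named unified attribute, so it is skipped and `mappings` holds a single entry for `ap_liability_account`.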

🔦 Context

💻 Examples

nbateshaus commented 5 years ago

I want to call attention to this line:

# We need an un-aliased dataset to generate correct attribute IDs
unified_dataset = project.client.datasets.by_relative_id(project.unified_dataset().relative_id)

This should be handled behind the scenes - i.e., I should be able to map attributes in the unified dataset, even if it is aliased.
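One way to picture "behind the scenes" handling, as a rough sketch only: `DatasetHandle` and `attribute_relative_id` are hypothetical names, and the sketch assumes an aliased handle carries both the path it was fetched through and the canonical `relative_id`. The alias path `projects/1/unifiedDataset` is illustrative, not taken from this issue.

```python
from dataclasses import dataclass


@dataclass
class DatasetHandle:
    # Hypothetical stand-in: `api_path` is the path the handle was fetched through
    # (possibly an alias); `relative_id` is the canonical dataset ID.
    api_path: str
    relative_id: str


def attribute_relative_id(dataset: DatasetHandle, attribute_name: str) -> str:
    # Build the attribute ID from the canonical relative_id, never from the alias
    # path, so callers do not need to re-fetch an un-aliased dataset first.
    return f"{dataset.relative_id}/attributes/{attribute_name}"


aliased = DatasetHandle(api_path="projects/1/unifiedDataset", relative_id="datasets/5")
attr_id = attribute_relative_id(aliased, "ap_liability_account")
```

With this approach the ID comes out in the `datasets/5/attributes/...` form that the mapping request expects, regardless of how the dataset handle was obtained.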

JuliaMalkin commented 5 years ago

Please flag this for me to document. To do this: