Datatamer / tamr-client

Programmatically interact with Tamr
https://tamr-client.readthedocs.io
Apache License 2.0

Attribute configuration #493

Closed (skalish closed this 5 months ago)

skalish commented 3 years ago

↪️ Pull Request

This PR adds CRUD support for attribute configurations.

Open questions:

Possible breaking change: attribute_mapping.get_all(...) currently returns a list, whereas the other get_all functions return a tuple. This PR includes a commit that makes it return a tuple for consistency. The change is not essential to this PR, and it could break existing callers that rely on list behavior.
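To make the compatibility concern concrete, here is a minimal sketch of how caller code can behave differently when a function switches from returning a list to returning a tuple. The functions below are hypothetical stand-ins with made-up string values; they are not the tamr_client implementation and do not call it.

```python
from typing import List, Tuple


def get_all_as_list() -> List[str]:
    # Hypothetical stand-in for a get_all that returns a list
    return ["mapping_a", "mapping_b"]


def get_all_as_tuple() -> Tuple[str, ...]:
    # Hypothetical stand-in for the same function after switching to a tuple
    return tuple(get_all_as_list())


def caller_compares(result) -> bool:
    # A caller that compares the result against a list literal
    return result == ["mapping_a", "mapping_b"]


def mutation_breaks(result) -> bool:
    # A caller that appends to the result in place; tuples have no append()
    try:
        result.append("mapping_c")
    except AttributeError:
        return True
    return False


print(caller_compares(get_all_as_list()))
print(caller_compares(get_all_as_tuple()))
print(mutation_breaks(get_all_as_list()))
print(mutation_breaks(get_all_as_tuple()))
```

Callers that only iterate over or index into the result would be unaffected; only code that mutates the result or compares it to a list would notice the change.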

💻 Examples

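import tamr_client as tc

# Setup: the (...) arguments are placeholders for your own instance and project
s = tc.Session(...)
instance = tc.Instance(...)
project = tc.Project(...)

# Look up the unified dataset and the attribute to configure
ud = tc.dataset.unified.from_project(s, project)
attr = tc.attribute.by_resource_id(s, ud, "my_attr")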

# Create
new_attr_config = tc.attribute.configuration.create(
    s,
    project,
    unified_attribute=attr,
    similarity_function=tc.attribute.configuration.SimilarityFunction.JACCARD,
    tokenizer=tc.attribute.configuration.Tokenizer.BIWORD,
    attribute_role=tc.attribute.configuration.AttributeRole.NONE,  # NONE is the default; passed explicitly here for illustration
)

# Read
attr_configs = tc.attribute.configuration.get_all(s, project)
attr_config = tc.attribute.configuration.by_resource_id(s, project, "<config_id>")

# Update
updated_attr_config = tc.attribute.configuration.update(
    s,
    attr_config,
    similarity_function=tc.attribute.configuration.SimilarityFunction.ABSOLUTE_DIFF,
    attribute_role=tc.attribute.configuration.AttributeRole.SUM_ATTRIBUTE,
    numeric_field_resolution=[1],
)

# Delete
tc.attribute.configuration.delete(s, new_attr_config)

✔️ PR Todo