DAGWorks-Inc / hamilton

Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows, that encode lineage/tracing and metadata. Runs and scales everywhere python does.
https://hamilton.dagworks.io/en/latest/
BSD 3-Clause Clear License

enable free-form attributes on nodes #1129

Open vograno opened 2 months ago

vograno commented 2 months ago

In some scenarios, for purposes unrelated to DAG execution, the user wants to attach free-form information to nodes. This feature request provides some desirable usage examples.

Currently, the tags mechanism could be used to attach attributes to a node, but such attributes are immutable, and their values must be converted from objects to strings and back, which is sub-optimal.
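To make the string-conversion cost concrete, here is a minimal sketch of the round trip that the tags workaround implies. It assumes JSON as the encoding (Hamilton does not prescribe one) and does not call Hamilton at all. encode_attrs and decode_attrs are hypothetical helpers.

```python
import json
from typing import Any, Dict


def encode_attrs(attrs: Dict[str, Any]) -> Dict[str, str]:
    """Hypothetical helper: turn each attribute value into a JSON string tag value."""
    return {key: json.dumps(val) for key, val in attrs.items()}


def decode_attrs(tags: Dict[str, str]) -> Dict[str, Any]:
    """Hypothetical helper: parse JSON tag values back into Python objects."""
    return {key: json.loads(val) for key, val in tags.items()}


def fun(x: dict, i: int) -> dict:
    return x


original = {"a": {}, "b": {"abc": []}}
restored = decode_attrs(encode_attrs(original))

print(restored == original)            # True: equal values survive
print(restored["b"] is original["b"])  # False: the objects are new copies

# A value JSON cannot represent, such as a function, is rejected outright.
try:
    encode_attrs({"func": fun})
except TypeError as err:
    print("cannot encode:", err)
```

Object identity is lost, and anything without a string encoding cannot be attached at all. Free-form attributes would avoid both problems by storing values as-is.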

Free-form attributes are much like tags, but without constraints on their values. Unlike tags, attributes should never be interpreted by Hamilton; it should only accumulate them along node expansion paths and eventually store them on a HamiltonNode.
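One way to read "accumulate them along node expansion paths" is as a plain dictionary merge over the attribute layers a node picks up, in a fixed order. The sketch below assumes exactly that rule. accumulate_attrs is a hypothetical name, and the last-writer-wins precedence is an assumption that the issue does not specify.

```python
from typing import Any, Dict, Iterable


def accumulate_attrs(layers: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge attribute dicts in order; a later layer overrides an earlier key.
    Values are carried along untouched, never inspected or converted."""
    merged: Dict[str, Any] = {}
    for layer in layers:
        merged.update(layer)
    return merged


shared = {"schema": {"abc": []}, "name": "default"}
per_node = {"name": "o1"}
attrs = accumulate_attrs([shared, per_node])

print(attrs)                                # {'schema': {'abc': []}, 'name': 'o1'}
print(attrs["schema"] is shared["schema"])  # True: same object, no copy
```

Because the merge never looks inside the values, this respects the "never interpret" rule. The only decision the merge makes is which layer wins on a key collision.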

Example 1: Common attributes for expanded nodes

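from hamilton.function_modifiers import parameterize, source

# attr: the decorator proposed in this issue (not yet part of Hamilton)
a_value = {}
b_value = dict(abc=[])
common_attrs = dict(a=a_value, b=b_value)

@parameterize(
    o1=dict(x=source('o1_x_input')),
    o2=dict(x=source('o2_x_input')),
)
@attr(common_attrs)
def fun(x: dict, i: int) -> dict:
    return x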

Here I expect o1.attributes and o2.attributes on the resulting HamiltonNodes to each equal {**common_attrs}.
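Because the proposed @attr decorator does not exist yet, the sketch below models Example 1 in plain Python. Here, attr is a hypothetical stand-in that only stashes the dict on the function under an invented _node_attrs name. expand is a toy replacement for @parameterize that produces one attributes dict per output name. The sketch shows the expected outcome: each expanded node gets its own dict with the same keys, and the values are the original objects rather than converted copies.

```python
from typing import Any, Callable, Dict, List


def attr(attrs: Dict[str, Any]) -> Callable[[Callable], Callable]:
    """Hypothetical stand-in for the proposed @attr: store attrs on the function."""
    def decorator(fn: Callable) -> Callable:
        fn._node_attrs = {**getattr(fn, "_node_attrs", {}), **attrs}
        return fn
    return decorator


def expand(fn: Callable, outputs: List[str]) -> Dict[str, Dict[str, Any]]:
    """Toy model of @parameterize expansion: one attributes dict per output node."""
    common = getattr(fn, "_node_attrs", {})
    return {name: {**common} for name in outputs}


a_value = {}
b_value = dict(abc=[])
common_attrs = dict(a=a_value, b=b_value)


@attr(common_attrs)
def fun(x: dict, i: int) -> dict:
    return x


node_attrs = expand(fun, ["o1", "o2"])
print(node_attrs["o1"] == common_attrs)                # True
print(node_attrs["o1"] is node_attrs["o2"])            # False: one dict per node
print(node_attrs["o1"]["b"] is node_attrs["o2"]["b"])  # True: values are shared
```

Whether shared values should be shallow-copied (as here) or deep-copied is left open by the issue. A shallow copy keeps Hamilton from touching the values at all.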

Example 2: Provide distinct attributes for each expanded node

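from hamilton.function_modifiers import parameterize, source, value

o1_attrs = dict(name='o1')
o2_attrs = dict(name='o2')
@parameterize(
    # A dict literal is needed: None cannot be used as a keyword argument name.
    o1={'x': source('o1_x_input'), None: value(o1_attrs)},
    o2={'x': source('o2_x_input'), None: value(o2_attrs)},
)
@attr(common_attrs)
def fun(x: dict, i: int) -> dict:
    return x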

Note the None key in the parameterize args. Here I expect o1.attributes == {**common_attrs, **o1_attrs}, and likewise o2.attributes == {**common_attrs, **o2_attrs}.
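The None key acts as a sentinel inside each per-output mapping. Whatever handles it must separate that entry from the real parameter bindings before merging it with the common attributes. Below is a minimal sketch that assumes the mapping shape from Example 2 and uses plain values in place of Hamilton's source/value objects. split_attrs is a hypothetical helper, and giving per-node attributes precedence over common ones is an assumption.

```python
from typing import Any, Dict, Optional, Tuple


def split_attrs(
    binding: Dict[Optional[str], Any]
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Separate the None-keyed attributes entry from ordinary parameter bindings."""
    params = {k: v for k, v in binding.items() if k is not None}
    node_attrs = binding.get(None, {})
    return params, node_attrs


common_attrs = dict(a={}, b=dict(abc=[]))
o1_attrs = dict(name="o1")

# Plain values stand in for source('o1_x_input') and value(o1_attrs).
params, extra = split_attrs({"x": "source:o1_x_input", None: o1_attrs})
attributes = {**common_attrs, **extra}

print(params)      # {'x': 'source:o1_x_input'}
print(attributes)  # {'a': {}, 'b': {'abc': []}, 'name': 'o1'}
```

An output without a None entry simply gets an empty dict of extra attributes, so it ends up with the common attributes only.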

Example 3

Use attributes to capture the function. I should be able to roll my own decorator, say capture_func_name, and use it like this:

@capture_func_name
def fun(x: dict, i: int) -> dict:
    return x

which should be equivalent to:

def fun(x: dict, i: int) -> dict:
    return x

fun = attr(dict(func=fun))(fun)
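Example 3 only needs @attr to behave as an ordinary decorator factory. Under that assumption, capture_func_name reduces to a one-line wrapper. The sketch uses a hypothetical attr that stores the dict on the function under an invented _node_attrs name. Under the actual proposal, the dict would instead end up on the HamiltonNode.

```python
from typing import Any, Callable, Dict


def attr(attrs: Dict[str, Any]) -> Callable[[Callable], Callable]:
    """Hypothetical stand-in for the proposed @attr: merge attrs onto the function."""
    def decorator(fn: Callable) -> Callable:
        fn._node_attrs = {**getattr(fn, "_node_attrs", {}), **attrs}
        return fn
    return decorator


def capture_func_name(fn: Callable) -> Callable:
    """User-defined decorator from Example 3: record the function object itself."""
    return attr(dict(func=fn))(fn)


@capture_func_name
def fun(x: dict, i: int) -> dict:
    return x


print(fun._node_attrs["func"] is fun)    # True
print(fun._node_attrs["func"].__name__)  # fun
```

Storing the function object, rather than only its name, is something tags cannot do. A consumer can still recover the name through __name__.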
skrawcz commented 2 months ago

This seems interesting. I assume you'd want to wire it through to the HamiltonNode that's returned from driver.list_available_variables()?

@parameterize(
    o1={'x': source('o1_x_input'), None: value(o1_attrs)},
    o2={'x': source('o2_x_input'), None: value(o2_attrs)},
)

Just a quick comment: I think we'd want the above expressed through a separate decorator, analogous to tag_outputs, e.g.

@attr_outputs(o1=dict(o1_attrs), o2=dict(o2_attrs))
@parameterize(
    o1=dict(x=source('o1_x_input')),
    o2=dict(x=source('o2_x_input')),
)
@attr(common_attrs)
def fun(x: dict, i: int) -> dict:
    return x
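A dedicated per-output decorator removes the None sentinel from @parameterize. The sketch below models only the resolution step. It takes the common attributes from @attr and the per-output dicts from a hypothetical @attr_outputs, then computes each output's final attributes. resolve_output_attrs is a hypothetical name, the extra output "o3" is invented to show the fallback, and letting per-output keys override common keys is an assumed precedence.

```python
from typing import Any, Dict, List


def resolve_output_attrs(
    common: Dict[str, Any],
    per_output: Dict[str, Dict[str, Any]],
    outputs: List[str],
) -> Dict[str, Dict[str, Any]]:
    """Give every output the common attributes plus its own entry, if any.
    Per-output keys override common keys (an assumed precedence)."""
    return {name: {**common, **per_output.get(name, {})} for name in outputs}


common_attrs = dict(a={}, b=dict(abc=[]))
resolved = resolve_output_attrs(
    common_attrs,
    {"o1": dict(name="o1"), "o2": dict(name="o2")},
    ["o1", "o2", "o3"],
)
print(resolved["o1"])  # {'a': {}, 'b': {'abc': []}, 'name': 'o1'}
print(resolved["o3"])  # {'a': {}, 'b': {'abc': []}}
```

An output that @attr_outputs does not mention still inherits the common attributes, which matches the Example 1 expectation.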
elijahbenizzy commented 2 months ago

To add to what @skrawcz said, I think this is the same thing as @tag/@tag_outputs, just without the validation, and stored in a separate place (.attributes). The attributes are still static (known at build time), but free-form and not meant for filtering or querying (other than enumeration).

vograno commented 2 months ago

This seems interesting. I assume you'd want to wire it through to the HamiltonNode that's returned from driver.list_available_variables()?

Yes
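If the attributes were wired through to the HamiltonNode objects returned by driver.list_available_variables(), a consumer could enumerate them like any other node metadata. HamiltonNode has no attributes field today, so the sketch below uses a hypothetical NodeInfo dataclass as a stand-in. A hard-coded list replaces the driver call.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class NodeInfo:
    """Hypothetical stand-in for HamiltonNode, extended with the proposed field."""
    name: str
    tags: Dict[str, str] = field(default_factory=dict)
    attributes: Dict[str, Any] = field(default_factory=dict)


def collect_attributes(nodes: List[NodeInfo]) -> Dict[str, Dict[str, Any]]:
    """Enumerate the attributes of every node that has any."""
    return {node.name: node.attributes for node in nodes if node.attributes}


# A hard-coded list plays the role of driver.list_available_variables().
nodes = [
    NodeInfo("o1", attributes={"name": "o1"}),
    NodeInfo("o2", attributes={"name": "o2"}),
    NodeInfo("i"),
]
print(collect_attributes(nodes))  # {'o1': {'name': 'o1'}, 'o2': {'name': 'o2'}}
```

This stays within the "enumeration only" scope: the consumer reads attributes but does not query on their contents.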