vograno opened this issue 2 months ago
To clarify, you wouldn't expect the following to work normally though right?
# this should not work
result = driver.execute(final_vars=['output_node'], inputs={"x": {"some": "value"}})
But I think I see where you're coming from now. The tricky part is how would you handle this:
from hamilton.function_modifiers import parameterize, value

x_value = {}
y_value = {}

@parameterize(
    output_node=dict(x=value(x_value)),
    output_node2=dict(x=value(y_value)),
)
def fun(x: dict) -> dict:
    return x
Or is it the case that all of them have the same default value?
In the interim you could do something functionally equivalent like this:
@tag(marker="default") # could identify this as such
def x() -> dict:
return {} # or reference some variable like you have above in the module
@tag(marker="parameterized")
@parameterize(output_node=dict(x=source("x")))
def fun(x: dict) -> dict:
return x
then:
dr.list_available_variables({"TAG_NAME": "TAG_VALUE"})
and pass them as overrides= via .execute()...
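For concreteness, a minimal sketch of that interim pattern, reusing the tag and node names from the example above (my_module is a hypothetical module holding those functions, and the exact tag-filter argument of list_available_variables may differ by version):

from hamilton import driver
import my_module  # hypothetical module containing the functions above

dr = driver.Builder().with_modules(my_module).build()

# Find the "default"-tagged nodes, decide on replacement values for them,
# and pass those values as overrides at execution time.
default_vars = dr.list_available_variables(tag_filter={"marker": "default"})
overrides = {var.name: {"some": "value"} for var in default_vars}

result = dr.execute(final_vars=["output_node"], overrides=overrides)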
> To clarify, you wouldn't expect the following to work normally though right?
>
> # this should not work
> result = driver.execute(final_vars=['output_node'], inputs={"x": {"some": "value"}})
This shouldn't work indeed.
> But I think I see where you're coming from now. The tricky part is how would you handle this: [...] Or is it the case that all of them have the same default value?
How do I handle it, or how would you implement this? In terms of handling, this is precisely what I need: I have two nodes, output_node and output_node2; they return distinct values, and I can access those values via the respective node objects.
Anyway, the default values are always different, i.e. x_value and y_value are distinct objects. This is in fact the whole point of it.
If it helps, one can add a node.bound_parameter_values property and store (or compute on the fly) the parameterize-bound values there, instead of on default_parameter_values.
By "compute on the fly" I mean, for example, the case where the callable already stored on the node object is a partial object, so you can pull the info from there.
In fact, if this is indeed the case and the callable is surfaced, I should be able to pull this info myself, at least in the interim.
So if you use driver.graph.nodes you're using internal APIs... but yes, this works:
from hamilton.function_modifiers import parameterize, value

x_value = {"b": 2}
y_value = {"a": 1}

@parameterize(
    output_node=dict(x=value(x_value)),
    output_node2=dict(x=value(y_value))
)
def fun(x: dict, i: int) -> dict:
    return x

if __name__ == "__main__":
    from hamilton import driver
    import __main__ as main

    dr = driver.Builder().with_modules(main).build()
    # this is using internal APIs -- but yes you have access to the bound variables
    list(dr.graph.nodes.values())[0].callable.keywords
    list(dr.graph.nodes.values())[1].callable.keywords
However, the more "blessed" path would be to go via:
vars = dr.list_available_variables()
However, it seems that there's an issue, because the partial function isn't exposed in originating_functions -- looking at the code, there are some slight inconsistencies with how expander decorators set the originating_functions field.
So a few paths to get what you need: add a func field to HamiltonNode, or figure out the right way to add the partial function to the originating_functions field -- then you'd inspect that callable. CC @elijahbenizzy.

With https://github.com/DAGWorks-Inc/hamilton/pull/1125 this would work:

vars[1].originating_functions[1].keywords

which should have the partial function and what it is bound to.
OK, so it seems like the requirement is that we want to know the right way to expose this value. I think this clarifies that how we're doing it now is not perfect IMO -- the concept of a "bound" variable is nice, and should be wired through.
We often do a ton of function-algebra (composing callables with other callables), but really in the case of parameterize/inject (and maybe even default vals), having a bound kwarg concept would actually simplify code.
I'm uncomfortable with layering the keywords onto the function and inspecting that -- it's too much internals. Unless we make everything a partial. But the partial application could be done within the node's callable with the originating function (in most cases). That said, if each node had:
(1) a function (2) a concept of bound variables
We could pretty easily expose this all with a bit of refactoring (a rough sketch of the shape is below) -- can take a look soon. My take from slack is that this is not an immediate blocker, more ergonomics, but let us know how it fits into your workflow!
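Purely for illustration (this is not Hamilton's actual Node or HamiltonNode API), that "function plus bound variables" shape might look roughly like:

from dataclasses import dataclass, field
from typing import Any, Callable, Dict

# Illustrative only -- one possible shape for "a function plus a concept of
# bound variables", not Hamilton's actual internals.
@dataclass
class NodeSketch:
    name: str
    func: Callable[..., Any]
    bound_kwargs: Dict[str, Any] = field(default_factory=dict)

    def __call__(self, **kwargs: Any) -> Any:
        # Bound values are fixed at definition time; runtime kwargs fill the rest.
        return self.func(**{**self.bound_kwargs, **kwargs})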
Thank you guys. I appreciate your receptiveness and responsiveness. Take your time.

A couple of notes just in case:

- For the function, consider leveraging the inspect module approach, the function signature in particular.
- For the bound variables, I assume both source and value bindings will be provided.
Yep! We use inspect heavily in implementing Hamilton :) Quite a useful module (and Hamilton would be impossible without it). Re: bound variables -- I think it's fundamentally different. The source bindings will actually be edges in the graph -- thus we wouldn't need to expose them (they're available now if you just ask for the dependencies...).
But I wanted to know the name of the function parameter which the source binds to. This doesn't affect DAG execution, but it does affect the function-algebra stuff you mentioned earlier.
Here is a specific example
@parameterize(
    o1=dict(x=source('fun_x_input')),
)
def fun(x: dict, i: int) -> dict:
    return x
Assume that 'fun_x_input' is an input node and it feeds into fun only. However, the value of fun_x_input is not free; rather, it depends on the exact function and the exact argument of the function it feeds into. I wanted to have the ability to examine the node, figure out the function and the argument, and compute the implied input value for fun_x_input. Then I'll execute the DAG with fun_x_input set to the computed value. In general, the implied value for fun_x_input depends on the DAG topology, including the node function names and signatures, and some user-provided input.
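In code, the workflow I have in mind is roughly the following sketch. It assumes the node exposes its originating function (the rest of the thread discusses whether/when that holds), and the implied-value computation is just a placeholder for user logic:

import inspect

all_vars = dr.list_available_variables()
o1_var, = [v for v in all_vars if v.name == "o1"]

# Assumption: the underlying function is reachable via originating_functions.
origin_fn = o1_var.originating_functions[0]
sig = inspect.signature(origin_fn)

# Placeholder for the real computation, which uses the function name,
# its parameters, and some user-provided input.
implied_value = {"function": origin_fn.__name__, "params": list(sig.parameters)}

result = dr.execute(["o1"], inputs={"fun_x_input": implied_value, "i": 0})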
Hmm, interesting... So the algebra of what happens under the hood is:
(1) fun is created with input fun_x_input -- this is wired through to the original parameter x
(2) as fun_x_input does not exist in the DAG, a node named fun_x_input gets created
So in the current framework, it's a bit hidden how exactly it wires to the underlying function. In fact, the only contract is that it will get passed in some way -- not how it will get passed, nor that you can access it.
But to get back to your point -- if you know that fun_x_input is a dependency of fun, you should be able to do what you want?
all_vars = dr.list_available_variables()
fun_var, = [var for var in all_vars if var.name == "fun"]
deps = fun_var.required_dependencies | fun_var.optional_dependencies
user_inputs = [var for var in all_vars if var.is_external_input and var.name in deps]
# this should be the information you need, in the `source` case.
Note this doesn't get the value case, but it does allow you to inspect the source. The fact that it is bound with @parameterize is more an implementation detail (as it currently stands).
Are you saying there will be a var named 'fun' even though the generated node name is 'o1'?
Ahh, yes, sorry, didn't see that it was called o1 -- good point!
It should look like this:
all_vars = dr.list_available_variables()
o1_var, = [var for var in all_vars if var.name == "o1"]
deps = o1_var.required_dependencies | o1_var.optional_dependencies
user_inputs = [var for var in all_vars if var.is_external_input and var.name in deps]
# this should be the information you need, in the `source` case.
The fact that the function is named fun is lost in translation here -- it's not part of the contract.
Curious -- does this solve your problem? Or do you need further inspection into the way it was defined...
UPDATE: let me run your example to see what info I get.
I still need to know the function name and signature and the exact argument it is wired to. One can argue that this info is encoded in the source name, but arguably this is a hack.
Compare it to wrappers in Python (https://docs.python.org/3/library/functools.html#functools.update_wrapper), which provide access to the wrapped function via the __wrapped__ attribute.
One can argue that a parameterized node is a wrapper around the function. If this information is too "internal" to make its way to HamiltonNode, it can be stored on the expanded node.
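For reference, here is the __wrapped__ behaviour I mean, in a self-contained form (the wrapper below just stands in for a parameterized node):

import functools

def fun(x: dict, i: int) -> dict:
    return x

@functools.wraps(fun)
def parameterized_fun(i: int) -> dict:
    # Stand-in for a parameterized wrapper that binds x to a fixed value.
    return fun({"bound": "value"}, i)

# functools.wraps / update_wrapper expose the original function:
assert parameterized_fun.__wrapped__ is fun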
So I'm not 100% sure that these are analogous to standard decorators. The decorators actually all set an attribute on the function that is compiled later -- they don't modify the function at all. But the Hamilton Graph does parse it.
Some details here on how it is designed (for better or worse):
By the time it gets to the Hamilton graph, it has gone through multiple transformations. There's a lot of stuff we do based on the different decorators -- you can see the general process here. This is the function (or series of functions -- there are cases in which we may have multiple) that gets called eventually. See this field. So I think that does give you access to the originating function. But it's not as straightforward a mapping as a simple decorator/modifying wrapper -- quite a few other transformations can occur. This also doesn't have the bindings (as we discussed) -- those occur in the creation of the callable (used by the internal Node), which is (currently) a black box. E.g. the decorators outline the contract (what values will be fixed to that function, what nodes will be attached), but they don't give you visibility one step further. This appears to be what you want?
Otherwise, I'd be really curious as to how you're using the data. Most designs are built around a core set of assumptions that help us decide what should be internal/publicly exposed, and it's always interesting to know how those might not be true for everyone! So if you have an explanation of how you might use this/what consumes this, that would be very helpful. And I'm curious if the code I showed you works (it should get at what you're attempting, I think...).
> It should look like this: [...] The fact that the function is named fun is lost in translation here -- it's not part of the contract. Curious -- does this solve your problem? Or do you need further inspection into the way it was defined...
I still need the function. What the code snippet gives me is the topology of the graph. The function is a property of the node. I might take different decisions depending on whether it's sin or log, even though signature-wise they are identical.
Let me suggest an alternative approach. How about allowing users to attach free-form attributes to nodes? Much like tags, but free-form. Unlike tags, Hamilton should never be tasked with interpreting attributes, just with accumulating them along node expansion paths and eventually storing them on a HamiltonNode.
Example 1: Common attributes for expanded nodes

a_value = {}
b_value = dict(abc=[])
common_attrs = dict(a=a_value, b=b_value)

@parameterize(
    o1=dict(x=source('o1_x_input')),
    o2=dict(x=source('o2_x_input')),
)
@attr(common_attrs)
def fun(x: dict, i: int) -> dict:
    return x

Here I expect the HamiltonNodes o1.attributes and o2.attributes to each be {**common_attrs}.
Example 2: Provide distinct attributes for each expanded node

o1_attrs = dict(name='o1')
o2_attrs = dict(name='o2')

@parameterize(
    o1=dict(x=source('o1_x_input'), None=value(o1_attrs)),
    o2=dict(x=source('o2_x_input'), None=value(o2_attrs)),
)
@attr(common_attrs)
def fun(x: dict, i: int) -> dict:
    return x

Note the None key in the parameterize args (illustrative pseudo-syntax -- None isn't a valid keyword argument -- meant to carry attributes that don't bind to any function parameter). Here I expect HamiltonNode o1.attributes == {**common_attrs, **o1_attrs}.
Example 3: Use attributes to capture the function

I should be able to roll my own decorator, say capture_func_name, and use it like this:

@capture_func_name
def fun(x: dict, i: int) -> dict:
    return x

which should be equivalent to:

def fun(x: dict, i: int) -> dict:
    return x

fun = attr(dict(func=fun))(fun)
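In other words, the decorator itself could be sketched as:

# Sketch only: attr is the hypothetical decorator proposed in Example 1.
def capture_func_name(fn):
    return attr(dict(func=fn))(fn)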
So yeah, don't know your use-case/what you need -- my general skepticism is that the contract should be at the level of the graph, not the function.
E.g. if you have a node for sin or log, you should be able to capture that in the node name, and maybe use configuration to toggle between them. At a high level, I'm really curious about the data you want to store in the function itself, rather than the node -- to me that indicates that there is something missing in the graph/node abstraction, but I don't have the full context to quite place it...
That said, attrs seems reasonable, and not too far from what we have, actually. Hold off, for a second, on the intended use of tags (which, yes, are for querying/processing). If you just JSON-dump your params, then you get around the types. You can then use tag_outputs to have different tags per node produced.
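A rough sketch of that pattern, reusing the node names from your examples (the attribute contents here are illustrative):

import json
from hamilton.function_modifiers import parameterize, source, tag_outputs

o1_attrs = dict(name="o1", params=dict(a=1))
o2_attrs = dict(name="o2", params=dict(a=2))

# JSON-dump the per-node attributes into string tags -- one tag dict per
# produced node -- so the tag values stay plain strings.
@tag_outputs(
    o1={"attrs": json.dumps(o1_attrs)},
    o2={"attrs": json.dumps(o2_attrs)},
)
@parameterize(
    o1=dict(x=source("o1_x_input")),
    o2=dict(x=source("o2_x_input")),
)
def fun(x: dict, i: int) -> dict:
    return x

# Reading back later: json.loads(var.tags["attrs"]) for each produced node.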
Now, the really cool thing is that you can define whatever higher-level decorator you want -- e.g. your own parameterization that also attaches the JSON-dumped attributes as tags on each node it produces (see the sketch below).
And you'd be using Hamilton decorators as building blocks. Customizing/wrapping decorators is highly encouraged, given that we can only support so much surface area but want everyone to be able to satisfy their use-case.
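Something along these lines, entirely as a sketch (the name parameterize_with_attrs and its shape are made up, built only from the parameterize and tag_outputs building blocks above):

import json
from hamilton.function_modifiers import parameterize, source, tag_outputs

# Hypothetical wrapper: parameterize as usual, and also attach free-form attrs
# to each produced node as a JSON-dumped tag.
def parameterize_with_attrs(parameterization, attrs):
    def decorator(fn):
        fn = parameterize(**parameterization)(fn)
        tags = {name: {"attrs": json.dumps(attrs.get(name, {}))} for name in parameterization}
        return tag_outputs(**tags)(fn)
    return decorator

# Usage (hypothetical):
# @parameterize_with_attrs(
#     parameterization=dict(o1=dict(x=source("o1_x_input"))),
#     attrs=dict(o1=dict(name="o1")),
# )
# def fun(x: dict, i: int) -> dict:
#     return x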
> So yeah, don't know your use-case/what you need -- my general skepticism is that the contract should be at the level of the graph, not the function.
I agree. I got carried away with the idea of parameterized nodes being kind of wrappers. There is no need for this.
Using tags to store JSON is a good idea, thanks!
To be clear, I think there might be room for more free-form tags/attrs -- that could be a valuable issue to open up!
Is your feature request related to a problem? Please describe.
When a vanilla node is created, the default values of its parameters, if any, are available via the node.default_parameter_values attribute. However, when a node is created by @parameterize, the values passed to the decorator are not available via that attribute. In fact, they are not available from the node object at all. The point I am making is that those values are essentially default values for the expanded node and should be surfaced in the same fashion. This feature would make expanded nodes equivalent to vanilla nodes in this regard.

Describe the solution you'd like
Define a node
The following should hold