Open Riezebos opened 5 days ago
@Riezebos thanks for the issue. This sounds similar to another conversation @elijahbenizzy and @vograno were having about exposing bound
values...
Question on your intended user experience. To confirm, it seems you'd be happy getting this via the node object you have above?
Yes, for me that would be great!
If I try to think of a potentially better UX, disregarding how the driver and tags are currently implemented, it might look something like:
node = dr.get_node("save.unique_stargazers")  # or a dictionary, but a way to get a node by name without iterating over them
if node.data_saver and node.data_saver.name == "csv":
    print(node.data_saver.kwargs)
But adding it to the tags that are already implemented would be a great solution in my opinion :)
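Until something like that exists, a name-based lookup can be approximated by building a dictionary once from whatever list of nodes is available. This is only a rough sketch: `NodeInfo` is a hypothetical stand-in for the real node object, and the tag keys shown are illustrative assumptions rather than a confirmed list of what the driver writes.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class NodeInfo:
    """Hypothetical stand-in for a node object that exposes a name and tags."""

    name: str
    tags: Dict[str, Any] = field(default_factory=dict)


def index_by_name(nodes: List[NodeInfo]) -> Dict[str, NodeInfo]:
    """Build a name -> node mapping once, so later lookups need no loop."""
    return {node.name: node for node in nodes}


# Illustrative data; the tag keys are assumptions made for this example.
nodes = [
    NodeInfo("unique_stargazers"),
    NodeInfo(
        "save.unique_stargazers",
        tags={"hamilton.data_saver": True, "hamilton.data_saver.sink": "csv"},
    ),
]

by_name = index_by_name(nodes)
node = by_name["save.unique_stargazers"]
if node.tags.get("hamilton.data_saver.sink") == "csv":
    print(node.tags)
```

The dictionary is built in a single pass, so every later lookup is a plain key access instead of another loop over all nodes.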
OK, adding in -- I think that this makes sense. Having a non-iteration access is good -- mind adding another issue on that?
For this, I think it makes sense to add them as "attributes" and mix them in with this concept: #1129.
Then we can attach the kwargs (as you did). These will be the non-resolved kwargs (e.g. with source
still in them). We can probably also attach the same information at runtime via metadata, e.g. just add a materializer_metadata field
in the materialized metadata for everything, which returns all the kwargs we have.
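To make "non-resolved kwargs" concrete, here is a minimal sketch of how kwargs could be turned into attributes without resolving upstream references. Everything in it is an assumption for illustration: `SourceRef` is a hypothetical placeholder for a kwarg bound to another node, and `describe_kwargs` and the `materializer_kwargs` key are made-up names, not existing Hamilton APIs.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class SourceRef:
    """Hypothetical placeholder for a kwarg bound to an upstream node."""

    node_name: str


def describe_kwargs(kwargs: Dict[str, Any]) -> Dict[str, str]:
    """Render kwargs as strings, leaving upstream references unresolved."""
    described = {}
    for key, value in kwargs.items():
        if isinstance(value, SourceRef):
            # Keep the reference symbolic; its value is only known at runtime.
            described[key] = f"source({value.node_name!r})"
        else:
            described[key] = repr(value)
    return described


# Illustrative kwargs: one bound to an upstream node, one literal.
saver_kwargs = {"path": SourceRef("output_path"), "sep": ","}
attributes = {"materializer_kwargs": describe_kwargs(saver_kwargs)}
print(attributes)
```

Rendering the values as strings keeps the attributes inspectable before execution. A runtime metadata field could carry the resolved values separately.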
Regarding the non-iteration access, I created another issue: #1138
Is your feature request related to a problem? Please describe.
When I have a built dataflow I would like to be able to see which paths are entered in @load_from and @save_to.
Describe the solution you'd like
After executing the dataflow I can see the paths in the results, but I'd like to be able to see them without executing the dataflow.
Some metadata is already being written to tags: https://github.com/DAGWorks-Inc/hamilton/blob/main/hamilton/function_modifiers/adapters.py#L578
I tested adding the following line there:
Then I tried running examples/parallelism/star_counting/run.py with the dr.execute statement replaced by:
This gives the output I was hoping for:
Describe alternatives you've considered
Maybe a custom DataLoader and DataSaver that store the arguments they were initialized with?
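A rough sketch of what that alternative might look like, assuming the saver is written as a plain dataclass so that its fields double as a record of the arguments it was created with. `RecordingCSVSaver`, `init_kwargs`, and the returned metadata keys are hypothetical names for this example. The class does not subclass Hamilton's actual DataSaver interface.

```python
import csv
import dataclasses
import os
import tempfile
from typing import Any, Dict, List


@dataclasses.dataclass
class RecordingCSVSaver:
    """Hypothetical saver whose dataclass fields record its init arguments."""

    path: str
    delimiter: str = ","

    def init_kwargs(self) -> Dict[str, Any]:
        """Return the arguments this saver was created with."""
        return dataclasses.asdict(self)

    def save_data(self, rows: List[List[Any]]) -> Dict[str, Any]:
        """Write rows to the configured path and return simple metadata."""
        with open(self.path, "w", newline="") as f:
            csv.writer(f, delimiter=self.delimiter).writerows(rows)
        return {"path": self.path, "rows_written": len(rows)}


saver = RecordingCSVSaver(path=os.path.join(tempfile.gettempdir(), "stars.csv"))
# The arguments can be inspected before anything is written.
print(saver.init_kwargs())
```

The downside of this approach is that it only covers savers you write yourself, whereas putting the kwargs in tags would work for the built-in ones too.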