DAGWorks-Inc / hamilton

Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.
https://hamilton.dagworks.io/en/latest/
BSD 3-Clause Clear License

Allow multiple @save_to decorators on the same function #1147

Closed: Riezebos closed this issue 2 months ago

Riezebos commented 2 months ago

Is your feature request related to a problem? Please describe.

I can add multiple @load_from decorators to the same function, so it would make sense to also be able to add multiple @save_to decorators, for example to save a function's output to both a data lake and a data warehouse.

Right now, if I try this, I get the following error:

InvalidDecoratorException: No saver class found for type: typing.Dict[str, typing.Any] specified by output type: typing.Dict[str, typing.Any] in node: save.clean_poster_history generated by function: clean_poster_history.

Describe the solution you'd like

Multiple @save_to decorators on the same function should save that function's output once per decorator.

Describe alternatives you've considered

For now, I create extra functions that depend on the one I want to save, like this:

import ibis
from hamilton.function_modifiers import save_to

@save_to.bigquery(table_name="poster_history")
def poster_history_bigquery(poster_history: ibis.Table) -> ibis.Table:
    return poster_history

elijahbenizzy commented 2 months ago

Hey! Good catch. This is actually supported -- you'll have to use the target_ parameter https://hamilton.dagworks.io/en/latest/reference/decorators/save_to/.

A little unclear as to why, but it's a valid way of doing it!

Riezebos commented 2 months ago

Ah ok, thanks! After adding both target_ and output_name_, the error went away!

elijahbenizzy commented 2 months ago

> Ah ok, thanks! By adding both target_ and output_name_ the error went away!

Great! Let us know if there's a way to improve docs to make it clearer :)

Riezebos commented 2 months ago

I can't think of any; I think I just overlooked it. It could help if the error message mentioned these arguments whenever this error occurs and the name of the node that causes it starts with save.
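
The suggestion above could be sketched roughly as follows. This is a hypothetical helper, not Hamilton's actual error handling: the function name add_saver_hint and the hint wording are assumptions made for illustration. It appends a pointer to target_ and output_name_ only when the failing node's name starts with "save.".

```python
# Hypothetical hint text; the wording is an assumption, not Hamilton's message.
SAVER_HINT = (
    " Hint: if you stacked several @save_to decorators on one function, "
    "try setting target_ and output_name_ on each of them."
)


def add_saver_hint(node_name: str, message: str) -> str:
    """Return the message, with a hint appended when the failing node is a saver node."""
    if node_name.startswith("save."):
        return message + SAVER_HINT
    return message


# The node name from the report starts with "save.", so the hint is appended.
hinted = add_saver_hint("save.clean_poster_history", "No saver class found.")
# Any other node name leaves the message unchanged.
plain = add_saver_hint("clean_poster_history", "No saver class found.")
```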