DAGWorks-Inc / hamilton

Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows, that encode lineage/tracing and metadata. Runs and scales everywhere python does.
https://hamilton.dagworks.io/en/latest/
BSD 3-Clause Clear License

`PandasExcelWriter` overwrites file #946

Closed zilto closed 1 month ago

zilto commented 2 months ago

A user would like to create multiple to.excel() materializers and have each one write to a separate sheet of the same Excel file. Currently, each materializer overwrites the file written by the previous one.

Expected behavior

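save_data() should write through pandas's ExcelWriter instead of passing the file path straight to .to_excel(), because ExcelWriter supports append mode (mode="a") and if_sheet_exists={'error', 'new', 'replace', 'overlay'}. Note that if_sheet_exists only applies in append mode.

We will have to make sure all kwargs stay backwards compatible, given that ExcelWriter takes fewer arguments than .to_excel().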

ref: https://pandas.pydata.org/docs/reference/api/pandas.ExcelWriter.html
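
A minimal sketch of the idea, not the actual Hamilton implementation: `excel_writer_kwargs` and `save_sheet` are hypothetical names, and the sketch assumes the openpyxl engine is installed, since pandas's append mode relies on it. It opens the workbook in append mode when the file already exists, so earlier sheets are kept.

```python
from pathlib import Path


def excel_writer_kwargs(path, if_sheet_exists="replace"):
    """Pick ExcelWriter arguments that preserve sheets already in `path`."""
    if Path(path).exists():
        # Append mode keeps existing sheets. pandas only accepts
        # if_sheet_exists in append mode, which needs the openpyxl engine.
        return {"mode": "a", "engine": "openpyxl", "if_sheet_exists": if_sheet_exists}
    # No file yet: a plain write creates the workbook.
    return {"mode": "w"}


def save_sheet(df, path, sheet_name, **to_excel_kwargs):
    """Hypothetical helper: write `df` to one sheet without dropping the others."""
    import pandas as pd  # lazy import, pandas is only needed when writing

    with pd.ExcelWriter(path, **excel_writer_kwargs(path)) as writer:
        # Remaining kwargs (index, header, startrow, ...) go to to_excel unchanged.
        df.to_excel(writer, sheet_name=sheet_name, **to_excel_kwargs)
```

Calling `save_sheet(df_a, "out.xlsx", "a")` and then `save_sheet(df_b, "out.xlsx", "b")` should leave both sheets in the file. Existing kwargs that only ExcelWriter understands would need to be routed to it rather than to `to_excel`, which this sketch does not attempt.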

lohithasiripurapu commented 1 month ago

Hi, is anyone working on this issue? If not, can you assign it to me?

skrawcz commented 1 month ago

@lohithasiripurapu sorry, but it seems like @noahridge has a start here. I just can't assign it to them until they comment on this issue.

noahridge commented 1 month ago

@skrawcz Yep, I will finish up the pull request for this issue shortly.

skrawcz commented 1 month ago

This will be pushed in the next release.