DAGWorks-Inc / hamilton

Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage and metadata. Runs and scales everywhere Python does.
https://hamilton.dagworks.io/en/latest/
BSD 3-Clause Clear License
1.55k stars, 91 forks

Support daft dataframes #657

Open skrawcz opened 5 months ago

skrawcz commented 5 months ago

Is your feature request related to a problem? Please describe. Hamilton doesn't have syntactic-sugar support for Daft dataframes. We should add some.

Describe the solution you'd like Daft is closest to PySpark, so we should take a dataframe-to-dataframe centric view that uses with_columns to apply simple UDFs.
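
To make the idea concrete, here is a rough sketch of what a dataframe-to-dataframe `with_columns` step could look like. It is not Hamilton's or Daft's actual API: `TinyFrame` and `apply_with_columns` are hypothetical names, and `TinyFrame` is a plain-Python stand-in for a dataframe so the example runs without extra dependencies. The one thing borrowed from Hamilton is the convention that a function's name is the column it produces and its parameter names are the columns it reads.

```python
import inspect
from typing import Callable, Dict


class TinyFrame:
    """Hypothetical stand-in for a dataframe: column name -> list of values."""

    def __init__(self, columns: Dict[str, list]):
        self.columns = dict(columns)

    def with_column(self, name: str, values: list) -> "TinyFrame":
        # Return a new frame instead of mutating (dataframe in, dataframe out).
        return TinyFrame({**self.columns, name: values})


def apply_with_columns(df: TinyFrame, *fns: Callable) -> TinyFrame:
    """Hypothetical helper: append one column per row-level function.

    A function is applied once every column named in its signature exists,
    so the functions can be passed in any order.
    """
    pending = list(fns)
    while pending:
        progressed = False
        for fn in list(pending):
            deps = list(inspect.signature(fn).parameters)
            if all(dep in df.columns for dep in deps):
                rows = zip(*(df.columns[dep] for dep in deps))
                df = df.with_column(fn.__name__, [fn(*row) for row in rows])
                pending.remove(fn)
                progressed = True
        if not progressed:
            missing = [fn.__name__ for fn in pending]
            raise ValueError(f"unresolvable dependencies for: {missing}")
    return df


# Simple UDFs: function name = output column, parameters = input columns.
def total(price: float, quantity: int) -> float:
    return price * quantity


def is_large(total: float) -> bool:
    return total > 20


base = TinyFrame({"price": [2.5, 10.0], "quantity": [4, 3]})
# is_large depends on total, but is listed first to show order does not matter.
result = apply_with_columns(base, is_large, total)
print(result.columns)
```

A real integration would presumably swap `TinyFrame` for a Daft dataframe and build column expressions rather than Python lists, but the shape (dataframe in, functions applied in dependency order, dataframe out) would stay the same.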

Describe alternatives you've considered N/A

Additional context This is a nice-to-have feature.

skrawcz commented 5 months ago

Example notebook of what Daft does. We should be able to manage some of this with Hamilton: https://colab.research.google.com/github/Eventual-Inc/Daft/blob/main/tutorials/mnist.ipynb#scrollTo=fc63a3ad-0e0a-4ab3-9cc0-cbec8bdd0632