Eventual-Inc / Daft

Distributed DataFrame for Python designed for the cloud, powered by Rust
https://getdaft.io
Apache License 2.0

Dynamically adding rows? #943

Closed by shyamsn97 11 months ago

shyamsn97 commented 1 year ago

Is your feature request related to a problem? Please describe.

Hey! First off, I just want to say this is an awesome project! It's super useful for my ML workflows. However, for some use cases I want to ingest and add new data (rows) while also using Daft to query and apply functions on the combination of the old and new data. I'm not sure if I missed it in the docs, but is that usage available in Daft right now? Thanks!

jaychia commented 1 year ago

Thanks @shyamsn97! Keep letting us know how we can make Daft more useful for you :)

We don't have this functionality right now, but I think we will want to add an API to do a "row-wise join", i.e. appending the rows of one DataFrame to another:

df1.concat(df2)

Under the hood we actually already have a lot of this functionality built out on our underlying Table data structure, but we need to expose it at the DataFrame level.

Would this help your use-case here? Something like:

import daft

df = daft.read_csv(...)  # existing data
additional_data = daft.from_pydict({"foo": [1, 2, 3]})  # new rows to ingest
df = df.concat(additional_data)  # proposed API: append the new rows to df

The tricky bit here is that when concatenating, we'll need to make sure that the schemas of both DataFrames match up, or perform some kind of schema resolution (essentially, casting the schema of the incoming DataFrame to that of the source-of-truth DataFrame, or throwing an error if this cannot be achieved).
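
To make the schema resolution idea above concrete, here is a minimal pure-Python sketch. It is not Daft's implementation: the `Frame` and `Schema` aliases and the `resolve_and_concat` helper are hypothetical names made up for illustration, a "DataFrame" is modelled as a plain dict of column name to list of values, and "casting" is just calling the Python type on each value.

```python
from typing import Any, Dict, List

# Hypothetical stand-ins for illustration only (not Daft types):
# a frame maps column name -> list of values, a schema maps column name -> Python type.
Frame = Dict[str, List[Any]]
Schema = Dict[str, type]


def resolve_and_concat(base: Frame, base_schema: Schema, incoming: Frame) -> Frame:
    """Append the rows of `incoming` to `base`, treating `base_schema` as the source of truth."""
    # Column names must match exactly; otherwise there is nothing sensible to resolve.
    if set(incoming) != set(base_schema):
        raise ValueError(
            f"column mismatch: expected {sorted(base_schema)}, got {sorted(incoming)}"
        )

    result: Frame = {}
    for name, dtype in base_schema.items():
        try:
            # Cast each incoming value to the source-of-truth type, leaving nulls alone.
            casted = [None if value is None else dtype(value) for value in incoming[name]]
        except (TypeError, ValueError) as exc:
            # The cast cannot be achieved, so fail loudly instead of guessing.
            raise ValueError(
                f"cannot cast column {name!r} to {dtype.__name__}"
            ) from exc
        result[name] = base[name] + casted
    return result
```

With a base frame `{"foo": [1, 2, 3]}` and schema `{"foo": int}`, an incoming frame `{"foo": ["4", 5.0]}` is cast to integers and appended, while an incoming frame with a different column name, or with a value such as `"abc"` that cannot be cast, raises a `ValueError`. A real implementation would also need to decide which casts are considered safe (for example, whether a float may be silently truncated to an integer, as this sketch allows).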

Let me know what you think!

shyamsn97 commented 1 year ago

This would be perfect! Exactly what I’m looking for

samster25 commented 1 year ago

Hi @shyamsn97, we'll bump up the priority on that feature and would love your feedback once it's out!

On an unrelated note, it's good to hear from you! It's been a long time!

shyamsn97 commented 1 year ago

Thanks so much, looking forward to it! And likewise, great to see you’re doing awesome stuff 🙂

jaychia commented 11 months ago

Implemented in: https://github.com/Eventual-Inc/Daft/pull/1023