Eventual-Inc / Daft

Distributed DataFrame for Python designed for the cloud, powered by Rust
https://getdaft.io
Apache License 2.0

Syntactic sugar for nested getting in column names #1994

Open jaychia opened 3 months ago

jaychia commented 3 months ago

Is your feature request related to a problem? Please describe.

To retrieve a nested field from a struct column, we currently rely on the Expression.struct.get(...) accessor. However, for deeply nested structs this becomes extremely verbose.

One possible solution is to allow . delimiters in the column name itself. For example:

df = df.with_column("nested_bar", df["foo.bar"])
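
To make the proposed semantics concrete, here is a minimal sketch in plain Python. It does not use Daft: `split_nested_name` and `get_nested` are hypothetical helpers, and plain dicts stand in for struct values. It assumes the first segment of the name is the top-level column and every later segment is a struct field.

```python
from __future__ import annotations

from typing import Any


def split_nested_name(name: str) -> tuple[str, list[str]]:
    """Split "foo.bar.baz" into the root column "foo" and the field path ["bar", "baz"]."""
    root, *fields = name.split(".")
    # Reject empty segments such as "foo..bar" or ".foo".
    if not root or any(not field for field in fields):
        raise ValueError(f"malformed nested column name: {name!r}")
    return root, fields


def get_nested(row: dict[str, Any], name: str) -> Any:
    """Walk a dict-based row one struct level per path segment (hypothetical stand-in)."""
    root, fields = split_nested_name(name)
    value = row[root]
    for field in fields:
        value = value[field]
    return value


# A single row whose "foo" column holds a two-level struct.
row = {"foo": {"bar": {"baz": 1}}}
nested_baz = get_nested(row, "foo.bar.baz")
```

In Daft terms, each element of the field path would presumably map to one Expression.struct.get(...) call, but that mapping is an assumption about how the feature might be implemented.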

samster25 commented 3 months ago

👀 @kevinzwang

kevinzwang commented 3 months ago

Yeah, this is a good idea. It could be applicable to list accessors too(?). When do we want to get this done? Deriving expressions from column names is something we'll eventually get around to in selector expressions, so I'm wondering if it makes sense to think about these two things together.

We would also want to make sure it doesn't conflict with selector expression syntax, since foo.bar could also be interpreted as a regex.
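
The ambiguity can be illustrated with a short sketch using only Python's standard `re` module. The column names are made up for illustration, and neither reading reflects an existing Daft behavior; the point is only that the same string means different things under the two interpretations.

```python
import re

# Hypothetical top-level column names in a DataFrame.
column_names = ["foo", "fooXbar", "foo_bar"]

# Reading 1: "foo.bar" as a regex, where "." matches any single character.
pattern = re.compile("foo.bar")
regex_matches = [name for name in column_names if pattern.fullmatch(name)]

# Reading 2: "foo.bar" as nested access, i.e. field "bar" of struct column "foo".
root, field = "foo.bar".split(".")
nested_reading = (root, field) if root in column_names else None
```

Under the regex reading the string selects `fooXbar` and `foo_bar`; under the nested reading it refers to a single field of `foo`. Any combined syntax would need a rule to tell these apart.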

samster25 commented 3 months ago

@kevinzwang I think this should be much simpler than the selector expressions that we talked about, since col("a.b.c") will always refer to exactly 1 column, whereas selector expressions can refer to many.

jaychia commented 1 month ago

@kevinzwang to sync with @samster25 on this issue