apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

Consider making the DataFrame API less verbose by imitating Polars #4890

Open oersted opened 1 year ago

oersted commented 1 year ago

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

I'd say that, for instance, .select(["name"]) is significantly easier to read and write than .select(vec![col("name")]).

Describe the solution you'd like

Polars handles this with the following, more generic signature:

pub fn select<I, S>(&self, selection: I) -> Result<DataFrame, PolarsError>
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>

IntoIterator makes a lot of sense: it allows arrays for concise syntax, and Vec or any other collection whenever the list is generated programmatically.

AsRef<str> is less clear: I'm not sure how they refer to complex expressions with a string reference. Perhaps Into<Expr> would be better, with an implementation provided for &str.
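To make the Into<Expr> idea concrete, here is a minimal sketch of what such a signature could look like. The Expr, col, and DataFrame definitions below are simplified, hypothetical stand-ins written only for this example; they are not DataFusion's actual types, and a real select would presumably return a Result.

```rust
// Hypothetical stand-ins for illustration only; NOT DataFusion's real types.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    Alias(Box<Expr>, String),
}

impl Expr {
    /// Wraps this expression in an alias, to show that full expressions still work.
    pub fn alias(self, name: &str) -> Expr {
        Expr::Alias(Box::new(self), name.to_string())
    }
}

/// Builds a column reference, playing the role of the col("name") helper.
pub fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}

// Assumption: a bare string is interpreted as a column name.
impl From<&str> for Expr {
    fn from(name: &str) -> Self {
        col(name)
    }
}

/// Toy DataFrame that only records the projection it was asked for.
#[derive(Debug, Default)]
pub struct DataFrame {
    pub projection: Vec<Expr>,
}

impl DataFrame {
    /// Accepts any iterable whose items convert into Expr:
    /// arrays, Vecs, or iterator chains of &str or Expr.
    pub fn select<I, E>(&self, exprs: I) -> DataFrame
    where
        I: IntoIterator<Item = E>,
        E: Into<Expr>,
    {
        DataFrame {
            projection: exprs.into_iter().map(Into::into).collect(),
        }
    }
}
```

With this shape, df.select(["name"]), df.select(vec![col("name")]) and df.select([col("a").alias("b")]) all compile. One limitation: every item in a single call must have the same type, so &str and Expr values cannot be mixed in one array without converting the strings first.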

Yes, I know there is also a select_columns(&self, columns: &[&str]) method in the current API, but it is less flexible and somewhat redundant, and there are other places where vec! and col() still clutter the syntax.

alamb commented 1 year ago

I agree this would be a great change ❤️