apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0
5.98k stars 1.13k forks source link

Cannot run `group by` without aggregation function due to 'Invalid aggregate expression' #12298

Open Tangeroooo opened 3 weeks ago

Tangeroooo commented 3 weeks ago

Describe the bug

Return an error like below message.

Error in response: DataFusionError("Internal error: Invalid aggregate expression 'Column(Column { relation: Some(Bare { table: \"?table?\" }), name: \"video_id\" })'.\nThis was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker")

To Reproduce

for record_batch in reader { let rb = record_batch?; record_batches.push(rb); }

let columns = reader.schema().fields().iter().map(|f| { col(f.name().clone()) }).collect::<Vec<_>>();

let group_by_columns = group_by_columnsnames .iter() .map(|name: &String| col(name)) .collect::<Vec<>>();

let ctx = SessionContext::new(); let df = ctx.read_batches(record_batches)?;

// select * from tab group by col1, col2; df.aggregate(group_by_columns, columns)?.collect().await?

Expected behavior

Run group by without aggregation function.

Additional context

No response

CrystalZhou0529 commented 2 weeks ago

Hi, I am able to verify the error you got. If I understand it correctly, your use case needs to select columns that are not included in the groupby clause and also not included in the aggregation function. Lots of query engines also do not support it. I think it might be necessary to restructure the query in this case, and here's some reference that might be helpful: link 😄