apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0
6.24k stars 1.18k forks source link

Reorganize the GroupColumn implementations into more coherent modules #13262

Open alamb opened 4 hours ago

alamb commented 4 hours ago

Is your feature request related to a problem or challenge?

While reviewing https://github.com/apache/datafusion/pull/12996 I noticed that all the implementations of GroupColumn for different types were in datafusion/physical-plan/src/aggregates/group_values/group_column.rs which is:

  1. a single large file
  2. Not obvious that it is like an implementation detail of GroupValuesColumn: https://github.com/apache/datafusion/blob/main/datafusion/physical-plan/src/aggregates/group_values/column.rs#L94-L93

Describe the solution you'd like

I would like to propose rearrainging the code:

  1. Rename https://github.com/apache/datafusion/blob/main/datafusion/physical-plan/src/aggregates/group_values/column.rs to https://github.com/apache/datafusion/blob/main/datafusion/physical-plan/src/aggregates/group_values/multi_column/mod.rs
  2. Move the implementations in datafusion/physical-plan/src/aggregates/group_values/group_column.rs to different modules within https://github.com/apache/datafusion/blob/main/datafusion/physical-plan/src/aggregates/group_values/multi_column (like primitive.rs, bytes.rs, etc)

Describe alternatives you've considered

No response

Additional context

No response

alamb commented 4 hours ago

Once https://github.com/apache/datafusion/pull/12996 is merged, this would be a good first issue I think as it is just code movement and somewhat mechanical It would be a good way to get introduced to the code