apache / datafusion-ballista

Apache DataFusion Ballista Distributed Query Engine
https://datafusion.apache.org/ballista
Apache License 2.0

Ballista: UnresolvedShuffleExec and ShuffleReaderExec should show correct partitioning scheme #16

Open andygrove opened 2 years ago

andygrove commented 2 years ago

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Once https://github.com/apache/arrow-datafusion/pull/750 is merged, UnresolvedShuffleExec and ShuffleReaderExec will work correctly, but both will still report their output partitioning as unknown. This doesn't cause any functional issues, because no further planning takes place that depends on this being correct, but it could be confusing to end users when viewing query plans. Also, in the future we may want to further optimize the plan during execution, and this would require the output partitioning to be reported accurately.

Describe the solution you'd like

Populate the output partitioning in UnresolvedShuffleExec and ShuffleReaderExec and implement the associated serde code.
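As a rough illustration of the idea, here is a minimal, dependency-free sketch. It is not the Ballista implementation. `Partitioning` is a simplified stand-in for DataFusion's enum of the same name (the real `Hash` variant holds physical expressions rather than column names). `ShuffleReaderSketch`, `PartitioningMessage`, `to_message`, and `from_message` are hypothetical names invented for this example. The point is that the operator stores the partitioning scheme it was planned with, reports it instead of an unknown scheme, and can round-trip it through a serializable message.

```rust
use std::fmt;

/// Simplified stand-in for DataFusion's `Partitioning` enum.
/// Assumption: column names replace the physical expressions used by the real `Hash` variant.
#[derive(Debug, Clone, PartialEq)]
pub enum Partitioning {
    RoundRobinBatch(usize),
    Hash(Vec<String>, usize),
    UnknownPartitioning(usize),
}

impl Partitioning {
    /// Number of output partitions, regardless of scheme.
    pub fn partition_count(&self) -> usize {
        match self {
            Partitioning::RoundRobinBatch(n)
            | Partitioning::Hash(_, n)
            | Partitioning::UnknownPartitioning(n) => *n,
        }
    }
}

impl fmt::Display for Partitioning {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Partitioning::RoundRobinBatch(n) => write!(f, "RoundRobinBatch({})", n),
            Partitioning::Hash(cols, n) => write!(f, "Hash([{}], {})", cols.join(", "), n),
            Partitioning::UnknownPartitioning(n) => write!(f, "UnknownPartitioning({})", n),
        }
    }
}

/// Hypothetical, trimmed-down model of a shuffle reader node.
pub struct ShuffleReaderSketch {
    pub stage_id: usize,
    /// The scheme chosen by the upstream shuffle writer, kept instead of "unknown".
    pub output_partitioning: Partitioning,
}

impl ShuffleReaderSketch {
    /// Report the stored scheme rather than `UnknownPartitioning`.
    pub fn output_partitioning(&self) -> Partitioning {
        self.output_partitioning.clone()
    }

    /// One-line description, as a query plan display might render it.
    pub fn describe(&self) -> String {
        format!(
            "ShuffleReaderSketch: stage_id={}, partitioning={}",
            self.stage_id, self.output_partitioning
        )
    }
}

/// Hypothetical flat message standing in for a protobuf definition.
#[derive(Debug, Clone, PartialEq)]
pub struct PartitioningMessage {
    pub kind: u8, // 0 = round robin, 1 = hash, 2 = unknown
    pub columns: Vec<String>,
    pub partition_count: u64,
}

/// Serialize the scheme into the flat message.
pub fn to_message(p: &Partitioning) -> PartitioningMessage {
    let (kind, columns) = match p {
        Partitioning::RoundRobinBatch(_) => (0, Vec::new()),
        Partitioning::Hash(cols, _) => (1, cols.clone()),
        Partitioning::UnknownPartitioning(_) => (2, Vec::new()),
    };
    PartitioningMessage { kind, columns, partition_count: p.partition_count() as u64 }
}

/// Rebuild the scheme from the message; returns `None` for an unrecognized kind.
pub fn from_message(m: &PartitioningMessage) -> Option<Partitioning> {
    let n = m.partition_count as usize;
    match m.kind {
        0 => Some(Partitioning::RoundRobinBatch(n)),
        1 => Some(Partitioning::Hash(m.columns.clone(), n)),
        2 => Some(Partitioning::UnknownPartitioning(n)),
        _ => None,
    }
}
```

With a node built from `Partitioning::Hash(vec!["a".to_string()], 4)`, `describe()` includes the hash scheme and partition count, and `from_message(&to_message(&p))` yields the original scheme back.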

Describe alternatives you've considered

None

Additional context

None

hntd187 commented 2 years ago

Can you give an example of what this might look like? I wanted to tackle it, but I am unsure what output you are expecting here.