apache / datafusion-ballista

Apache DataFusion Ballista Distributed Query Engine
https://datafusion.apache.org/ballista
Apache License 2.0

Improve benchmark performance #339

Open: andygrove opened this issue 1 year ago

andygrove commented 1 year ago

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

See the following Google document for a discussion of ideas to improve performance:

https://docs.google.com/document/d/16xFYLCzCcRQCk6UgHried-nodVF6NV7kFVpJEFV31eY/edit?usp=sharing

mingmwang commented 1 year ago

👍

mingmwang commented 1 year ago

I'm currently working on partition reasoning, which could help avoid unnecessary shuffles.

https://github.com/apache/arrow-ballista/issues/284

mingmwang commented 1 year ago

If any issues or gaps are identified in the physical planning phase, you can assign them to me and I will take care of them.

yahoNanJing commented 1 year ago

👍