apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

[EPIC] Improving cost calculations and cost based optimizations #3929

Open isidentical opened 2 years ago

isidentical commented 2 years ago

Design document: https://docs.google.com/document/d/1M4mmV7KA1LSj-D-WJA338B4ydlm-8A8D5OPuDE5_SD4/edit

This is a meta issue for improving cost calculations and cost-based optimizations (CBOs) in DataFusion. We already collect some statistics (mainly from the table sources), and some of the execution plan nodes produce estimates for their statistics. The overall idea is to improve these, as well as any possible CBOs.

Main Goals

Work in Progress

Planned

Future

P.S.: feel free to update the text directly or let me know (and I can update it myself)

isidentical commented 2 years ago

@alamb @Dandandan @mingmwang I've created the meta/epic issue as we discussed

alamb commented 2 years ago

I believe the next step is some sort of design document.

isidentical commented 2 years ago

I'd be happy to start one, and if anyone is interested I can also give write access (send your Google email address to isidentical@gmail.com).

Dandandan commented 2 years ago

Maybe you can share the doc publicly so anyone can make suggestions?

isidentical commented 2 years ago

It should be publicly accessible now: https://docs.google.com/document/d/1M4mmV7KA1LSj-D-WJA338B4ydlm-8A8D5OPuDE5_SD4/ (also pinning this to the issue)

It is an overall exploration of what we are doing right now and how it can help us in the future (along with some possible discussion points), but it is at a very early stage. I'd be thrilled to hear what you are thinking, as well as about any other unexplored areas.

alamb commented 2 years ago

I plan to review the doc carefully tomorrow ❤️

isidentical commented 2 years ago

Thanks @alamb! I'll also try to walk through it from scratch with real-world examples in tomorrow's meeting (if we have time for that in this meetup, and if I can actually make it there), in case anyone else here is planning to attend.