apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

Improve round-robin repartitioning #6047

Closed Dandandan closed 2 weeks ago

Dandandan commented 1 year ago

Which issue does this PR close?

Closes #6043

Rationale for this change

See issue

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

andygrove commented 1 year ago

I ran a quick test with q1.

master

Query 1 executed in: 2.431816676s
Query 1 executed in: 2.485038908s
Query 1 executed in: 2.228489379s

this PR

Query 1 executed in: 2.190762139s
Query 1 executed in: 2.071724741s
Query 1 executed in: 2.071821053s

:rocket:

Dandandan commented 1 year ago

@andygrove could you repeat it with the current version? The PR was based on an outdated version.

It looks like the main branch doesn't have RepartitionExec anymore; creating the target partitions is now done inside ParquetExec, so I don't expect this PR to change anything when running the query normally.

andygrove commented 1 year ago

master

https://github.com/apache/arrow-datafusion?branch=main#cf278704

Query 1 executed in: 2.345591833s
Query 1 executed in: 2.646317406s
Query 1 executed in: 2.605519797s
Query 1 executed in: 2.42015894s
Query 1 executed in: 2.363104105s
Query 1 executed in: 2.282647509s
Query 1 executed in: 2.483410922s
Query 1 executed in: 2.318904738s
Query 1 executed in: 2.218033255s
Query 1 executed in: 2.201664783s

this PR

Query 1 executed in: 2.792964707s
Query 1 executed in: 2.952075901s
Query 1 executed in: 2.74711636s
Query 1 executed in: 2.620084054s
Query 1 executed in: 2.698429628s
Query 1 executed in: 2.764418208s
Query 1 executed in: 2.636262451s
Query 1 executed in: 2.589391766s
Query 1 executed in: 2.374242298s
Query 1 executed in: 2.291481349s
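
The two ten-run lists above are easier to compare as summary statistics. The following is a minimal Python sketch, not part of the original benchmark run: the variable and function names are illustrative, and the numbers are copied verbatim from the timings above. On these figures the mean is roughly 2.39 s on main versus 2.65 s with this PR, i.e. the PR is about 11% slower in this run.

```python
from statistics import mean, stdev

# Timings in seconds, copied from the two ten-run lists above.
main_runs = [
    2.345591833, 2.646317406, 2.605519797, 2.42015894, 2.363104105,
    2.282647509, 2.483410922, 2.318904738, 2.218033255, 2.201664783,
]
pr_runs = [
    2.792964707, 2.952075901, 2.74711636, 2.620084054, 2.698429628,
    2.764418208, 2.636262451, 2.589391766, 2.374242298, 2.291481349,
]


def summarize(runs):
    """Return (mean, sample standard deviation, fastest run) in seconds."""
    return mean(runs), stdev(runs), min(runs)


main_mean, main_sd, main_best = summarize(main_runs)
pr_mean, pr_sd, pr_best = summarize(pr_runs)

# Relative change of the PR mean versus the main-branch mean, in percent.
change_pct = (pr_mean - main_mean) / main_mean * 100

print(f"main:    mean {main_mean:.3f}s  sd {main_sd:.3f}s  best {main_best:.3f}s")
print(f"this PR: mean {pr_mean:.3f}s  sd {pr_sd:.3f}s  best {pr_best:.3f}s")
print(f"change in mean: {change_pct:+.1f}%")
```

With only ten runs per side and a single machine, this is a rough indication rather than a rigorous measurement.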

alamb commented 1 year ago

Marking as draft, as CI is not passing and this PR doesn't seem to be in need of active review.

Dandandan commented 1 year ago

Thanks @alamb, I will revisit this when I have time (it needs a unit test, and a failing test needs to be updated).

github-actions[bot] commented 3 weeks ago

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.