apache / datafusion-ballista

Apache DataFusion Ballista Distributed Query Engine
https://datafusion.apache.org/ballista
Apache License 2.0

[Improvement] Increase the throughput in push-staged TaskSchedulingPolicy #748

Open Ted-Jiang opened 1 year ago

Ted-Jiang commented 1 year ago

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Our team uses the push-staged TaskSchedulingPolicy for our OLAP query engine. We mainly focus on quick-response queries (under 10s) at a concurrency of several hundred.

Now:

Cluster: 60 executors (200 slots each) and 1 scheduler. Workload: 200 clients, each submitting queries that take about 0.5s, one after another.

We got 1,200 queries per minute 😢

So we found several issues in the push-staged policy.

After fixing these, we got 16,000 queries per minute! We will contribute these fixes back soon.
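
As a rough sanity check on these numbers, here is a back-of-envelope sketch. It assumes (this is not stated in the measurements above) that every query takes exactly 0.5s of execution and that each client submits its next query immediately after the previous one returns. Under those assumptions, Little's law for a closed system gives the average end-to-end time per query implied by a measured throughput. The function names are made up for illustration.

```python
# Back-of-envelope check of the workload numbers in this issue.
# Assumptions (hypothetical, not measured): each query costs exactly 0.5 s,
# and clients have zero think time between consecutive queries.

CLIENTS = 200
QUERY_SECONDS = 0.5


def ideal_queries_per_minute(clients: int, query_seconds: float) -> float:
    """Throughput ceiling if scheduling added no overhead at all."""
    return clients * (60.0 / query_seconds)


def implied_latency_seconds(clients: int, queries_per_minute: float) -> float:
    """Average time per query implied by a throughput (Little's law, closed system)."""
    return clients * 60.0 / queries_per_minute


ceiling = ideal_queries_per_minute(CLIENTS, QUERY_SECONDS)
latency_before = implied_latency_seconds(CLIENTS, 1200)
latency_after = implied_latency_seconds(CLIENTS, 16000)

print(f"ideal ceiling: {ceiling:.0f} queries/min")
print(f"implied latency at 1200 q/min: {latency_before:.2f} s")
print(f"implied latency at 16000 q/min: {latency_after:.2f} s")
```

If those assumptions roughly hold, the gap between the implied latency and the 0.5s execution cost is time spent outside query execution, which is where the scheduling overhead would show up.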

Ted-Jiang commented 1 year ago

@yahoNanJing @mingmwang @andygrove @thinkharderdev

thinkharderdev commented 1 year ago

Interesting, sounds like some promising improvements :)