apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0
5.27k stars 1.23k forks source link

[multistage][sorting] improve sorting efficiency #12228

Open walterddr opened 6 months ago

walterddr commented 6 months ago

currently we have a very basic implementation of the SortOperator and a very basic rule for cross-server sorting plan generator. It generates an sort exchange which connects a local sorting prior to exchange then follow by a remote sorting on a single machine this is originally designed for large quantity sorting and streaming responses. but none of those were implemented in SortOperator

Improvement

  1. for the current SortOperator implementation, the sort-exchange can be simplified to a normal exchange, and the local sort prior to exchange can be removed.
  2. for longer term we should support 2 strategies:
    • when data size is small, we should follow the 1st approach;
    • when data size is large, we should (a) create a local sort before the exchange, (either via sort-exchange, or via additional sort operator); (b) do a k-merge sort on the single-machine global sorting operator (c) stream the results back (which is possible with k-merge)
hpvd commented 3 weeks ago

@walterddr this looks great :-)

What do you think of how to process this issue best to keep a good overview:

having a tasklist for child issues is great. Having one we can also ad the new "parent issue label"...