apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0
5.29k stars 973 forks source link

Add `ProgressiveEval` operator for optimize `SortPreservingMerge` #10488

Closed NGA-TRAN closed 2 weeks ago

NGA-TRAN commented 2 weeks ago

Is your feature request related to a problem or challenge?

Yes, this is a task of https://github.com/apache/datafusion/issues/10316. We will use the porting ProgressiveEval to optimize SortPreservingMerge

Describe the solution you'd like

In InfluxDB IOx, when the inputs of SortPreservingMerge are all sorted on the sort key and their data do not overlap, we replace SortPreservingMerge with ProgressiveEval which:

  1. Avoids starting all input streams at once
  2. Avoids having to compare any keys (doesn't actually do a merge)

We wrote about using this operator here: https://www.influxdata.com/blog/making-recent-value-queries-hundreds-times-faster/

This ticket is to port ProgressiveEval from InfluxDB to DataFusion

Describe alternatives you've considered

No response

Additional context

No response

NGA-TRAN commented 2 weeks ago

I start working on this

alamb commented 2 weeks ago

I updated this ticket's description with some additional background and linked: https://www.influxdata.com/blog/making-recent-value-queries-hundreds-times-faster/

There is a PR for adding this operator here: https://github.com/apache/datafusion/pull/10490

alamb commented 2 weeks ago

I think this ticket largely duplicates https://github.com/apache/datafusion/issues/10316 so let's use that instead