Improved performance of RANGE preceding window functions

alamb commented 1 year ago

Is your feature request related to a problem or challenge? Please describe what you are trying to do. I am not sure this is an important feature to add, but I wanted to write it down.

Basically there is an interesting algorithm called Segment Tree which is described in :

Efficient processing of window functions in analytical SQL queries by Viktor Leis, Kan Kundhikanjana, Alfons Kemper, Thomas Neumann

https://dl.acm.org/doi/10.14778/2794367.2794375 p1058-leis.pdf

This algorithm handles window functions with RANGE window functions well, at least that is the claim. It might be a reasonable structure to implement instead of the "MovingMinMax" added in https://github.com/apache/arrow-datafusion/pull/4675 by @berkaycpp and @mustafasrepo .

Describe the solution you'd like

If we hit unacceptable performance of window functions (especially with largely varying RANGE), this might an algorithm worth looking into.

As a reminder a RANGE window frame is determined in terms of the values of the partition, not the number of rows:


❯ create table foo as values (1), (2), (3), (4), (5), (6), (6), (6);
0 rows in set. Query took 0.000 seconds.

❯ select column1, first_value(column1) OVER (ORDER BY column1 RANGE 3 PRECEDING) from foo;
+---------+--------------------------+
| column1 | FIRST_VALUE(foo.column1) |
+---------+--------------------------+
| 1       | 1                        |
| 2       | 1                        |
| 3       | 1                        |
| 4       | 1                        |
| 5       | 2                        |
| 6       | 3                        |
| 6       | 3                        |
| 6       | 3                        |
+---------+--------------------------+

❯ select column1, first_value(column1) OVER (ROWS 3 PRECEDING) from foo;
+---------+--------------------------+
| column1 | FIRST_VALUE(foo.column1) |
+---------+--------------------------+
| 1       | 1                        |
| 2       | 1                        |
| 3       | 1                        |
| 4       | 1                        |
| 5       | 2                        |
| 6       | 3                        |
| 6       | 4                        |
| 6       | 5                        |
+---------+--------------------------+
8 rows in set. Query took 0.001 seconds.

Describe alternatives you've considered A clear and concise description of any alternative solutions or features you've considered.

Additional context Add any other context or screenshots about the feature request here.

alamb commented 1 year ago

cc @ozankabak I thought you might be interested in this (not to implement but as some context / possible future work)

ozankabak commented 1 year ago

Table 1 in this paper gives a good summary of complexities involved. Datafusion currently employs what the author calls "Removable Cumulative" mechanism. However, the min cases in Datafusion is actually O(n), not O(n logn) as written in the paper, due to the MovingMinMax algorithm (which is why we thought it was good fit for this use case).

So, for the first three frame types in Table 1, what we have is better complexity-wise. Certain implementation improvements can still be made (e.g. in binary searching) obviously, but I'd say we are doing a good job there.

If I'm not mistaken we don't support the fourth frame type; i.e. frames with variable boundaries. This is the use case where the segment tree shines. When we add support for that, we should definitely consider segment tree as the algorithm of choice. To summarize, I think the segment tree approach and our current approach will coexist and just apply to different use cases, it seems like.

It was a good read, thank you for putting this on the radar.

alamb commented 1 year ago

If I'm not mistaken we don't support the fourth frame type;

This is what I thought too at first -- and then I tried it. I believe they are referring to window functions like RANGE 3 PRECEDING which DataFusion does support 🤯

However, after more reading, I think they are referring to something like an expression rather than constant in a PRECEDING clause which DataFusion does not support

❯ select column1, first_value(column1) OVER (ROWS column1/5 PRECEDING) from foo;
Internal("Window frame bound cannot be BinaryOp { left: Identifier(Ident { value: \"column1\", quote_style: None }), op: Divide, right: Value(Number(\"5\", false)) }")

apache / datafusion

Improved performance of RANGE preceding window functions #4904