[FEA] Explore a scan/group by scan alternative for range based running windows

Is your feature request related to a problem? Please describe. https://github.com/NVIDIA/spark-rapids/issues/2708 is to make it so that a range based running window looks a lot like a row based running window. At least from the perspective of how much memory is needed to do the computation in the common case. Unfortunately, the second part of the running window optimization is for performance because large windows are very expensive to calculate. To fix this we started to be able to use a scan/group by scan on the back-end to calculate the result. But that does not work here because the order by column needs to be taken into account. The one way I can see to fix this is to do two passes over the data. The first pass will do a sort based group by aggregation with the partition by and order by columns as the keys. After that is done, then we can do a scan/group by scan, but we need to do a combining aggregation for the scan instead of the original aggregation that we would normally expect.

NVIDIA / spark-rapids

[FEA] Explore a scan/group by scan alternative for range based running windows #9300