Open revans2 opened 3 years ago
We need to be careful here because GpuRunningWindowFunction also has an optimization that does a scan aggregation instead of a normal window aggregation. That optimization will have to be disabled for range-based queries. The performance will not be great, but at least it will work. We can file a follow-on issue to figure out whether there is a good way to keep some of the performance wins for range-based queries.
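To illustrate why a plain scan is not a drop-in replacement for range-based frames, here is a minimal sketch in plain Python. It is not the plugin's code; the function names and sample data are made up for illustration. It relies only on standard SQL semantics: a `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW` frame includes every peer row that ties with the current row on the order-by value, while the `ROWS` equivalent stops at the current row.

```python
from itertools import accumulate, groupby

def running_sum_rows(values):
    # ROWS frame: a plain inclusive prefix scan is enough.
    return list(accumulate(values))

def running_sum_range(order_keys, values):
    # RANGE frame: rows that tie on the order-by key are peers, and every
    # peer sees the total up to and including the whole peer group.
    # Assumes the input is already sorted by order_keys.
    out = []
    total = 0
    for _, group in groupby(zip(order_keys, values), key=lambda kv: kv[0]):
        peers = list(group)
        total += sum(v for _, v in peers)
        out.extend([total] * len(peers))
    return out

# Hypothetical data: the two middle rows tie on the order-by key.
order_keys = [1, 2, 2, 3]
values = [10, 20, 30, 40]

rows_result = running_sum_rows(values)
range_result = running_sum_range(order_keys, values)
print(rows_result)   # [10, 30, 60, 100]
print(range_result)  # [10, 60, 60, 100]
```

The two results differ only on the tied rows, which is exactly where a row-by-row scan would give a wrong answer for a range-based query.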
2596 is adding a memory optimization for rolling window operations, but it only works for row-based queries. We can probably do something very similar for range-based queries, but then we would still have to make sure that each batch is partitioned on the order-by columns. This should be fairly simple to do because we already have code that can partition batches based on columns. The main advantage would be that we could support large windows without needing all of the data for a given window in GPU memory at once. But for large running-like windows we have performance problems compared to the CPU. https://github.com/rapidsai/cudf/issues/8440 essentially applies to range-based windows too, not just row-based windows.
But because no customer is complaining about it yet, I don't think this should be a priority at this time.
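For reference, one way to read "partitioned on the order-by columns" is that a batch boundary should never fall between rows that tie on the order-by value. The sketch below is plain Python under that assumption, with a hypothetical `split_on_order_key` helper and made-up rows; it is not the existing partitioning code. A bounded range frame would additionally need the preceding and following rows that fall within the range, which this does not attempt.

```python
def split_on_order_key(rows, target_size, key=lambda r: r[0]):
    # rows must already be sorted by the order-by key.
    # A batch is closed only once it has reached target_size AND the next
    # row starts a new order-by value, so ties never straddle two batches.
    batches = []
    current = []
    for row in rows:
        if len(current) >= target_size and key(row) != key(current[-1]):
            batches.append(current)
            current = []
        current.append(row)
    if current:
        batches.append(current)
    return batches

# Hypothetical sorted input: (order_key, payload)
rows = [(1, "a"), (2, "b"), (2, "c"), (2, "d"), (3, "e"), (4, "f")]
batches = split_on_order_key(rows, target_size=2)
for batch in batches:
    print(batch)
```

With a target size of 2, the first batch grows to four rows because the three rows with key 2 have to stay together. That is the trade-off of this approach: batch sizes are bounded by the largest group of ties rather than by the target size.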