composewell / streamly-statistics

Statistical measures for finite or infinite data streams
Apache License 2.0

Fix minimum and maximum #6

Open harendra-kumar opened 2 years ago

harendra-kumar commented 2 years ago

We can possibly use a heap instead of a Deque, hopefully giving better performance. Here is how it might work.

Assume the input is (a, Maybe a), where the first element of the tuple is the element being inserted into the window and the second is the element being ejected from the window, if any. Assume a min heap is used to find the minimum.

Note that we would need a custom heap implementation as we need to cut the top of the heap in one case and the bottom of the heap in another.

The cost would likely be n * log w, where n is the number of elements in the stream and w is the window size. For the minimum, the worst case would be an input stream sorted in ascending order, and the best case one sorted in descending order.
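
To make the n * log w figure concrete, here is a minimal sketch of a rolling minimum driven by (a, Maybe a) inputs. It is not the custom heap described above: as a stand-in it uses `Data.Map.Strict` from the containers package as an ordered multiset, so each insert and eject costs O(log w) regardless of input order. The names `windowPairs`, `step` and `rollingMinMap` are made up for this example and are not part of streamly-statistics, and the window size is assumed to be at least 1.

```haskell
import Data.List (scanl')
import qualified Data.Map.Strict as M
import Data.Maybe (mapMaybe)

-- Hypothetical helper: pair each element with the one it pushes out of a
-- window of size w (Nothing while the window is still filling up).
windowPairs :: Int -> [a] -> [(a, Maybe a)]
windowPairs w xs = zip xs (replicate w Nothing ++ map Just xs)

-- The map stores each value in the window together with its multiplicity.
-- Insert the incoming element, then remove one copy of the ejected one.
step :: Ord a => M.Map a Int -> (a, Maybe a) -> M.Map a Int
step m (new, old) = maybe id remove old (M.insertWith (+) new 1 m)
  where
    remove k = M.update dec k
    dec c = if c > 1 then Just (c - 1) else Nothing

-- Rolling minimum over a list; assumes w >= 1.
rollingMinMap :: Ord a => Int -> [a] -> [a]
rollingMinMap w =
  mapMaybe (fmap fst . M.lookupMin) . drop 1 . scanl' step M.empty . windowPairs w
```

For example, `rollingMinMap 3 [4, 2, 5, 1, 3, 6]` evaluates to `[4, 2, 2, 1, 1, 1]`. Swapping `M.lookupMin` for `M.lookupMax` would give the rolling maximum.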

Note that we can perform a partial sort of the stream by scanning it using the min or the max fold.

harendra-kumar commented 2 years ago

We can possibly combine the ring and the heap into a single data structure to efficiently find the min/max in a rolling window.

harendra-kumar commented 2 years ago

For smaller window sizes we could just use the ring buffer and perform a linear search in it to find the min/max. The total cost would be n * w.
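
A minimal sketch of the linear-search variant, assuming a window size of at least 1. `Data.Sequence` from the containers package stands in for the ring buffer here, and the names `push` and `rollingMinLinear` are made up for this example. Each step scans the whole window with `minimum`, which is where the n * w total comes from.

```haskell
import Data.List (scanl')
import Data.Sequence (Seq, (|>))
import qualified Data.Sequence as Seq

-- Append x, dropping the oldest element once the buffer holds w elements.
push :: Int -> Seq a -> a -> Seq a
push w buf x
  | Seq.length buf < w = buf |> x
  | otherwise = Seq.drop 1 buf |> x

-- Rolling minimum via a linear scan of the buffer at every step.
-- The buffer is never empty when 'minimum' is applied, because the
-- initial empty state is dropped before mapping.
rollingMinLinear :: Ord a => Int -> [a] -> [a]
rollingMinLinear w = map minimum . drop 1 . scanl' (push w) Seq.empty
```

For example, `rollingMinLinear 3 [4, 2, 5, 1, 3, 6]` evaluates to `[4, 2, 2, 1, 1, 1]`.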