darwinrlo / public

0 stars 0 forks source link

Running median of a number stream #11

Open darwinrlo opened 4 years ago

darwinrlo commented 4 years ago

Find Median from Data Stream

darwinrlo commented 4 years ago

Insertion

Check the top element of the max-heap. If the element is less than the max-heap, then insert it into the max-heap. Otherwise, insert it into the min-heap.

If either heap gets too big ー i.e. the max-heap is bigger than the min-heap by more than one element or the min-heap is bigger than the max-heap ー remove its top element and re-insert it into the other one.

darwinrlo commented 4 years ago

Computing the median

If the max-heap is bigger than the min-heap by 1, then the median is simply the top element, which can be looked up in O(1) time.

Otherwise, if the two heaps are of equal size, then the median can be computed by looking up the top elements of both heaps and dividing by two. The two look-ups and the division are O(1) overall.

The overall running time of computing the median is O(1).

darwinrlo commented 4 years ago

Internal data structures

We maintain two collections, one for the lower half of the numbers that were inserted and one for the upper half.

Numbers in the lower half are stored in a max-heap -- a max-heap because to determine whether or not a number is in the lower half of numbers, we need to look up the greatest element of the lower half ー a max-heap provides an operation for us to do this in O(1).

Numbers in the upper half are stored in a min-heap ー though for a different reason than for the max-heap. If there are too many numbers in the upper half and we need to move a number to the lower half, the one to move is the least number. Using a min-heap, extracting the least element is merely O(log n).

For us to be able to compute the median in O(1) time, we need to keep the heap sizes roughly the same. If we have inserted an even number of numbers, then they should be the same. If we have inserted an odd number of numbers, then we will (arbitrarily) allow the max-heap to become bigger by one element.