Firionus / FastRunningMedian.jl

Efficient running median for Julia
MIT License
11 stars 1 forks source link

FastRunningMedian.jl

This Julia Package allows you to calculate a running median - fast.

Getting Started

In Julia, install with:

]add FastRunningMedian

Example Usage:

julia> using FastRunningMedian

julia> running_median([1,9,2,3,-9,1], 3)
6-element Vector{Float64}:
 1.0
 2.0
 3.0
 2.0
 1.0
 1.0

High-level API

# FastRunningMedian.running_medianFunction.

running_median(input, window_length, tapering=:symmetric; kwargs...) -> output

Run a median filter of window_length over the input array and return the result.

If the input array is multidimensional, the median will be run only over the first dimension, i.e. over all columns indepedently.

Taperings

The tapering decides the behaviour at the ends of the input. The available taperings are:

If you choose an even window_length, the elements of the output array lie in the middle between the input elements on a continuous underlying axis.

With the exception of :beginning_only, all taperings are mirror symmetric with respect to the middle of the input array.

Keyword Arguments

Performance

The underlying algorithm should scale as O(N log w) with the input length N and the window_length w.

source

Taperings Visualized

Each data point is shown as a cross and the windows are visualized as colored boxes, the input is grey.

Tapering Examples

Performance Comparison

Benchmark Comparison

For large window lengths, this package performs even better than calling runmed in R, which uses the Turlach implementation written in C. For small window lengths, the Stuetzle implementation in R still outperforms this package, but the overhead from RCall doesn't seem worth it. Development of a fast implementation for small window lengths is ongoing, see the corresponding issues for details.

In contrast to this package, SortFilters.jl supports arbitrary probability levels, for example to calculate quantiles.

You can find the Notebook used to create the above graph in the benchmark folder. I ran it on an i7-2600K with 8 GB RAM while editing and browsing in the background.

Mid-level API

You can take control of allocating the output vector and median filter with a lower-level API. This is useful when you have to calculate many running medians of the same window length.

# FastRunningMedian.running_median!Function.

running_median!(mf::MedianFilter, output, input, tapering=:sym; nan=:include) -> output

Use mf to calculate the running median of input and write the result to output.

For all details, see running_median.

Examples

input = [4 5 6;
         1 0 9;
         9 8 7;
         3 1 2;]
output = similar(input, (4,3))
mf = MedianFilter(eltype(input), 3)
for j in axes(input, 2) # run median over each column
    # re-use mf in every iteration
    running_median!(mf, @view(output[:,j]), input[:,j])
end
output

# output
4×3 Matrix{Int64}:
 4  5  6
 4  5  7
 3  1  7
 3  1  2

source

Stateful API

The stateful API can be used for streaming data, e. g. to reduce RAM consumption, or building your own high-level API.

# FastRunningMedian.MedianFilterType.

MedianFilter([T=Float64,] window_length) where T <: Real

Construct a stateful running median filter, taking values of type T.

Manipulate with grow!, roll!, shrink!, reset!. Query with median, length, window_length, isfull.

Examples

julia> mf = MedianFilter(Int64, 2)
MedianFilter{Int64}(MutableBinaryHeap(), MutableBinaryHeap(), Tuple{FastRunningMedian.ValueLocation, Int64}[], 0, 0)

julia> grow!(mf, 1); median(mf) # window: [1]
1

julia> grow!(mf, 2); median(mf) # window: [1,2]
1.5

julia> roll!(mf, 3); median(mf) # window: [2,3]
2.5

julia> shrink!(mf); median(mf) # window: [3]
3

source

# FastRunningMedian.grow!Function.

grow!(mf::MedianFilter, val) -> mf

Grow mf with the new value val.

If mf would grow beyond maximum window length, an error is thrown. In this case you probably wanted to use roll!.

The new element is pushed onto the end of the circular buffer.

source

# FastRunningMedian.roll!Function.

roll!(mf::MedianFilter, val) -> mf

Roll the window over to the next position by replacing the first and oldest element in the ciruclar buffer with the new value val.

source

# FastRunningMedian.shrink!Function.

shrink!(mf::MedianFilter) -> mf

Shrinks mf by removing the first and oldest element in the circular buffer.

Will error if mf contains only one element as a MedianFilter with zero elements would not have a median.

source

# FastRunningMedian.reset!Function.

reset!(mf::MedianFilter) -> mf

Reset the median filter mf by emptying it.

source

# FastRunningMedian.medianFunction.

median(mf::MedianFilter; nan=:include)

Determine the current median in mf.

NaN Handling

By default, any NaN value in the filter will turn the result NaN.

Use the keyword argument nan = :ignore to ignore NaN values and calculate the median over the remaining values. If there are only NaNs, the median will be NaN regardless.

Implementation

If the number of elements in MedianFilter is odd, the low_heap is always one element bigger than the high_heap. The top element of the low_heap then is the median.

If the number of elements in MedianFilter is even, both heaps are the same size and the median is the mean of both top elements.

source

# Base.lengthFunction.

length(mf::MedianFilter)

Returns the number of elements in the stateful median filter mf.

This number is equal to the length of the internal circular buffer.

source

# FastRunningMedian.window_lengthFunction.

window_length(mf::MedianFilter)

Returns the window_length of the stateful median filter mf.

This number is equal to the capacity of the internal circular buffer.

source

# FastRunningMedian.isfullFunction.

isfull(mf::MedianFilter)

Returns true when the length of the stateful median filter mf equals its window length.

source

Sources

W. Hardle, W. Steiger 1995: Optimal Median Smoothing. Published in Journal of the Royal Statistical Society, Series C (Applied Statistics), Vol. 44, No. 2 (1995), pp. 258-264. https://doi.org/10.2307/2986349

(I did not implement their custom double heap, but used two heaps from DataStructures.jl)

Keywords

Running Median is also known as Rolling Median or Moving Median.