auto-decimate large time-series data?

ssfrr commented 6 years ago

I'm often plotting audio data which is pretty densely sampled, so it very quickly gets to the point where I accidentally try to plot a huge number of data points and everything grinds to a halt.

I'm thinking of integrating some automatic decimation into plotting as a recipe for SampleBuf objects, but I was wondering whether this should be the default behavior whenever the given data exceeds some reasonable threshold.

I've got the basic logic implemented below, but I don't have a great sense for the best way to integrate it into Plots.jl.

"""
    chunkminmax(data, nchunks)

Break `data` into at most `nchunks` chunks, and return a 3-element `Tuple`
where the 1st item is a range with the starting index of each chunk, and the
second and third are the lower and upper bounds of each chunk, respectively.

## Example

    sig = randn(1_000_000) .* cos.(range(0, stop=2π, length=1_000_000))

    i, l, u = chunkminmax(sig, 1000)
    plot(i, l, fillrange=u)
"""
function chunkminmax(data, nchunks)
    N = length(data)
    # Nothing to decimate if there are no more samples than chunks.
    nchunks < N || return (1:N, data, data)

    # Round up so that at most `nchunks` chunks cover all N samples.
    chunksize = cld(N, nchunks)
    offsetrange = 0:chunksize:(N-1)
    lower = zeros(eltype(data), length(offsetrange))
    upper = zeros(eltype(data), length(offsetrange))

    for (i, offset) in enumerate(offsetrange)
        # The last chunk may be shorter than `chunksize`.
        n = min(chunksize, N - offset)
        chunk = view(data, (1:n) .+ offset)
        lower[i] = minimum(chunk)
        upper[i] = maximum(chunk)
    end

    offsetrange .+ 1, lower, upper
end

The example creates (quickly) the following plot:

[Image: decimateplot]
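
To make the "only above some threshold" idea concrete, here is a minimal sketch. The name `maybe_decimate` and the default values `threshold=10_000` and `nchunks=1_000` are assumptions for illustration only; nothing like this exists in Plots.jl as far as this thread shows. It also checks that a single-sample spike survives the min/max reduction, which plain striding (e.g. `sig[1:1000:end]`) would miss.

```julia
# Hypothetical helper, not part of Plots.jl: reduce `data` to a min/max
# envelope only when it has more than `threshold` points.
function maybe_decimate(data::AbstractVector; threshold=10_000, nchunks=1_000)
    N = length(data)
    N > threshold || return (1:N, data, data)   # small enough: plot as-is

    chunksize = cld(N, nchunks)                 # round up: at most `nchunks` chunks
    chunks = Iterators.partition(data, chunksize)
    lower = [minimum(c) for c in chunks]
    upper = [maximum(c) for c in chunks]
    starts = 1:chunksize:N                      # first index of each chunk
    return (starts, lower, upper)
end

sig = randn(1_000_000)
sig[123_456] = 100.0                            # a single-sample spike

i, l, u = maybe_decimate(sig)
strided = sig[1:1000:end]                       # naive decimation for comparison
```

Here `maximum(u)` still contains the spike while `strided` does not, which is the main reason to keep both bounds per chunk rather than every k-th sample.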

mkborregaard commented 6 years ago

I like this idea - there're also open issues for doing the same on image data and scatter data - there's really no reason to limit this to time series data imho? Also maybe this would live better in PlotUtils?

ssfrr commented 6 years ago

I think this particular behavior (computing the min/max of each chunk and filling between them) makes sense for time-series (or at least 1D) data. I guess the 2D extension would be surface plots where you bin into a 2D grid, make a min surface and a max surface and then fill the 3D volume between them. For image and scatter data it seems like you might want to do different things - like for a scatter you'd want a density heatmap, and for images you just want to downsample to something that's reasonable for the display, right?
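
For the scatter case, a rough sketch of what "density heatmap" could mean: count the points that fall into each cell of a fixed grid and plot the counts instead of the markers. The function name `bincounts2d` and the 100×100 grid are assumptions made up for this example, and it assumes neither `x` nor `y` is constant.

```julia
# Hypothetical helper: count scatter points falling in each cell of an
# nx-by-ny grid spanning the data's bounding box.
# Assumes x and y each span a non-zero range.
function bincounts2d(x, y; nx=100, ny=100)
    xmin, xmax = extrema(x)
    ymin, ymax = extrema(y)
    counts = zeros(Int, ny, nx)                 # rows follow y, columns follow x
    for (xi, yi) in zip(x, y)
        # Map each coordinate to a bin index; the clamp puts points that lie
        # exactly on the upper edge into the last bin.
        cx = clamp(floor(Int, (xi - xmin) / (xmax - xmin) * nx) + 1, 1, nx)
        cy = clamp(floor(Int, (yi - ymin) / (ymax - ymin) * ny) + 1, 1, ny)
        counts[cy, cx] += 1
    end
    return counts
end

x, y = randn(1_000_000), randn(1_000_000)
counts = bincounts2d(x, y)
# heatmap(counts) would then draw 100×100 cells instead of 10^6 markers
```

The cost of plotting then depends on the grid size rather than on the number of points, which is the same trade the min/max envelope makes for 1D data.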

mkborregaard commented 6 years ago

yes that makes sense

JuliaPlots / Plots.jl

auto-decimate large time-series data? #1597