JuliaCollections / DataStructures.jl

Julia implementation of Data structures
https://juliacollections.github.io/DataStructures.jl/latest/
MIT License
692 stars 246 forks source link

N-dimensional (pre-allocating) circular buffer #223

Open femtotrader opened 7 years ago

femtotrader commented 7 years ago

Hello,

I noticed that current implementation of circular buffer is only 1 dimension.

A generalized approach of this problem will be nice (especially for implementing streaming timeseries)

So, instead of pushing a single value in a 1D circular buffer,

I also noticed that current implementation of circular buffer is using a Vector with size increasing (until capacity is reached). Maybe pre-allocating array will be a more efficient approach ?

@tbreloff I noticed you are original author. What is your opinion about this ?

femtotrader commented 7 years ago

Edit: WIP in this PR https://github.com/JuliaLang/DataStructures.jl/pull/228

femtotrader commented 7 years ago

I'm facing an issue to display this N dimensional circular buffer but it seems to properly push data

I define a 2D circular buffer with a capacity of 3 (rows) and with 4 columns containing Int

julia> using DataStructures

julia> cb = CircularBuffer(Int, 3, 4)
0-element DataStructures.CircularBuffer{Int64,2}

I'm trying to push some data

julia> cb
0-element DataStructures.CircularBuffer{Int64,2}

julia> push!(cb, [10, 11, 12, 13])
1-element DataStructures.CircularBuffer{Int64,2}:
Error showing value of type DataStructures.CircularBuffer{Int64,2}:
ERROR: BoundsError: attempt to access (Base.OneTo(1),)
  at index [2]
 in indices at ./abstractarray.jl:57 [inlined]
 in print_matrix(::IOContext{Base.Terminals.TTYTerminal}, ::DataStructures.CircularBuffer{Int64,2}, ::String, ::String, ::String, ::String, ::String, ::String, ::Int64, ::Int64) at ./show.jl:1390
 in print_matrix(::IOContext{Base.Terminals.TTYTerminal}, ::DataStructures.CircularBuffer{Int64,2}, ::String, ::String, ::String) at ./show.jl:1379
 in #showarray#330(::Bool, ::Function, ::IOContext{Base.Terminals.TTYTerminal}, ::DataStructures.CircularBuffer{Int64,2}, ::Bool) at ./show.jl:1618
 in display(::Base.REPL.REPLDisplay{Base.REPL.LineEditREPL}, ::MIME{Symbol("text/plain")}, ::DataStructures.CircularBuffer{Int64,2}) at ./REPL.jl:132
 in display(::Base.REPL.REPLDisplay{Base.REPL.LineEditREPL}, ::DataStructures.CircularBuffer{Int64,2}) at ./REPL.jl:135
 in display(::DataStructures.CircularBuffer{Int64,2}) at ./multimedia.jl:143
 in print_response(::Base.Terminals.TTYTerminal, ::Any, ::Void, ::Bool, ::Bool, ::Void) at ./REPL.jl:154
 in print_response(::Base.REPL.LineEditREPL, ::Any, ::Void, ::Bool, ::Bool) at ./REPL.jl:139
 in (::Base.REPL.##22#23{Bool,Base.REPL.##33#42{Base.REPL.LineEditREPL,Base.REPL.REPLHistoryProvider},Base.REPL.LineEditREPL,Base.LineEdit.Prompt})(::Base.LineEdit.MIState, ::Base.AbstractIOBuffer{Array{UInt8,1}}, ::Bool) at ./REPL.jl:652
 in run_interface(::Base.Terminals.TTYTerminal, ::Base.LineEdit.ModalInterface) at ./LineEdit.jl:1579
 in run_interface(::Base.Terminals.TTYTerminal, ::Base.LineEdit.ModalInterface) at /Applications/Julia-0.5.app/Contents/Resources/julia/lib/julia/sys.dylib:?
 in run_frontend(::Base.REPL.LineEditREPL, ::Base.REPL.REPLBackendRef) at ./REPL.jl:903
 in run_repl(::Base.REPL.LineEditREPL, ::Base.##930#931) at ./REPL.jl:188
 in _start() at ./client.jl:360
 in _start() at /Applications/Julia-0.5.app/Contents/Resources/julia/lib/julia/sys.dylib:?

julia> cb.buffer
3×4 Array{Int64,2}:
                  10                   11                   12                13
 7738702960759109985  6278066780391109477  7235439921860475233  2860351068860021
 8746382376806719532  8746382398115964005  8463496609385970017                 0

julia> push!(cb, [20, 21, 22, 23])
2-element DataStructures.CircularBuffer{Int64,2}:
Error showing value of type DataStructures.CircularBuffer{Int64,2}:
ERROR: BoundsError: attempt to access (Base.OneTo(2),)
(skip)

julia> cb.buffer
3×4 Array{Int64,2}:
                  10                   11                   12  13
                  20                   21                   22  23
 8746382376806719532  8746382398115964005  8463496609385970017   0

julia> push!(cb, [30, 31, 32, 33])
3-element DataStructures.CircularBuffer{Int64,2}:
Error showing value of type DataStructures.CircularBuffer{Int64,2}:
ERROR: BoundsError: attempt to access (Base.OneTo(3),)
(skip)

julia> cb.buffer
3×4 Array{Int64,2}:
 10  11  12  13
 20  21  22  23
 30  31  32  33

julia> push!(cb, [40, 41, 42, 43])
3-element DataStructures.CircularBuffer{Int64,2}:
Error showing value of type DataStructures.CircularBuffer{Int64,2}:
ERROR: BoundsError: attempt to access (Base.OneTo(3),)
(skip)

julia> cb.buffer
3×4 Array{Int64,2}:
 40  41  42  43
 20  21  22  23
 30  31  32  33

julia> push!(cb, [50, 51, 52, 53])
3-element DataStructures.CircularBuffer{Int64,2}:
Error showing value of type DataStructures.CircularBuffer{Int64,2}:
ERROR: BoundsError: attempt to access (Base.OneTo(3),)
(skip)

julia> cb.buffer
3×4 Array{Int64,2}:
 40  41  42  43
 50  51  52  53
 30  31  32  33
tbreloff commented 7 years ago

This is some of my oldest julia code... I'm sure it's horrible. Please improve/expand any way you see fit!

On Saturday, November 12, 2016, femtotrader notifications@github.com wrote:

I'm facing an issue to display this N dimensional circular buffer but it seems to properly push data

julia> using DataStructures

julia> cb = CircularBuffer(Int, 3, 4)0-element DataStructures.CircularBuffer{Int64,2}

julia> cb0-element DataStructures.CircularBuffer{Int64,2}

julia> push!(cb, [10, 11, 12, 13])1-element DataStructures.CircularBuffer{Int64,2}: Error showing value of type DataStructures.CircularBuffer{Int64,2}: ERROR: BoundsError: attempt to access (Base.OneTo(1),) at index [2] in indices at ./abstractarray.jl:57 [inlined] in print_matrix(::IOContext{Base.Terminals.TTYTerminal}, ::DataStructures.CircularBuffer{Int64,2}, ::String, ::String, ::String, ::String, ::String, ::String, ::Int64, ::Int64) at ./show.jl:1390 in print_matrix(::IOContext{Base.Terminals.TTYTerminal}, ::DataStructures.CircularBuffer{Int64,2}, ::String, ::String, ::String) at ./show.jl:1379 in #showarray#330(::Bool, ::Function, ::IOContext{Base.Terminals.TTYTerminal}, ::DataStructures.CircularBuffer{Int64,2}, ::Bool) at ./show.jl:1618 in display(::Base.REPL.REPLDisplay{Base.REPL.LineEditREPL}, ::MIME{Symbol("text/plain")}, ::DataStructures.CircularBuffer{Int64,2}) at ./REPL.jl:132 in display(::Base.REPL.REPLDisplay{Base.REPL.LineEditREPL}, ::DataStructures.CircularBuffer{Int64,2}) at ./REPL.jl:135 in display(::DataStructures.CircularBuffer{Int64,2}) at ./multimedia.jl:143 in print_response(::Base.Terminals.TTYTerminal, ::Any, ::Void, ::Bool, ::Bool, ::Void) at ./REPL.jl:154 in print_response(::Base.REPL.LineEditREPL, ::Any, ::Void, ::Bool, ::Bool) at ./REPL.jl:139 in (::Base.REPL.##22#23{Bool,Base.REPL.##33#42{Base.REPL.LineEditREPL,Base.REPL.REPLHistoryProvider},Base.REPL.LineEditREPL,Base.LineEdit.Prompt})(::Base.LineEdit.MIState, ::Base.AbstractIOBuffer{Array{UInt8,1}}, ::Bool) at ./REPL.jl:652 in run_interface(::Base.Terminals.TTYTerminal, ::Base.LineEdit.ModalInterface) at ./LineEdit.jl:1579 in run_interface(::Base.Terminals.TTYTerminal, ::Base.LineEdit.ModalInterface) at /Applications/Julia-0.5.app/Contents/Resources/julia/lib/julia/sys.dylib:? in run_frontend(::Base.REPL.LineEditREPL, ::Base.REPL.REPLBackendRef) at ./REPL.jl:903 in run_repl(::Base.REPL.LineEditREPL, ::Base.##930#931) at ./REPL.jl:188 in _start() at ./client.jl:360 in _start() at /Applications/Julia-0.5.app/Contents/Resources/julia/lib/julia/sys.dylib:?

julia> cb.buffer3×4 Array{Int64,2}: 10 11 12 13 7738702960759109985 6278066780391109477 7235439921860475233 2860351068860021 8746382376806719532 8746382398115964005 8463496609385970017 0

julia> push!(cb, [20, 21, 22, 23])2-element DataStructures.CircularBuffer{Int64,2}: Error showing value of type DataStructures.CircularBuffer{Int64,2}: ERROR: BoundsError: attempt to access (Base.OneTo(2),) (skip)

julia> cb.buffer3×4 Array{Int64,2}: 10 11 12 13 20 21 22 23 8746382376806719532 8746382398115964005 8463496609385970017 0

julia> push!(cb, [30, 31, 32, 33])3-element DataStructures.CircularBuffer{Int64,2}: Error showing value of type DataStructures.CircularBuffer{Int64,2}: ERROR: BoundsError: attempt to access (Base.OneTo(3),) (skip)

julia> cb.buffer3×4 Array{Int64,2}: 10 11 12 13 20 21 22 23 30 31 32 33

julia> push!(cb, [40, 41, 42, 43])3-element DataStructures.CircularBuffer{Int64,2}: Error showing value of type DataStructures.CircularBuffer{Int64,2}: ERROR: BoundsError: attempt to access (Base.OneTo(3),) (skip)

julia> cb.buffer3×4 Array{Int64,2}: 40 41 42 43 20 21 22 23 30 31 32 33

julia> push!(cb, [50, 51, 52, 53])3-element DataStructures.CircularBuffer{Int64,2}: Error showing value of type DataStructures.CircularBuffer{Int64,2}: ERROR: BoundsError: attempt to access (Base.OneTo(3),) (skip)

julia> cb.buffer3×4 Array{Int64,2}: 40 41 42 43 50 51 52 53 30 31 32 33

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/JuliaLang/DataStructures.jl/issues/223#issuecomment-260152332, or mute the thread https://github.com/notifications/unsubscribe-auth/AA492vj30VtQolWHblHWDM7FsA9vCwDKks5q9jtOgaJpZM4Kwiio .

femtotrader commented 7 years ago

I still have to improve my Julia knowledges... An other problem I'm facing, is that we will have to change 1D circular buffer construction...

So instead of

cb = CircularBuffer{Int}(5)

we should have to do

cb = CircularBuffer(Int, 5)

Maybe there is a workaround? Any idea? Or maybe we can live with this API change?

tbreloff commented 7 years ago

That sounds good.

On Sunday, November 13, 2016, femtotrader notifications@github.com wrote:

I still have to improve my Julia knowledges... An other problem I'm facing, is that we will have to change 1D circular buffer construction...

So instead of

cb = CircularBuffer{Int}(5)

we should have to do

cb = CircularBuffer(Int, 5)

Maybe there is a workaround? Any idea? Or maybe we can live with this API change?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/JuliaLang/DataStructures.jl/issues/223#issuecomment-260175052, or mute the thread https://github.com/notifications/unsubscribe-auth/AA492qfspWXB_cuMKoVFnnKL4D8Pw0xbks5q9tZjgaJpZM4Kwiio .

femtotrader commented 7 years ago

I've found an other room for improvement in _buffer_index function

    idx = cb.first + i - 1
    if idx > n
        idx - n
    else
        idx
    end

can be replaced by a modulo (mod1 in fact)

    idx = mod1(cb.first + i - 1, n)

I didn't measure if it improve performance signifiantly but anyway it shouldn't hurt!

I'd prefer to have display fixed before submitting a PR.

Any idea what is going wrong ?

Pinging also @kmsquire who have been working on this file

kmsquire commented 7 years ago

So instead of cb = CircularBuffer{Int}(5) we should have to do cb = CircularBuffer(Int, 5)

I'd prefer not to make this change, if possible. The general direction of data structures in both Base and this package has been to move toward the first construction for consistency.

If you make a WIP pull request, I can try to make specific comments/suggestions.

kmsquire commented 7 years ago

@femtotrader, I took a glance at #228, and I'm not sure I see the purpose of having an N-D circular buffer with N > 1. Is there a reason the data needs to be stored contiguously? For example, I can create a CircularBuffer of Vector{Int}s pretty easily:

julia> using DataStructures

julia> cb = CircularBuffer{Vector{Int}}(3)
0-element DataStructures.CircularBuffer{Array{Int64,1}}

julia> push!(cb, [1,2,3])
1-element DataStructures.CircularBuffer{Array{Int64,1}}:
 [1,2,3]

julia> push!(cb, [2,3,4])
2-element DataStructures.CircularBuffer{Array{Int64,1}}:
 [1,2,3]
 [2,3,4]

julia> push!(cb, [3,4,5])
3-element DataStructures.CircularBuffer{Array{Int64,1}}:
 [1,2,3]
 [2,3,4]
 [3,4,5]

julia> push!(cb, [4,5,6])
3-element DataStructures.CircularBuffer{Array{Int64,1}}:
 [2,3,4]
 [3,4,5]
 [4,5,6]
femtotrader commented 7 years ago

It's better to have data stored contiguously if you want to implement a kind of streaming TimeArray (from TimeSeries.jl). Use case https://github.com/femtotrader/TimeSeriesIO.jl/issues/5 Is there a reason the data needs to be stored uncontiguously ? We could also ask a similar question Why using Array{Int64,2} when there is Array{Array{Int64}, 1} ?

I really need some help:

kmsquire commented 7 years ago

@femtotrader, sorry, I didn't mean to come across as critical. If you have a use case that would benefit from contiguous access, that's fine--it just wasn't (and still isn't) totally obvious to me.

For example, when working with streaming TimeArrays, do you plan to treat the data as a multidimensional array and call LAPACK or BLAS functions (or perform other linear algebra-like operations) on them? If so, then sure, it probably makes sense to have a multidimensional version of a CircularBuffer.

At the same time, it also seems pretty easy to implement such a data structure inefficiently (e.g., with a lot of data copying).

Not sure I'll have time, but I'll try to take a look and make more specific suggestions.

femtotrader commented 7 years ago

I think (but I can be wrong) that passing OHLCV data to TALib.jl should be more efficient with contiguous data

ChrisRackauckas commented 7 years ago

@femtotrader, I took a glance at #228, and I'm not sure I see the purpose of having an N-D circular buffer with N > 1. Is there a reason the data needs to be stored contiguously? For example, I can create a CircularBuffer of Vector{Int}s pretty easily:

Is there a way to do this without push!ing a new value? Like a copy! at the top, and move the pointer? I'd like use the circular buffer in a non-allocating way.

kmsquire commented 7 years ago

Is there a way to do this without push!ing a new value? Like a copy! at the top, and move the pointer? I'd like use the circular buffer in a non-allocating way.

Changing the code to be pre-allocating should be easy. I'm unclear about how you want to use this, but if you already have a vector, then it would seem you might want to avoid the copy! all together, if possible (e.g., if you know you won't modify the vector again.).

sairus7 commented 7 years ago

I think pre-allocation can be added simply by implementing Base.fill!(cb, value) method for an empty cb