JuliaStats / TimeSeries.jl

Time series toolkit for Julia

merge() optimization #363

Closed ancapdev closed 6 years ago

ancapdev commented 6 years ago

Summary:

This PR is mostly focused on optimizing the outer join. Benchmark example:

using TimeSeries
using BenchmarkTools
using Dates  # for Dates.TimeType (a stdlib since Julia 0.7)

# Minimal timestamp type: an Int64 wrapper that supports ordering and equality.
struct SimpleTime <: Dates.TimeType
    value::Int64
end

Base.isless(x::SimpleTime, y::SimpleTime) = x.value < y.value
Base.isequal(x::SimpleTime, y::SimpleTime) = x.value == y.value
Base.:(==)(x::SimpleTime, y::SimpleTime) = x.value == y.value

# Two 2_000_000-row, 10-column series; ts2's timestamps are the even numbers
# 2..4_000_000, so half of them coincide with ts1's 1..2_000_000.
t1 = [SimpleTime(x) for x in 1:2_000_000]
t2 = [SimpleTime(2 * x) for x in 1:2_000_000]
v1 = rand(Float64, length(t1), 10)
v2 = rand(Float64, length(t2), 10)
ts1 = TimeArray(t1, v1)
ts2 = TimeArray(t2, v2)
@btime merge(ts1, ts2, :outer)

Previous result:

  2.008 s (659 allocations: 1.97 GiB)

New result:

  415.149 ms (518 allocations: 503.57 MiB)

At these scales it's mostly load/store bound, so any reduction in memory traffic (e.g. smaller index types, smaller data types, fewer passes) makes the biggest difference. For my use case, and maybe other people's, Float32 support helps a lot: the above benchmark with Float32 values runs in ~224 ms.
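
For reference, the Float32 variant mentioned above only changes the element type of the value matrices; everything else in the benchmark stays the same:

v1 = rand(Float32, length(t1), 10)
v2 = rand(Float32, length(t2), 10)
ts1 = TimeArray(t1, v1)
ts2 = TimeArray(t2, v2)
@btime merge(ts1, ts2, :outer)  # reported at ~224 ms above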

codecov-io commented 6 years ago

Codecov Report

Merging #363 into master will increase coverage by 0.61%. The diff coverage is 98.18%.


@@            Coverage Diff             @@
##           master     #363      +/-   ##
==========================================
+ Coverage   86.19%   86.81%   +0.61%     
==========================================
  Files          10       10              
  Lines         478      508      +30     
==========================================
+ Hits          412      441      +29     
- Misses         66       67       +1
Impacted Files     Coverage Δ
src/utilities.jl   100% <100%> (ø) :arrow_up:
src/combine.jl     98.73% <92.85%> (-1.27%) :arrow_down:

Continue to review full report at Codecov.

Last update 8dd503e...b5c13a4.

ancapdev commented 6 years ago

@iblis17 Do these look to be in a state you're happy to merge now?

iblislin commented 6 years ago

Thanks for your great contributions! :+1:

ancapdev commented 6 years ago

It's fun to help a little bit when so much great work has been done by other people before me :smiley:.

Btw, broadcast_setindex!(dst, broadcast_getindex(src, srcidx), dstidx) doesn't seem to work multidimensionally; it only copies the first column. dst[dstidx, :] = src[srcidx, :] does work, of course, but is about 3-4x slower when copying half the rows from a 10_000_000 x 10 array.
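
The behaviour described above is consistent with broadcast_getindex/broadcast_setindex! (Base functions in the Julia 0.6 era) treating a single integer index vector as linear indices, so only first-column elements get touched. A small sketch of the difference, assuming that 0.6-era API:

A = reshape(collect(1.0:12.0), 4, 3)   # 4x3 source matrix
B = zeros(4, 3)
broadcast_setindex!(B, broadcast_getindex(A, [1, 3]), [1, 3])
# Only B[1] and B[3] (first column, linear indices) were written;
# columns 2 and 3 of B are untouched.
B[[1, 3], :] = A[[1, 3], :]
# Explicit column indexing copies whole rows 1 and 3.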

iblislin commented 6 years ago

Actually, dst[dstidx, :] = src[srcidx, :] can give good results; just don't benchmark against global variables (the type of a global variable is unpredictable, so the code operating on it isn't optimized).
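
The transcript below doesn't show how dst, src, srcidx, and dstidx were built; a hypothetical setup in the spirit of the previous comment (and consistent with the 7.63 MiB that broadcast_getindex allocates below, i.e. about a million Float64 values) could be:

src = rand(Float64, 2_000_000, 10)    # source matrix
dst = zeros(Float64, 2_000_000, 10)   # destination matrix
srcidx = collect(1:2:size(src, 1))    # every other row: 1_000_000 source rows
dstidx = collect(1:length(srcidx))    # rows to fill in dst

The exact index arguments TimeSeries.insertbyidx! expects aren't shown here either, so treat this purely as an illustration for the f and broadcast_setindex! calls.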

julia> f = (dst, src, srcidx, dstidx) -> @inbounds(dst[dstidx, :] = @view(src[srcidx, :]))                                                         
(::#26) (generic function with 1 method)                                                                                                           

julia> @btime f($dst, $src, $srcidx, $dstidx)                                                                                                      
  1.700 ms (5 allocations: 192 bytes)                                                                                                              

julia> @btime TimeSeries.insertbyidx!($dst, $src, $dstidx, $srcidx)                                                                                
  1.632 ms (0 allocations: 0 bytes)                                                                                                                

julia> @btime broadcast_setindex!($dst, broadcast_getindex($src, $srcidx), $dstidx)                                                                
  4.636 ms (4 allocations: 7.63 MiB)

ancapdev commented 6 years ago

Yep, I'm aware of the type instability from globals. In this case I figured it wasn't going to make much difference, because it only affects the dispatch and the functions in the benchmark operate on a fairly large dataset.

You're right though: with @inbounds and @view the performance is about the same, so this is a nicer solution.
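
For illustration, a zero-allocation row copy in the spirit of the TimeSeries.insertbyidx! call benchmarked above could look like the sketch below (not the actual TimeSeries implementation). Copying element by element avoids the temporary array that a plain dst[dstidx, :] = src[srcidx, :] creates:

# Copy selected rows of src into selected rows of dst without allocating.
function copyrows!(dst::AbstractMatrix, src::AbstractMatrix,
                   dstidx::AbstractVector{<:Integer},
                   srcidx::AbstractVector{<:Integer})
    length(dstidx) == length(srcidx) ||
        throw(DimensionMismatch("index vectors must have equal length"))
    # Iterate column by column so each pass stays within one column of the
    # (column-major) matrices.
    @inbounds for col in 1:size(src, 2), k in 1:length(srcidx)
        dst[dstidx[k], col] = src[srcidx[k], col]
    end
    return dst
end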