JuliaStats / TimeSeries.jl

Time series toolkit for Julia

merge() optimization #363

Closed ancapdev closed 6 years ago

ancapdev commented 6 years ago

Summary:

This PR is mostly focused on optimizing the outer join. Benchmark example:

using TimeSeries
using BenchmarkTools
using Dates  # for Dates.TimeType (a stdlib since Julia 0.7)

# Minimal timestamp type: an Int64 wrapper that supports ordering and equality.
struct SimpleTime <: Dates.TimeType
    value::Int64
end

Base.isless(x::SimpleTime, y::SimpleTime) = x.value < y.value
Base.isequal(x::SimpleTime, y::SimpleTime) = x.value == y.value
Base.:(==)(x::SimpleTime, y::SimpleTime) = x.value == y.value

# Two 2_000_000-row, 10-column series; ts2's timestamps are the even numbers
# 2..4_000_000, so half of them coincide with ts1's 1..2_000_000.
t1 = [SimpleTime(x) for x in 1:2_000_000]
t2 = [SimpleTime(2 * x) for x in 1:2_000_000]
v1 = rand(Float64, length(t1), 10)
v2 = rand(Float64, length(t2), 10)
ts1 = TimeArray(t1, v1)
ts2 = TimeArray(t2, v2)
@btime merge(ts1, ts2, :outer)

Previous result:

  2.008 s (659 allocations: 1.97 GiB)

New result:

  415.149 ms (518 allocations: 503.57 MiB)

At these scales it's mostly load/store bound, so any reduction in memory traffic (e.g. smaller index types, smaller data types, fewer passes) makes the biggest difference. For my use case, and maybe other people's, Float32 support helps a lot: the above benchmark with Float32 values runs in ~224 ms.
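
For reference, the Float32 variant mentioned above only changes the element type of the value matrices; everything else in the benchmark stays the same:

v1 = rand(Float32, length(t1), 10)
v2 = rand(Float32, length(t2), 10)
ts1 = TimeArray(t1, v1)
ts2 = TimeArray(t2, v2)
@btime merge(ts1, ts2, :outer)  # reported at ~224 ms above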

codecov-io commented 6 years ago

Codecov Report

Merging #363 into master will increase coverage by 0.61%. The diff coverage is 98.18%.


@@            Coverage Diff             @@
##           master     #363      +/-   ##
==========================================
+ Coverage   86.19%   86.81%   +0.61%     
==========================================
  Files          10       10              
  Lines         478      508      +30     
==========================================
+ Hits          412      441      +29     
- Misses         66       67       +1
Impacted Files     Coverage Δ
src/utilities.jl   100% <100%> (ø) :arrow_up:
src/combine.jl     98.73% <92.85%> (-1.27%) :arrow_down:

Continue to review full report at Codecov.

Last update 8dd503e...b5c13a4.

ancapdev commented 6 years ago

@iblis17 Do these look to be in a state you're happy to merge now?

iblislin commented 6 years ago

Thanks for your great contributions! :+1:

ancapdev commented 6 years ago

It's fun to help a little bit when so much great work has been done by other people before me :smiley:.

Btw, broadcast_setindex!(dst, broadcast_getindex(src, srcidx), dstidx) doesn't seem to work multidimensionally; it only copies the first column. dst[dstidx, :] = src[srcidx, :] does work, of course, but is about 3-4x slower when copying half the rows from a 10_000_000 x 10 array.
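
The behaviour described above is consistent with broadcast_getindex/broadcast_setindex! (Base functions in the Julia 0.6 era) treating a single integer index vector as linear indices, so only first-column elements get touched. A small sketch of the difference, assuming that 0.6-era API:

A = reshape(collect(1.0:12.0), 4, 3)   # 4x3 source matrix
B = zeros(4, 3)
broadcast_setindex!(B, broadcast_getindex(A, [1, 3]), [1, 3])
# Only B[1] and B[3] (first column, linear indices) were written;
# columns 2 and 3 of B are untouched.
B[[1, 3], :] = A[[1, 3], :]
# Explicit column indexing copies whole rows 1 and 3.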

iblislin commented 6 years ago

Actually, dst[dstidx, :] = src[srcidx, :] can give good results; just don't benchmark against global variables (the type of a global variable is unpredictable, so the code operating on it isn't optimized).
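
The transcript below doesn't show how dst, src, srcidx, and dstidx were built; a hypothetical setup in the spirit of the previous comment (and consistent with the 7.63 MiB that broadcast_getindex allocates below, i.e. about a million Float64 values) could be:

src = rand(Float64, 2_000_000, 10)    # source matrix
dst = zeros(Float64, 2_000_000, 10)   # destination matrix
srcidx = collect(1:2:size(src, 1))    # every other row: 1_000_000 source rows
dstidx = collect(1:length(srcidx))    # rows to fill in dst

The exact index arguments TimeSeries.insertbyidx! expects aren't shown here either, so treat this purely as an illustration for the f and broadcast_setindex! calls.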

julia> f = (dst, src, srcidx, dstidx) -> @inbounds(dst[dstidx, :] = @view(src[srcidx, :]))                                                         
(::#26) (generic function with 1 method)                                                                                                           

julia> @btime f($dst, $src, $srcidx, $dstidx)                                                                                                      
  1.700 ms (5 allocations: 192 bytes)                                                                                                              

julia> @btime TimeSeries.insertbyidx!($dst, $src, $dstidx, $srcidx)                                                                                
  1.632 ms (0 allocations: 0 bytes)                                                                                                                

julia> @btime broadcast_setindex!($dst, broadcast_getindex($src, $srcidx), $dstidx)                                                                
  4.636 ms (4 allocations: 7.63 MiB)

ancapdev commented 6 years ago

Yep, I'm aware of the type instability from globals. In this case I figured it wasn't going to make much difference, because it only affects the dispatch and the functions in the benchmark operate on a fairly large dataset.

You're right though: with @inbounds and @view the performance is about the same, so this is a nicer solution.
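
For illustration, a zero-allocation row copy in the spirit of the TimeSeries.insertbyidx! call benchmarked above could look like the sketch below (not the actual TimeSeries implementation). Copying element by element avoids the temporary array that a plain dst[dstidx, :] = src[srcidx, :] creates:

# Copy selected rows of src into selected rows of dst without allocating.
function copyrows!(dst::AbstractMatrix, src::AbstractMatrix,
                   dstidx::AbstractVector{<:Integer},
                   srcidx::AbstractVector{<:Integer})
    length(dstidx) == length(srcidx) ||
        throw(DimensionMismatch("index vectors must have equal length"))
    # Iterate column by column so each pass stays within one column of the
    # (column-major) matrices.
    @inbounds for col in 1:size(src, 2), k in 1:length(srcidx)
        dst[dstidx[k], col] = src[srcidx[k], col]
    end
    return dst
end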