ancapdev closed this pull request 6 years ago.
Merging #363 into master will increase coverage by 0.61%. The diff coverage is 98.18%.
```diff
@@            Coverage Diff             @@
##           master     #363      +/-   ##
==========================================
+ Coverage   86.19%   86.81%   +0.61%
==========================================
  Files          10       10
  Lines         478      508      +30
==========================================
+ Hits          412      441      +29
- Misses         66       67       +1
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/utilities.jl | `100% <100%> (ø)` | :arrow_up: |
| src/combine.jl | `98.73% <92.85%> (-1.27%)` | :arrow_down: |
Powered by Codecov. Last update 8dd503e...b5c13a4.
@iblis17 Do these look to be in a state you're happy to merge now?
Thanks for your great contributions! :+1:
It's fun to help a little when so much great work has been done by other people before me :smiley:.
Btw, `broadcast_setindex!(dst, broadcast_getindex(src, srcidx), dstidx)` doesn't seem to work multidimensionally; it only copies the first column. `dst[dstidx, :] = src[srcidx, :]` works, of course, but is about 3-4x slower when copying half the rows from a 10_000_000 x 10 array.
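For context, a small illustrative example (not from the PR) of the working alternative: indexed row assignment copies every column of the selected rows, which is the behavior wanted here.

```julia
# Illustrative example: whole-row indexed assignment copies all columns,
# not just the first one.
src = reshape(collect(1.0:12.0), 4, 3)   # 4x3 matrix
dst = zeros(4, 3)
srcidx = [1, 3]                          # source rows to copy
dstidx = [2, 4]                          # destination rows to fill

dst[dstidx, :] = src[srcidx, :]          # copies whole rows, all three columns
```

(As I recall, `broadcast_getindex` with a single index vector broadcasts over the indices element-wise, which is why only one column's worth of elements comes out; the functions were later removed from Base.)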
Actually, `dst[dstidx, :] = src[srcidx, :]` can give a nice result; just don't benchmark against global variables (the type of a global variable is unpredictable, so the code isn't optimized).
```julia
julia> f = (dst, src, srcidx, dstidx) -> @inbounds(dst[dstidx, :] = @view(src[srcidx, :]))
(::#26) (generic function with 1 method)

julia> @btime f($dst, $src, $srcidx, $dstidx)
  1.700 ms (5 allocations: 192 bytes)

julia> @btime TimeSeries.insertbyidx!($dst, $src, $dstidx, $srcidx)
  1.632 ms (0 allocations: 0 bytes)

julia> @btime broadcast_setindex!($dst, broadcast_getindex($src, $srcidx), $dstidx)
  4.636 ms (4 allocations: 7.63 MiB)
```
Yep, I'm aware of the global-variable type instability. In this case I figured it wouldn't make much difference, because it only affects dispatch and the functions in the benchmark operate on a fairly large dataset. You're right, though: with `@inbounds` and `@view` the performance is about the same, so this is a nicer solution.
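The role of the `@view` can be seen in a standalone sketch (function name is mine, not from the PR): without it, the right-hand-side slice `src[srcidx, :]` is materialized as a temporary array before being copied into `dst`; with it, the copy reads through a lightweight `SubArray` instead.

```julia
# Sketch (illustrative name): @view avoids materializing the RHS slice
# as a temporary array before the copy into dst.
function copy_rows!(dst, src, dstidx, srcidx)
    @inbounds dst[dstidx, :] = @view(src[srcidx, :])
    return dst
end

src = rand(1000, 10)
dst = zeros(500, 10)
srcidx = collect(1:2:1000)   # every other source row
dstidx = collect(1:500)
copy_rows!(dst, src, dstidx, srcidx)
```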
Summary:
Mostly focused on optimizing outer join. Benchmark example:
Previous result:
New result:
At these scales it's mostly load/store bound, so any reduction in memory traffic (e.g. smaller index types, smaller data types, fewer passes) makes the biggest difference. For my, and maybe other people's, use cases `Float32` support helps a lot. The above benchmark with `Float32` values is ~224 ms.
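A back-of-envelope check of why `Float32` helps in a load/store-bound pass, using the 10_000_000 x 10 array size from the earlier benchmark (the byte counts below are my own illustration, not measurements from the PR):

```julia
# Back-of-envelope: bytes moved in one full sweep over a 10_000_000 x 10
# array, per element type. Halving the element size halves the traffic.
rows, cols = 10_000_000, 10
bytes_f64 = rows * cols * sizeof(Float64)   # 800_000_000 bytes (~800 MB)
bytes_f32 = rows * cols * sizeof(Float32)   # 400_000_000 bytes (~400 MB)
ratio = bytes_f64 / bytes_f32               # 2.0x less data to move
```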