improve performance for binarytrees. immutable structs and parallel reduce

Orbots commented 5 years ago

This is about 3x faster on my machine now.

KristofferC commented 5 years ago

Does any other language use distributed contributing?

Orbots commented 5 years ago

The faster runs take advantage of the multi-core architecture of a single computer. @distributed seemed the quickest way there. You can see in the tables the last 4 columns are cpu load for 4 cores. The julia runs are at 100% for 1 cpu but 0% for the other 3. So you'd want to run these with julia -p 4 ( is that using 4 cores? or 1 + 4? )

Orbots commented 5 years ago

I rolled in the Union optimization from the contribution in binarytree-fast.

SimonDanisch commented 5 years ago

I'd rather use multi threading, btw ;) I started a multi threaded & pool'ed solution, borrowing from c++ solution and managed to get roughly the same performance... I'm just a bit confused, by this:

Please don't implement your own custom "arena" or "memory pool" or "free list" - they will not be accepted.

Since the very simple implementation of the "memory pool" in julia is simpler than what C++ with a pool library is doing :D hope they'll still accept it

Orbots commented 5 years ago

The comment about no custom pools is a bit at odds with the fast c++ versions. I'm guessing they just mean no custom pool specifically implemented for this problem. You'd probably have better luck getting your pooled solution accepted if you make a new general package for memory pools. Which I highly encourage because I'd like to use such a package :)

I reached for @distributed as multithreading is still flagged as "experimental" in the docs.

SimonDanisch commented 5 years ago

It's literally just putting the nodes into a vector :P I likely won't make a package for push!(pool, node) ^^

Orbots commented 5 years ago

I reckon that would be considered a custom pool though. A more general memory pool would manage different sized pools, dynamically resizing as needed. Probably with some macro conveniences. A julia implementation would be pretty simple and mostly just wrapping push!(pool, element). Maybe I'll get to it a some point :)

SimonDanisch commented 5 years ago

I reached for @distributed as multithreading is still flagged as "experimental" in the docs.

That's just because the interface may or may not change ;) But it works pretty nicely for such problems already

SimonDanisch commented 5 years ago

Btw, this is the implementation:

# The Computer Language Benchmarks Game
# https://salsa.debian.org/benchmarksgame-team/benchmarksgame/

# contributed by Jarret Revels and Alex Arslan
# based on an OCaml program
# *reset*
using Printf
struct Node
    left::Int
    right::Int
end
function alloc!(pool, left, right)
    push!(pool, Node(left, right))
    return length(pool)
end
function make(pool, d)
    d == 0 && return 0
    alloc!(pool, make(pool, d - 1), make(pool, d - 1))
end
check(pool, t::Node) = 1 + check(pool, t.left) + check(pool, t.right)
function check(pool, node::Int)
    node == 0 && return 1
    @inbounds return check(pool, pool[node])
end
function threads_inner(pool, d, min_depth, max_depth)
    niter = 1 << (max_depth - d + min_depth)
    c = 0
    for j = 1:niter
        c += check(pool, make(pool, d))
        empty!(pool)
    end
    @sprintf("%i\t trees of depth %i\t check: %i\n", niter, d, c)
end
function loop_depths(io, d, min_depth, max_depth)
    output = ntuple(x-> String[], Threads.nthreads())
    Threads.@threads for d in min_depth:2:max_depth
        pool = Node[]
        push!(output[Threads.threadid()], threads_inner(pool, d, min_depth, max_depth))
    end
    foreach(s->foreach(x->print(io, x), s), output)
end
function perf_binary_trees(io, N::Int=10)
    min_depth = 4
    max_depth = N
    stretch_depth = max_depth + 1
    pool = Node[]
    # create and check stretch tree
    let c = check(pool, make(pool, stretch_depth))
        @printf(io, "stretch tree of depth %i\t check: %i\n", stretch_depth, c)
    end

    long_lived_tree = make(pool, max_depth)

    loop_depths(io, min_depth, min_depth, max_depth)
    @printf(io, "long lived tree of depth %i\t check: %i\n", max_depth, check(pool, long_lived_tree))

end
 perf_binary_trees(stdout, 21)

Orbots commented 5 years ago

Nice. I'm seeing about 9x better perf than the original.

ChristianKurz commented 5 years ago

@Orbots please don't use julia macros directly in a GitHub discussion. This will ping the GitHub-user with that name. Just use some code fences to prevent this e.g. @distributed.

distributed commented 5 years ago

Thanks! As github user distributed I very much appreciate this:D

Sent from my iPhone

On 19 Dec 2018, at 13:49, Christian Kurz notifications@github.com wrote:

@Orbots please don't use julia macros directly in a GitHub discussion. This will ping the GitHub-user with that name. Just use some code fences to prevent this e.g. @distributed.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

Orbots commented 5 years ago

Right, of course. That's funny. Github user 'time' must be regretting his choice of user names now :)

JuliaPerf / BenchmarksGame.jl

improve performance for binarytrees. immutable structs and parallel reduce #1