JuliaFolds / Transducers.jl

Efficient transducers for Julia
https://juliafolds.github.io/Transducers.jl/dev/
MIT License

Partition -> ERROR: LoadError: MethodError: no method matching +(::Int64, ::SubArray{Int64,1,Array{Int64,1},Tuple{UnitRange{Int64}},true}) #380

Closed: masterholdy closed this issue 4 years ago

tkf commented 4 years ago

basesize=8 is the correct way to specify the number of items per task.

FYI, you can't use Transducers.Partition with reduce, but you can use Iterators.partition:

julia> reduce(+, Iterators.partition(1:20, 8) |> Map(sum))
210
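
To connect this to the basesize point above, here is a minimal sketch of how the number of items per task would be specified on that same call; it assumes the threaded reduce accepts the basesize keyword as described in the Transducers.jl docs:

using Transducers

# basesize = 1 gives each of the three chunks its own task
reduce(+, Iterators.partition(1:20, 8) |> Map(sum); basesize = 1)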

There are various reasons why you might not get a performance boost when using threading in Julia. Note that you have to optimize LangfordParallel.loop_inner first. From a quick look, the inner loop uses something like sum([loop_inner(parentValue, value, i, depth, s, n, sn) for i=0:possibilites]) and zeros(sn) to allocate arrays. These are not performance-friendly patterns even for single-threaded code. For example, the sum can be written as sum(loop_inner(parentValue, value, i, depth, s, n, sn) for i=0:possibilites) or sum(i -> loop_inner(parentValue, value, i, depth, s, n, sn), 0:possibilites) to avoid allocating the intermediate array entirely, as illustrated in the sketch below.
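
A self-contained illustration of that difference; work is just a stand-in for the real loop_inner call and its arguments:

# Stand-in for loop_inner(parentValue, value, i, depth, s, n, sn);
# only the call pattern matters here.
work(i) = i^2 + 1

possibilites = 10_000

sum([work(i) for i in 0:possibilites])   # allocates a temporary Vector
sum(work(i) for i in 0:possibilites)     # generator: no intermediate array
sum(work, 0:possibilites)                # function form: no intermediate array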

tkf commented 4 years ago

Use basesize = 1 if each call to LangfordParallel.loop_inner takes a long time (say, more than 100 microseconds).

Also, I recommend reading the documentation and understanding how Iterators.partition works. You are using Iterators.partition (and Partition) incorrectly. If the documentation of Iterators.partition is not clear, I encourage you to ask a question on http://discourse.julialang.org (or on the other Julia community sites listed at https://julialang.org/community/).
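
For reference, Iterators.partition yields chunks of the input, not indices; for example (plain Julia, no Transducers involved):

julia> collect(Iterators.partition(1:10, 3))
4-element Vector{UnitRange{Int64}}:
 1:3
 4:6
 7:9
 10:10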

tkf commented 4 years ago

As I said, you don't need partition here. I also recommend reading the documentation and understanding how the function works before using it.

You are running reduce on a vector with a single element. There is nothing to parallelize for such an input.

julia> possibilites = 3
3

julia> ys = Iterators.partition(0:possibilites, 1) |> Map(range -> @show(range)) |> collect;
range = 0:0
range = 1:1
range = 2:2
range = 3:3

julia> ys
4-element Vector{UnitRange{Int64}}:
 0:0
 1:1
 2:2
 3:3

julia> map(length, ys)
4-element Vector{Int64}:
 1
 1
 1
 1

Did you try something like

ThreadsX.sum(
    idx -> loop_inner_parallel(parentValue, value, idx, depth, s, n, sn),
    0:possibilites;
    basesize = 1,
    init = 0,
)

? Please do note that this requires each call to loop_inner_parallel to take a sufficiently long time (e.g., more than 100 microseconds).
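
For completeness, here is a self-contained version of that pattern; expensive_work is a made-up placeholder for loop_inner_parallel, and the numbers are only illustrative:

using ThreadsX

# Placeholder for loop_inner_parallel; pretend each call is expensive
# (well above ~100 microseconds), which is what justifies basesize = 1.
expensive_work(idx) = sum(i -> (i * idx) % 7, 1:100_000)

possibilites = 16

ThreadsX.sum(
    idx -> expensive_work(idx),
    0:possibilites;
    basesize = 1,   # one task per index
    init = 0,
)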