JuliaFolds2 / OhMyThreads.jl

Simple multithreading in julia
https://juliafolds2.github.io/OhMyThreads.jl/
MIT License
139 stars 9 forks source link

Greedy tasks + not enough work = reduction over empty collection #82

Closed carstenbauer closed 7 months ago

carstenbauer commented 7 months ago

Non-deterministic bug when using the greedy scheduler.

tmapreduce(sin, +, 1:N; scheduler=:greedy)

For small(ish) N, sometimes there are tasks that don't get any elements from the channel leading to a MethodError.

julia> tmapreduce(sin, +, 1:100; ntasks=3, scheduler=:greedy)
-0.12717101366042072

julia> tmapreduce(sin, +, 1:100; ntasks=3, scheduler=:greedy)
-0.12717101366042027

julia> tmapreduce(sin, +, 1:100; ntasks=3, scheduler=:greedy)
ERROR: TaskFailedException
Stacktrace:
 [1] wait
   @ ./task.jl:352 [inlined]
 [2] fetch
   @ ./task.jl:372 [inlined]
 [3] fetch
   @ ~/.julia/packages/StableTasks/3CrzR/src/internals.jl:9 [inlined]
 [4] _mapreduce(f::typeof(fetch), op::typeof(+), ::IndexLinear, A::Vector{StableTasks.StableTask{Float64}})
   @ Base ./reduce.jl:440
 [5] _mapreduce_dim(f::Function, op::Function, ::Base._InitialValue, A::Vector{StableTasks.StableTask{Float64}}, ::Colon)
   @ Base ./reducedim.jl:365
 [6] mapreduce(f::Function, op::Function, A::Vector{StableTasks.StableTask{Float64}})
   @ Base ./reducedim.jl:357
 [7] _tmapreduce(f::Function, op::Function, Arrs::Tuple{…}, ::Type{…}, scheduler::GreedyScheduler, mapreduce_kwargs::@NamedTuple{})
   @ OhMyThreads.Implementation ~/repos/OhMyThreads.jl/src/implementation.jl:202
 [8] tmapreduce(f::Function, op::Function, Arrs::UnitRange{…}; scheduler::Symbol, outputtype::Type, init::OhMyThreads.Schedulers.NotGiven, kwargs::@Kwargs{…})
   @ OhMyThreads.Implementation ~/repos/OhMyThreads.jl/src/implementation.jl:64
 [9] top-level scope
   @ REPL[9]:1

    nested task error: MethodError: reducing over an empty collection is not allowed; consider supplying `init` to the reducer
    Stacktrace:
     [1] reduce_empty(op::Base.MappingRF{OhMyThreads.Implementation.var"#62#67"{…}, Base.BottomRF{…}}, ::Type{Tuple{…}})
       @ Base ./reduce.jl:361
     [2] reduce_empty_iter
       @ ./reduce.jl:384 [inlined]
     [3] reduce_empty_iter
       @ ./reduce.jl:383 [inlined]
     [4] foldl_impl
       @ ./reduce.jl:49 [inlined]
     [5] mapfoldl_impl
       @ ./reduce.jl:44 [inlined]
     [6] mapfoldl
       @ ./reduce.jl:175 [inlined]
     [7] mapreduce
       @ ./reduce.jl:307 [inlined]
     [8] #61
       @ ~/repos/OhMyThreads.jl/src/implementation.jl:196 [inlined]
     [9] (::OhMyThreads.Implementation.var"#63#68"{StableTasks.AtomicRef{…}, OhMyThreads.Implementation.var"#61#66"{…}})()
       @ OhMyThreads.Implementation ~/.julia/packages/StableTasks/3CrzR/src/internals.jl:67
Some type information was truncated. Use `show(err)` to see complete types.
MasonProtter commented 7 months ago

This is why providing init arguments is good. But yeah, we should figure out a good way to deal with this.

Either we should do a try/catch pattern, or we should just check if the collection is empty, and if it is, then bail out

carstenbauer commented 7 months ago

I attempted a basic fix with try/catch in #83.

or we should just check if the collection is empty, and if it is, then bail out

In this case we couldn't use mapreduce(_,_,chnl::Channel) but would have to roll out our own reduction. I don't particularly like that.

This is why providing init arguments is good.

We could force the user to provide an init. But doesn't seem ideal either, especially because this (likely) only happens for the case of little work per item and/or very few items. In this case, using GreedyScheduler is anyways questionable I guess.

carstenbauer commented 7 months ago

Fixed by https://github.com/JuliaFolds2/OhMyThreads.jl/pull/84