JuliaFolds2 / ChunkSplitters.jl

Simple chunk splitters for parallel loop executions
MIT License

`minchunksize` option #43

Closed carstenbauer closed 1 day ago

carstenbauer commented 1 day ago

See https://github.com/JuliaFolds2/OhMyThreads.jl/issues/114

lmiq commented 1 day ago

I implemented that option here:

https://github.com/JuliaFolds2/ChunkSplitters.jl/tree/minchunksize

Unfortunately, adding one more keyword parameter breaks the union splitting that we relied on to avoid allocations when creating the chunks.

I've tried many things without success, and the allocation test keeps failing:
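For context, here is a minimal sketch of the pattern being described. It is not the ChunkSplitters.jl implementation: `ByCount`, `BySize`, `toy_chunks`, `nchunks`, and `count_chunks` are hypothetical names made up for illustration. The assumption is that the return type depends on which keyword is set, so inference sees a small `Union` that the compiler can split into plain branches. One plausible reading of the problem is that each additional `Union{Nothing,Int}` keyword makes this harder for the compiler to resolve without boxing.

```julia
# Hypothetical toy types; NOT the ChunkSplitters.jl implementation.
struct ByCount
    n::Int
end

struct BySize
    size::Int
end

# The concrete return type depends on which keyword is set, so the inferred
# return type is Union{ByCount, BySize}. Julia can "union-split" such small
# unions into ordinary branches instead of boxing the result.
function toy_chunks(; n::Union{Nothing,Int}=nothing,
                      size::Union{Nothing,Int}=nothing)
    if n !== nothing && size === nothing
        return ByCount(n)
    elseif size !== nothing && n === nothing
        return BySize(size)
    else
        throw(ArgumentError("set exactly one of `n` or `size`"))
    end
end

# One method per concrete type; after union splitting, each branch of the
# caller can call its method statically.
nchunks(c::ByCount, len::Int) = min(c.n, len)
nchunks(c::BySize, len::Int) = cld(len, c.size)

# Caller that forwards the keywords, like `f` in the benchmark in this thread.
count_chunks(len::Int; n=nothing, size=nothing) =
    nchunks(toy_chunks(; n=n, size=size), len)
```

Running `@code_warntype count_chunks(1000; n=4)` on a sketch like this is one way to see whether the union is still being resolved or has degraded to a boxed value.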

julia> using ChunkSplitters, BenchmarkTools

julia> function f(x; n=nothing, size=nothing)
           s = zero(eltype(x))
           for inds in chunks(x; n=n, size=size)
               for i in inds
                   s += x[i]
               end
           end
           return s
       end
f (generic function with 3 methods)

julia> @benchmark f($(rand(10^3)); n=4) samples=1 evals=1
BenchmarkTools.Trial: 1 sample with 1 evaluation.
 Single result which took 514.000 ns (0.00% GC) to evaluate,
 with a memory estimate of 32 bytes, over 1 allocations.

carstenbauer commented 1 day ago

FYI, I created a draft PR for your branch: https://github.com/JuliaFolds2/ChunkSplitters.jl/pull/46

lmiq commented 1 day ago

FWIW:

Things I tried:

As it is, `minchunksize` can be `nothing` or an integer. I tried making it always an integer and ignoring it when `size` was set. That did not remove the allocation.

In the current PR version I added function barriers all over the place; the original code had more conditionals. Nothing changed.
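For readers unfamiliar with function barriers, a minimal sketch of the general technique follows. It does not reproduce the code in the PR: `sum_from_options` and `_sum_chunked` are hypothetical names, and the `Dict{Symbol,Any}` is only a stand-in for any value whose type the compiler cannot infer.

```julia
# Hypothetical example of a function barrier; NOT the code from the PR.

# Outer function: `nparts` comes out of a Dict{Symbol,Any}, so its type is
# inferred as `Any` here.
function sum_from_options(x::AbstractVector, opts::Dict{Symbol,Any})
    nparts = opts[:n]
    # Barrier: one dynamic dispatch happens at this call; the kernel below is
    # then compiled for the concrete runtime types of `x` and `nparts`.
    return _sum_chunked(x, nparts)
end

# Inner kernel: all argument types are concrete, so the loops are type-stable.
function _sum_chunked(x::AbstractVector, nparts::Int)
    s = zero(eltype(x))
    len = length(x)
    for k in 1:nparts
        # Index range of the k-th of `nparts` nearly equal consecutive chunks.
        lo = div((k - 1) * len, nparts) + 1
        hi = div(k * len, nparts)
        for i in lo:hi
            s += x[i]
        end
    end
    return s
end
```

A barrier confines the instability to a single call site. It cannot help if the allocation comes from the value that crosses the barrier itself, which may be why it made no difference here.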

Simpler benchmark:

julia> @benchmark chunks($(rand(10^3)); n=5) samples=1 evals=1
BenchmarkTools.Trial: 1 sample with 1 evaluation.
 Single result which took 247.000 ns (0.00% GC) to evaluate,
 with a memory estimate of 32 bytes, over 1 allocations.

lmiq commented 1 day ago

OK, it seems to be fixed now, although only for some heuristic reason.

lmiq commented 1 day ago

No, unfortunately it is still not fixed. It works on Julia 1.11 but still allocates on 1.10:

On 1.10:

julia> @benchmark chunks($(rand(10^3)); n=5) samples=1 evals=1
BenchmarkTools.Trial: 1 sample with 1 evaluation.
 Single result which took 19.000 ns (0.00% GC) to evaluate,
 with a memory estimate of 32 bytes, over 1 allocations.

On 1.11:

julia> @benchmark chunks($(rand(10^3)); n=5) samples=1 evals=1
BenchmarkTools.Trial: 1 sample with 1 evaluation.
 Single result which took 29.000 ns (0.00% GC) to evaluate,
 with a memory estimate of 0 bytes, over 0 allocations.

carstenbauer commented 1 day ago

See my comments in the PR

lmiq commented 1 day ago

Released in 2.6.0.