JuliaFolds2 / ChunkSplitters.jl

Simple chunk splitters for parallel loop executions
MIT License
40 stars 5 forks source link

Allow specification via `chunksize` (instead of `nchunks`) #18

Closed carstenbauer closed 6 months ago

carstenbauer commented 9 months ago

I've stumbled across this a couple of times already. Note that OpenMP has schedule(static, 8) which means static scheduling in terms of chunks of size 8 (AFAIR). Would be great if we could write something like the following in Julia.

using ChunkSplitters: chunks
using Base.Threads: @threads

@threads :static for idcs in chunks(1:N; size=8)
    # ...
end
carstenbauer commented 6 months ago

Wanted this again yesterday. I'll try to add this but maybe someone wants to beat me to it :wink:

lmiq commented 6 months ago

I played with it a little now. The question is what do we want when the length of the data is not a multiple of the chunksize. For example, with a data of length 7 and a chunksize of 4, we could have the following chunks:

  1. 1:7 (n = div(length(data), chunksize)
  2. (1:4, 5:7) (if the remaining is not zero, add 1 chunk)

These two options are the easiest to implement. The second one seems reasonable. However, if the same data of 7 elements is divided in chunksizes of 5, what do we want? Keep chunks of lengths 4 and 3, or have 5 and 2?

Anyway, this is my first try: https://github.com/JuliaFolds2/ChunkSplitters.jl/commit/f2f434c457a3b64f6117a921d324ba093f25e30b

Which follows option 2.

lmiq commented 6 months ago

I think option 2 is fine. The perhaps strange cases are those in which the chunksize is greater than half of the length of the data, but parallelizing under that condition in senseless anyway.

carstenbauer commented 6 months ago

Since the inspiration for this was OpenMP (i.e. schedule(static, 8)) it might make sense to check what OpenMP does in this case.

lmiq commented 6 months ago

The OpenMP specification is:

If the kind argument is static, iterations are divided into chunks of size chunk_size, and the17 chunks are assigned to the threads in the team in a round-robin fashion in the order of the thread18 number. Each chunk contains chunk_size iterations, except for the chunk that contains the19 sequentially last iteration, which may have fewer iterations. If chunk_size is not specified, the20 logical iteration space is divided into chunks that are approximately equal in size, and at most one21 chunk is distributed to each thread.

Which means that a 7-element data would be divided into 5-2 chunk lengths if the chunk_size is set to 5. Unfortunately that is more laborious to implement, as it does not comply with an equivalent definition of the number of chunks in any way.

carstenbauer commented 6 months ago

I like what OpenMP is doing. When I specify a chunk size and look at the resulting chunks, there shouldn't be more than one that doesn't have the requested chunk size. Hence, I would expect that for a chunk size of 5, 7 elements would be split up into two chunks of size 5 and 2 respectively. (And for a chunk size of 4, I would expect two chunks of size 4 and 3 respectively.) This would also be consistent with Iterators.partition:

julia> Iterators.partition(1:7,5) |> collect
2-element Vector{UnitRange{Int64}}:
 1:5
 6:7

julia> Iterators.partition(1:7,4) |> collect
2-element Vector{UnitRange{Int64}}:
 1:4
 5:7

The way I would implement this naively is d, r = divrem(nelements, chunksize). d tells me how many full chunks I need and r tells me if (r != 0) I need an extra chunk and how many elements it will contain (r).

Unfortunately that is more laborious to implement, as it does not comply with an equivalent definition of the number of chunks in any way.

If you mean that you can't just translate chunksize into a value for the number of chunks and then use the existing algorithm, then I agree, yes. But I think we nonetheless want the OpenMP behavior.

lmiq commented 6 months ago

Yes, for that we need to split the two implementations. I started it here, but it started to become ugly, and stopped for the moment.

carstenbauer commented 6 months ago

Made a PR (#26). I left the combination size=... + split=:scatter undefined for now (throws ArgumentError) because it's not entirely clear what we would want in this case. Maybe something like this, maybe not :)

size = 2

input [1, 2, 3, 4, 5, 6, 7]

output [[1, 7], [2, 6], [3, 5], [4]]

I feel that a reasonable property to ensure is that the number and lengths of the chunks is invariant under changing split.

lmiq commented 6 months ago

I would say

[[1,4], [2,5], [3,6],[7]]

Seems more consistent to leave the remaining to the end.

carstenbauer commented 6 months ago

Fair enough, I guess. But why "split" in the middle? We could also aim for "round-robin"

[[1, 5], [2, 6], [3, 7], [4]]

There is also the extra difficulty, that we have to express this in ranges, not arrays.

lmiq commented 6 months ago

I don't know. I only see it more consistent with the fact that when using :batch the different chunk is the last one. If for any reason someone uses that in a rational intended way, feels that this one should follow the same pattern.

carstenbauer commented 6 months ago

The "different chunk" is still the last one: it has less elements (one in the example).

lmiq commented 6 months ago

Yes, true, but not the last elements. That is something that someone could use as an information. But, again, I don't think this is really important, if you think it is more elegant the other way around.

lmiq commented 6 months ago

Added a new issue related to size and :scatter here: https://github.com/JuliaFolds2/ChunkSplitters.jl/issues/28