JuliaFolds2 / ChunkSplitters.jl

Simple chunk splitters for parallel loop executions
MIT License
40 stars 5 forks source link

Handle chunking of empty collections properly #30

Closed carstenbauer closed 5 months ago

carstenbauer commented 6 months ago
julia> using ChunkSplitters

julia> collect(chunks(11:10; n=6))
ERROR: ArgumentError: You must either indicate the desired number of chunks (n) or the target size of a chunk (size).
Stacktrace:
  [1] missing_input_err()
    @ ChunkSplitters ~/.julia/packages/ChunkSplitters/R5lnx/src/ChunkSplitters.jl:118
  [2] getchunk(itr::UnitRange{Int64}, ichunk::Int64; n::Int64, size::Int64, split::Symbol)
    @ ChunkSplitters ~/.julia/packages/ChunkSplitters/R5lnx/src/ChunkSplitters.jl:274
  [3] getchunk
    @ ~/.julia/packages/ChunkSplitters/R5lnx/src/ChunkSplitters.jl:273 [inlined]
  [4] getchunk
    @ ~/.julia/packages/ChunkSplitters/R5lnx/src/ChunkSplitters.jl:293 [inlined]
  [5] iterate
    @ ~/.julia/packages/ChunkSplitters/R5lnx/src/ChunkSplitters.jl:142 [inlined]
  [6] iterate
    @ ~/.julia/packages/ChunkSplitters/R5lnx/src/ChunkSplitters.jl:141 [inlined]
  [7] copyto!(dest::Vector{StepRange{Int64, Int64}}, src::ChunkSplitters.Chunk{UnitRange{Int64}, ChunkSplitters.FixedCount})
    @ Base ./abstractarray.jl:943
  [8] _collect
    @ ./array.jl:765 [inlined]
  [9] collect(itr::ChunkSplitters.Chunk{UnitRange{Int64}, ChunkSplitters.FixedCount})
    @ Base ./array.jl:759
 [10] top-level scope
    @ REPL[32]:1

The error message is clearly wrong and bad.

The fundamental question is what should happen in this case. We could (1) improve the error message or (2) return an empty Vector{StepRange{Int64, Int64}} (consistent with the finite case, e.g. collect(chunks(9:10; n=6))).

Came up in https://github.com/JuliaFolds2/OhMyThreads.jl/issues/86.

lmiq commented 6 months ago

I definitely prefer returning the empty vector.

lmiq commented 6 months ago

To return the empty step range vector one must return nothing from iterate. Additionaly, it seems consistent to return nothing from getchunk. I have implemented this here:

https://github.com/JuliaFolds2/ChunkSplitters.jl/pull/31/files

One additional thing, which I found slightly inelegant, is that we need to compute the length of the iterator by:

length(c::Chunk{T,FixedSize}) where {T} = cld(length(c.itr), max(1, c.size))

because c.size can be zero and still be a valid, empty, iterator.

Thus, now, we get:

    @test getchunk(10:9, 1; n=2) === nothing
    @test getchunk(10:9, 1; size=2) === nothing
    @test collect(chunks(10:9; n=2)) == Vector{StepRange{Int,Int}}()
    @test collect(chunks(10:9; size=2)) == Vector{StepRange{Int,Int}}()
    @test collect(enumerate(chunks(10:9; n=2))) == Tuple{Int64,Vector{StepRange{Int,Int}}}[]
    @test collect(enumerate(chunks(10:9; size=2))) == Tuple{Int64,Vector{StepRange{Int,Int}}}[]

If you have any comment, let me know.

lmiq commented 5 months ago

fixed in v2.4.2