JuliaFolds2 / ChunkSplitters.jl

Splitting collections into chunks
https://juliafolds2.github.io/ChunkSplitters.jl/
MIT License
50 stars 7 forks source link

type instability because getchunk can return nothing for zero-length iterators #36

Closed lmiq closed 4 months ago

lmiq commented 4 months ago

FWIW, I think that the failure in the inferrence of the Zip object is not really related to the chunking returning unions. This is in the current version, which always returns StepRanges:

julia> function mwe2(ichunk=2, n=5, l=10)
           cx = getchunk(collect(1:l), ichunk; n, split=:batch)
           cy = getchunk(collect(1:l), ichunk; n, split=:batch)
           return Iterators.zip(cx, cy)
       end
mwe2 (generic function with 4 methods)

julia> @code_warntype mwe2()
MethodInstance for mwe2()
  from mwe2() @ Main REPL[5]:1
Arguments
  #self#::Core.Const(mwe2)
Body::Base.Iterators.Zip
1 ─ %1 = (#self#)(2, 5, 10)::Base.Iterators.Zip
└──      return %1

(jl_N3ycxi) pkg> st
Status `/tmp/jl_N3ycxi/Project.toml`
  [ae650224] ChunkSplitters v2.4.2

The problem there is that getchunk can return nothing if the array has zero length.

Should we return an empty 0:-1 iterator instead?

lmiq commented 4 months ago

In this branch:

https://github.com/JuliaFolds2/ChunkSplitters.jl/tree/empty_range_instead_of_nothing

(which builds on top of https://github.com/JuliaFolds2/ChunkSplitters.jl/pull/37), the return type of getchunk was changed from nothing to empty iterators (0:-1 for batch and 0:1:-1 for scatter). This solves the issue above:

julia> function mwe2(ichunk=2, n=5, l=10)
           cx = getchunk(collect(1:l), ichunk; n, split=:batch)
           cy = getchunk(collect(1:l), ichunk; n, split=:batch)
           return Iterators.zip(cx, cy)
       end
mwe2 (generic function with 4 methods)

julia> mwe2()
zip(3:4, 3:4)

julia> @code_warntype mwe2()
MethodInstance for mwe2()
  from mwe2() @ Main REPL[11]:1
Arguments
  #self#::Core.Const(mwe2)
Body::Base.Iterators.Zip{Tuple{UnitRange{Int64}, UnitRange{Int64}}}
1 ─ %1 = (#self#)(2, 5, 10)::Base.Iterators.Zip{Tuple{UnitRange{Int64}, UnitRange{Int64}}}
└──      return %1

Tested that it doesn't break anything on OhMyThreads either:

Test Summary: | Pass  Total  Time
Aqua.jl       |   10     10  6.3s
Test Summary: | Pass  Total   Time
Basics        |  648    648  28.4s
Test Summary:        | Pass  Total  Time
ChunkSplitters.Chunk |   10     10  1.7s
Test Summary: | Pass  Total  Time
macro API     |   29     29  4.0s
Test Summary:  | Pass  Total  Time
WithTaskLocals |   10     10  0.2s
Test Summary:                    | Pass  Total  Time
chunking mode + chunksize option |   36     36  1.2s
Test Summary:    | Pass  Total  Time
top-level kwargs |   27     27  1.2s
Test Summary:     | Pass  Total  Time
empty collections |  144    144  2.0s
Test Summary: | Pass  Total  Time
regions       |    7      7  1.4s
Test Summary: | Pass  Total  Time
@barrier      |    4      4  0.3s
     Testing OhMyThreads tests passed 
lmiq commented 4 months ago

Fixed in release 2.4.4