JuliaFolds2 / ChunkSplitters.jl

Splitting collections into chunks
https://juliafolds2.github.io/ChunkSplitters.jl/
MIT License
50 stars 7 forks source link

Eumerate interface #17

Closed lmiq closed 10 months ago

lmiq commented 11 months ago

This PR changes the interface of the basic chunks iterator. In version 2 of the package, the iterator returned a tuple with the chunk index and the range of indices of the chunked array.

In this version, the chunks iterator returns only an iterator over the range of indexes, and if the user wants a the chunk indices they need to use enumerate(chunks(...)).

Advantages of this approach:

julia> function sum_parallel(f, x; nchunks=Threads.nthreads())
           t = map(chunks(x, nchunks)) do idxs
               Threads.@spawn sum(f, @view x[idxs])
           end
           return sum(fetch.(t))
       end

where the only parameter returned by chunks is idxs for each chunk.

Drawbacks

codecov[bot] commented 11 months ago

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (a87fb17) 96.66% compared to head (4ff52ea) 96.55%.

Files Patch % Lines
src/ChunkSplitters.jl 96.77% 2 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #17 +/- ## ========================================== - Coverage 96.66% 96.55% -0.12% ========================================== Files 1 1 Lines 60 87 +27 ========================================== + Hits 58 84 +26 - Misses 2 3 +1 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

lmiq commented 11 months ago

@carstenbauer

If you ever have any time to take a look at this PR, I would greatly appreciate. Basically it implements the idea of using enumerate to fetch chunk indices, and iterating over chunks just iterates over the indices of each chunk. That makes the pattern of using map cleaner, as the chunk indices, which are not necessary, are not returned by default. The same goes if one uses channels to store buffers.

There is no rush to merge this, particularly because it is breaking, and I'm not sure if it is worth the trouble. Although that in the long run this approach may be better.

lmiq commented 11 months ago

This is how the docs look like in this version: https://m3g.github.io/ChunkSplitters.jl/v3.0.0-DEV/

carstenbauer commented 11 months ago

I think I like it and will take a closer look at the end of the week

carstenbauer commented 10 months ago

Apart from my comments above I think this looks fine and is a good change.

carstenbauer commented 10 months ago

For symmetry, getchunk should probably also have n and and split keyword arguments.

lmiq commented 10 months ago

I'll merge this. I think the feature #18 can better be done separately with care.

lmiq commented 10 months ago

Actually with the keyword interface I can make this non-breaking. I will release a 2.1 release with the new interface and leave, for a while, the old interface undocumented. Then some day we release the breaking version deprecating those, probably without affecting anyone. For the packages that currently use this, I can propose PRs, there are just a few.

carstenbauer commented 9 months ago

@lmiq what's the status for the 2.1 release? I'd like to use the new interface :)

lmiq commented 9 months ago

Oh, I didn´t have the time to release it yet. We could release a breaking 3.0 easily (just release current master). For the 2.1 which would include both interfaces I need to recover the old one in a legacy file.

lmiq commented 9 months ago

I'll try to do that (the 2.1) now. If in half an hour I don´t get it right, I'll let you know.

lmiq commented 9 months ago

Released now: https://github.com/m3g/ChunkSplitters.jl/commit/ccbb61a9caf267726cc3e283e6e3e000bba91a42