JuliaFolds2 / ChunkSplitters.jl

Simple chunk splitters for parallel loop executions
MIT License
40 stars 5 forks source link

Working with non-AbstractArray types #21

Closed MasonProtter closed 7 months ago

MasonProtter commented 7 months ago

I know it'd be a lot of work for ChunkSplitters to systematically try to support non-AbstractArray types, but it'd be good if there was some docs guidance for other people to try and implement methods so that their types can be supported as well.

lmiq commented 7 months ago

I'm a bit reluctant in removing the signature of the input functions (AbstractArray), because that will make error messages harder to understand. But these are the minimal requirements:

julia> using ChunkSplitters

julia> struct MinimalInterface end

julia> Base.firstindex(::MinimalInterface) = 1

julia> Base.lastindex(::MinimalInterface) = 7

julia> Base.length(::MinimalInterface) = 7

julia> x = MinimalInterface()
MinimalInterface()

julia> collect(chunks(x; n=3))
3-element Vector{StepRange{Int64, Int64}}:
 1:1:3
 4:1:5
 6:1:7

julia> collect(chunks(x; n=3, split=:scatter))
3-element Vector{StepRange{Int64, Int64}}:
 1:3:7
 2:3:5
 3:3:6

What do you think? It is perfectly possible to remove the signature and document that the interface requires firstindex, lastindex, and length only. Do you think that's useful?

lmiq commented 7 months ago

(actually in terms of implementation I think the error messages would be better if they failed at those functions of the interface). I think I'lll release that right away.

lmiq commented 7 months ago

It is done in the 2.2.0 branch: https://github.com/m3g/ChunkSplitters.jl/tree/release-2.2.0

If you don't have any suggestion, I'll release it (it will make a contract relative to the interface, so if you find that something can be even further simplified, let me know).

The only downside of this is that now chunks works on scalars, which I find a bad thing, but that's just how Julia is.

@carstenbauer if you have something to suggest here, please do.

MasonProtter commented 7 months ago

Oops sorry I missed this @lmiq! I think I'm in favour of this. In particular, this is nice because it lets you take chunks of chunks!

lmiq commented 7 months ago

Pinging @carstenbauer again here, maybe he missed it to. If you all are ok, I can release that asap.

MasonProtter commented 7 months ago

One thing we could do if you want better error messages is do something like

is_chunkable(x) = false
is_chunkable(x::AbstractArray) = true
is_chunkable(x::Chunk) = true
is_chunkable(x::Tuple) = true

function chunks(x; n, split=:batch)
    if !is_chunkable(x)
        error("something informative")
    end
    ....
end

and then people can override is_chunkable for their types if they are able to support it

carstenbauer commented 7 months ago

Just took a quick look and LGTM. Apart from Mason's suggestion I only have a minor stylistic comment: I would probably rename the function argument / field name array to data or whatever because it might not be an array after all.

lmiq commented 7 months ago

Thanks! Yesterday I wasn't able to do anything. But all is fine!