JuliaML / MLUtils.jl

Utilities and abstractions for Machine Learning tasks
MIT License

Port parallel loaders from DataLoaders.jl #30

Closed: lorenzoh closed this 2 years ago

lorenzoh commented 2 years ago

This is the second part of porting functionality from DataLoaders.jl.

This one includes the parallel loaders:

This also raises the question of what the interface for buffered container views should look like. In general, we want an iterator over the observations in a data container, in four variants arising from buffered/unbuffered × parallel/single-threaded. Can we brainstorm a consistent interface here? The first idea that comes to mind is a single eachobs function with keyword arguments: eachobs(data; buffered = false, parallel = false). It could then also warn when parallel = true is passed but Threads.nthreads() == 1, and in similar cases.
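A minimal sketch of that single-entry-point idea, using only Base Julia. eachobs_sketch and ObsIterSketch are hypothetical names, not MLUtils.jl API, and data[i] / length(data) stand in for getobs / numobs. It shows only how the two keywords could select among the four variants and emit the single-thread warning. The buffered and parallel loading paths are not implemented, so every variant iterates serially.

```julia
# Hypothetical sketch of a keyword-based `eachobs`; not the MLUtils.jl API.
struct ObsIterSketch{T}
    data::T
    buffered::Bool  # would select a loader that reuses a preallocated buffer
    parallel::Bool  # would select a multithreaded loader
end

function eachobs_sketch(data; buffered::Bool = false, parallel::Bool = false)
    if parallel && Threads.nthreads() == 1
        @warn "parallel = true has no effect because Threads.nthreads() == 1; loading single-threaded"
        parallel = false
    end
    return ObsIterSketch(data, buffered, parallel)
end

# Only the serial, unbuffered path is implemented here: `data[i]` stands in
# for getobs(data, i) and `length(data)` for numobs(data).
Base.iterate(it::ObsIterSketch, i::Int = 1) =
    i > length(it.data) ? nothing : (it.data[i], i + 1)
Base.length(it::ObsIterSketch) = length(it.data)
Base.eltype(::Type{ObsIterSketch{T}}) where {T} = eltype(T)
```

For example, collect(eachobs_sketch([1, 2, 3]; buffered = true)) returns [1, 2, 3]. Keeping the choice in one function means the thread-count check lives in a single place rather than being repeated across four separate loader constructors.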

@darsnack @CarloLucibello

lorenzoh commented 2 years ago

One more point regarding the design of the eachobs* functions: currently, eachobs(buffered = false) is implemented as a generator expression, i.e. (getobs(data, i) for i in 1:numobs(data)). A problem with this is that it can only be iterated over once. Ideally, every data iterator, including those created by eachobs, should be iterable many times, to allow code like the following:

dataloader = eachobs(batchviewcollated(data))

for i in 1:nepochs
    for batch in dataloader
        # step
    end
end

The loaders in DataLoaders.jl already have this property, and eachobs lacking it has caused issues in the past.
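The multi-epoch loop above can be made runnable with stand-ins, so the property can be checked directly. This assumes a plain Vector as the data container and data[i] / length(data) in place of getobs / numobs, so MLUtils.jl is not needed. The sketch counts how many observations each epoch sees when dataloader is a generator expression like the one eachobs(buffered = false) builds.

```julia
# Stand-ins (assumptions): a plain Vector as the data container, and
# `data[i]` / `length(data)` in place of getobs(data, i) / numobs(data).
data = [10, 20, 30, 40]
dataloader = (data[i] for i in 1:length(data))  # generator expression

nepochs = 3
seen_per_epoch = Int[]
for epoch in 1:nepochs
    n = 0
    for obs in dataloader
        n += 1  # a training step would go here
    end
    push!(seen_per_epoch, n)  # observations seen in this epoch
end
```

If the generator could only be iterated once, every epoch after the first would record 0 here.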

CarloLucibello commented 2 years ago

Generators can be iterated over multiple times:

julia> g = (i for i in 1:3)
Base.Generator{UnitRange{Int64}, typeof(identity)}(identity, 1:3)

julia> collect(g)
3-element Vector{Int64}:
 1
 2
 3

julia> collect(g)
3-element Vector{Int64}:
 1
 2
 3

lorenzoh commented 2 years ago

> Generators can be iterated over multiple times

Huh. Did this change in a recent Julia version? :o That's not an issue then!

CarloLucibello commented 2 years ago

I think it has always been like that.
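A short sketch of why this holds, using only Base. A Generator stores just a function and the wrapped iterator. The iteration state is returned by iterate rather than kept inside the generator, so each loop or collect starts from the beginning. Iterators.Stateful is shown for contrast: it keeps its position internally, so it is consumed by the first pass.

```julia
# A Generator keeps no iteration state of its own: `iterate(g)` hands the
# state back to the caller, so each pass starts from the beginning.
g = (i^2 for i in 1:3)
first_pass = collect(g)
second_pass = collect(g)

# Iterators.Stateful stores its position internally, so the first pass
# consumes it and a second pass yields nothing.
s = Iterators.Stateful(1:3)
stateful_first = collect(s)
stateful_second = collect(s)
```

One caveat: a generator is only as reiterable as the iterator it wraps. A generator over a stateful source, such as a Channel or eachline on an open stream, is exhausted after one pass.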