Closed: lorenzoh closed this 2 years ago
One more point regarding the design of the eachobs* functions: currently, eachobs(buffered=false) is implemented as a generator expression, i.e. (getobs(data, i) for i in 1:numobs(data)). A problem with this is that it can only be iterated over once. Ideally, every data iterator, including those created by eachobs, should be iterable many times, to allow code like the following:
    dataloader = eachobs(batchviewcollated(data))
    for i in 1:nepochs
        for batch in dataloader
            # step
        end
    end
DataLoaders.jl currently also has this property, and eachobs not having it has caused some issues in the past.
Generators can be iterated over multiple times:

    julia> g = (i for i in 1:3)
    Base.Generator{UnitRange{Int64}, typeof(identity)}(identity, 1:3)

    julia> collect(g)
    3-element Vector{Int64}:
     1
     2
     3

    julia> collect(g)
    3-element Vector{Int64}:
     1
     2
     3
> Generators can be iterated over multiple times
Huh. Did this change in a recent Julia version? :o That's not an issue then!
I think it was always like that
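The multi-epoch pattern from the first comment can be checked directly with a plain generator. The snippet below is a minimal sketch using a toy vector in place of a real data container. The names `data`, `dataloader`, `nepochs` and `seen` are illustrative only. For contrast, it also wraps the same generator in `Iterators.Stateful`, which is consumed after one pass.

```julia
# Toy stand-in for a data container (assumption: any indexable collection).
data = [1, 2, 3]

# A plain generator, like the one eachobs(buffered=false) is described as using.
dataloader = (2x for x in data)

nepochs = 2
seen = Int[]
for epoch in 1:nepochs
    for obs in dataloader
        push!(seen, obs)   # the generator restarts from the beginning each epoch
    end
end

# For contrast: a stateful wrapper remembers its position, so a second pass is empty.
stateful = Iterators.Stateful(dataloader)
first_pass = collect(stateful)
second_pass = collect(stateful)
```

After running this, `seen` holds the observations from both epochs, while `second_pass` is empty because the stateful wrapper was exhausted by the first `collect`.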
This is the second part of porting functionality from DataLoaders.jl.
This one includes the parallel loaders:
GetObsParallel
BufferGetObsParallel
With this also comes the question of what the interface for buffered container views will look like. In general, the pattern is wanting an iterator over the observations in a data container, with the 4 combinations arising from buffered/unbuffered and parallel/single-threaded. Can we brainstorm on a consistent interface here? The first thing that comes to mind is having a single eachobs function with keyword arguments: eachobs(data; buffered = false, parallel = false). This could then also give warnings when you pass parallel = true while Threads.nthreads() == 1, and such. @darsnack @CarloLucibello
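To make the proposal concrete, here is a rough sketch of what such a keyword-based entry point might look like. It is not the package's implementation: `eachobs_sketch` is a hypothetical name, the `getobs`/`numobs` methods are toy definitions for vectors, and the buffered and parallel code paths are deliberately stubbed out. Only the signature and the thread-count warning are illustrated.

```julia
# Toy definitions so the sketch is self-contained (assumption: vector-backed data).
getobs(data::AbstractVector, i) = data[i]
numobs(data::AbstractVector) = length(data)

# Hypothetical entry point illustrating the proposed keyword arguments.
function eachobs_sketch(data; buffered = false, parallel = false)
    if parallel && Threads.nthreads() == 1
        @warn "parallel = true was passed, but Julia is running with a single thread."
    end
    if buffered
        # A real buffered iterator would reuse a preallocated observation;
        # this sketch does not implement that and falls back to plain loading.
        @warn "buffered = true is not implemented in this sketch."
    end
    # Stub: every combination returns the same re-iterable generator here.
    return (getobs(data, i) for i in 1:numobs(data))
end

toy_data = [10, 20, 30]
loader = eachobs_sketch(toy_data)
pass1 = collect(loader)
pass2 = collect(loader)   # the generator can be iterated again
```

A single function with keywords keeps the four buffered/parallel combinations behind one name, and gives one place to emit warnings such as the single-thread case mentioned above.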