Open mcabbott opened 1 year ago
What is the use case for this over doing something like DataLoader(data; batchsize=numobs(data))? Is it that you don't want to get a DataLoader returned, but rather a BatchView(mapobs(f, data); batchsize=numobs(data))?
The use case is functions like this, which load data and make two DataLoaders with the specified batch size:
Ah, I see! That makes sense when creating multiple DataLoaders 👍
You could almost use typemax(Int) for this purpose, apart from this warning:
julia> DataLoader([1 2 3; 4 5 6]; batchsize=99, partial=false) |> collect
┌ Warning: Number of observations less than batch-size, decreasing the batch-size to 3
└ @ MLUtils ~/.julia/packages/MLUtils/KcBtS/src/batchview.jl:95
┌ Warning: Number of observations less than batch-size, decreasing the batch-size to 3
└ @ MLUtils ~/.julia/packages/MLUtils/KcBtS/src/batchview.jl:95
1-element Vector{Matrix{Int64}}:
[1 2 3; 4 5 6]
It would be nice if you could ask DataLoader for one maximal-size batch, without knowing the size of the inputs.
This would mean that a function which loads some data, pre-processes it, and then returns a DataLoader could easily be used to return the full dataset, in the identical format, as long as it passes the keyword batchsize along.
Could be batchsize=0, since -1 already does something special. Although unfortunately 0 is not an error right now.
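Until something like this exists, the same effect can be approximated on the caller's side. The following is only a sketch under stated assumptions: full_or and make_loaders are hypothetical names that are not part of MLUtils, the random arrays stand in for real data, and treating batchsize=0 as "everything in one batch" is the convention proposed above, not current DataLoader behaviour.

```julia
using MLUtils: DataLoader, numobs

# Hypothetical helper (not part of MLUtils): interpret `batchsize == 0`
# as "one batch containing every observation"; pass other values through.
full_or(batchsize::Integer, data) = batchsize == 0 ? numobs(data) : batchsize

# Hypothetical loading function that forwards `batchsize` to two loaders.
# The random matrices are placeholders; observations are stored as columns.
function make_loaders(; batchsize::Integer = 2)
    train = rand(Float32, 4, 10)   # 10 observations
    test  = rand(Float32, 4, 6)    # 6 observations
    train_loader = DataLoader(train; batchsize = full_or(batchsize, train))
    test_loader  = DataLoader(test;  batchsize = full_or(batchsize, test))
    return train_loader, test_loader
end

# Ask for the full datasets, in the same format as the mini-batches.
train_loader, test_loader = make_loaders(batchsize = 0)
```

Because the batch size is resolved with numobs for each dataset separately, the two loaders can differ in length and no "decreasing the batch-size" warning should be triggered. The drawback is that every such loading function has to repeat this translation, which is what built-in support would avoid.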