JuliaML / MLUtils.jl

Utilities and abstractions for Machine Learning tasks
MIT License

`batchsize=Inf` or something? #144

Open mcabbott opened 1 year ago

mcabbott commented 1 year ago

It would be nice if you could ask `DataLoader` for one maximal-size batch without knowing the size of the inputs.

This would mean that a function which loads some data, pre-processes it, and then returns a `DataLoader` could easily be used to return the full dataset, in exactly the same format, as long as it passes the `batchsize` keyword along.

This could be `batchsize=0`, since `-1` already does something special, although unfortunately `0` is not an error right now.

lorenzoh commented 1 year ago

What is the use case for this over doing something like `DataLoader(data; batchsize=numobs(data))`? Is it that you don't want a `DataLoader` returned, but rather a `BatchView(mapobs(f, data); batchsize=numobs(data))`?

mcabbott commented 1 year ago

The use case is functions like this one, which load data and make two `DataLoader`s with the specified batch size:

https://github.com/FluxML/model-zoo/blob/52420da6fcadf30ae2e190fc77669fe1d255ff10/vision/conv_mnist/conv_mnist.jl#L71-L84
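The `numobs` workaround can already be threaded through such a function. A minimal sketch, assuming MLUtils' `DataLoader` and `numobs`; the function name `make_loaders` and the convention that `batchsize = nothing` means "one batch holding every observation" are hypothetical, not part of MLUtils:

```julia
using MLUtils: DataLoader, numobs

# Hypothetical helper: `batchsize = nothing` stands in for "one batch with every observation".
function make_loaders(xtrain, xtest; batchsize = nothing)
    # Resolve the sentinel separately per dataset, since train and test sizes differ.
    resolve(data) = batchsize === nothing ? numobs(data) : batchsize
    train_loader = DataLoader(xtrain; batchsize = resolve(xtrain), shuffle = true)
    test_loader = DataLoader(xtest; batchsize = resolve(xtest))
    return train_loader, test_loader
end
```

Calling `make_loaders(xtrain, xtest)` then yields one full batch per loader, while `make_loaders(xtrain, xtest; batchsize = 32)` behaves like the mini-batch version. The catch, which motivates this issue, is that every such function has to reinvent its own sentinel.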

lorenzoh commented 1 year ago

Ah I see! That makes sense when creating multiple DataLoaders 👍

mcabbott commented 1 year ago

You could almost use `typemax(Int)` for this purpose, apart from this warning:

```julia
julia> DataLoader([1 2 3; 4 5 6]; batchsize=99, partial=false) |> collect
┌ Warning: Number of observations less than batch-size, decreasing the batch-size to 3
└ @ MLUtils ~/.julia/packages/MLUtils/KcBtS/src/batchview.jl:95
┌ Warning: Number of observations less than batch-size, decreasing the batch-size to 3
└ @ MLUtils ~/.julia/packages/MLUtils/KcBtS/src/batchview.jl:95
1-element Vector{Matrix{Int64}}:
 [1 2 3; 4 5 6]
```
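One way to get the `typemax(Int)` behaviour without the warning is to clamp the requested size to the number of observations before constructing the loader. A minimal sketch under that assumption; `full_loader` is a hypothetical wrapper, not an MLUtils function:

```julia
using MLUtils: DataLoader, numobs

# Hypothetical wrapper: clamp the batch size so an oversized request such as
# typemax(Int) never exceeds numobs(data), the condition that triggers the warning.
# Any other keywords (partial, shuffle, ...) are forwarded to DataLoader unchanged.
full_loader(data; batchsize = typemax(Int), kw...) =
    DataLoader(data; batchsize = min(batchsize, numobs(data)), kw...)

loader = full_loader([1 2 3; 4 5 6]; partial = false)
batches = collect(loader)   # one batch equal to the full 2×3 matrix
```

Clamping on the caller's side leaves `DataLoader`'s own checks untouched, but it still relies on the caller remembering to wrap, which is the kind of boilerplate a built-in option would remove.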