barucden closed this issue 2 years ago.
By design, the library tries to keep different operations (e.g. shuffling vs. batching) separate but composable. This makes it easy to re-order a pipeline to do exactly what you want it to.
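To make the re-ordering point concrete, here is a minimal sketch in plain Julia. The helpers `shuffle_obs` and `batch_obs` are hypothetical names made up for illustration and are not the library's API; they only show that keeping the two operations separate lets you choose the order in which they are applied.

```julia
using Random

# Hypothetical helpers for illustration only; these are NOT the library's API.
shuffle_obs(rng, data) = data[randperm(rng, length(data))]
batch_obs(data, batchsize) = [collect(b) for b in Iterators.partition(data, batchsize)]

rng = MersenneTwister(0)
data = collect(1:10)

# Shuffle first, then batch: observations are mixed across batches.
shuffled_then_batched = batch_obs(shuffle_obs(rng, data), 4)

# Batch first, then shuffle: every batch stays contiguous,
# and only the order of the batches changes.
batched_then_shuffled = shuffle_obs(rng, batch_obs(data, 4))
```

Both pipelines visit every observation exactly once; they differ only in which observations end up together in a batch.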
DataLoader exists only as a convenience for folks coming from other ML frameworks, where these operations live under one "dataloader" class. Right now, DataLoader just shuffles and batches, but eventually it will be a one-stop constructor that combines shuffling, batching, and parallel loading.
BatchView is the underlying implementation of batching, but ideally a user should never have to construct one directly. eachobs is the user-facing function.
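For reference, the three entry points discussed in this thread might be used roughly as follows. This is a sketch under assumptions: it assumes the library is MLUtils.jl in a recent release, and that the keyword names `batchsize` and `shuffle` match your installed version, so check the docstrings before relying on it. The array `X` is made-up example data with 10 observations along the last dimension.

```julia
using MLUtils  # assumption: the library under discussion is MLUtils.jl

X = rand(Float32, 4, 10)  # made-up data: 4 features, 10 observations

# Convenience constructor: shuffling and batching in one call.
loader = DataLoader(X; batchsize=3, shuffle=true)

# User-facing iteration function; with `batchsize` set it yields batches.
obs_iter = eachobs(X; batchsize=3)

# Underlying batching implementation; normally not constructed directly.
batch_view = BatchView(X; batchsize=3)

for x in loader
    # each x holds up to 3 observations (the last batch may be partial)
    @assert size(x, 1) == 4
end
```

In everyday code, eachobs or DataLoader should be enough; BatchView appears here only to show what sits underneath them.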
Right now, the library is a combination of code ported from Flux.Data and MLDataPattern. There is definitely redundancy that needs to be cleaned up.
I would be remiss not to point out that the experimental torchdata library is trying a similar approach wrt composability.
Got it! Thank you for the explanation. I am closing the issue.
As the title suggests, I am wondering why there are three ways to iterate over batches:
Looking at the implementation, eachobs is implemented using BatchView, and DataLoader uses eachobs. So, pardon my ignorant question, but why not have just one way of iterating over batches that provides all the features [shuffling, (partial) batching, etc.]?