Because we're doing dynamic padding, it makes sense to vary the batch size to use the GPU memory optimally - when the padding lengths are small, we can increase the batch size, and when they're long, we can decrease the batch size.
The simplest thing to do is to expose a method to subclasses that lets them split the data into batches after sorting. The concrete model class can then specify some heuristic for how many instances will fit in a batch based on how large they are.
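One such heuristic is a greedy packer: sort instances by length, then keep the padded size of each batch (instance count times the longest instance in the batch) under a fixed token budget. This is only a sketch of the idea, not the library's API; the function name and the `max_padded_tokens` budget are hypothetical.

```python
from typing import List

def split_into_batches(lengths: List[int],
                       max_padded_tokens: int = 1000) -> List[List[int]]:
    """Greedily pack instance lengths into batches so that the padded size
    of each batch (num_instances * max_length_in_batch) stays under a
    budget.  Hypothetical sketch of the heuristic, not a real API."""
    # Sorting puts instances of similar length in the same batch, which
    # minimizes wasted padding and lets short-instance batches be larger.
    sorted_lengths = sorted(lengths)
    batches: List[List[int]] = []
    current: List[int] = []
    for length in sorted_lengths:
        # Because the list is sorted, `length` is the max in this batch,
        # so the padded size after adding it would be:
        padded_size = (len(current) + 1) * length
        if current and padded_size > max_padded_tokens:
            batches.append(current)
            current = []
        current.append(length)
    if current:
        batches.append(current)
    return batches
```

With a budget of 100 padded tokens, five instances of lengths `[5, 50, 7, 45, 6]` pack into two batches: the three short instances together (3 × 7 = 21 padded tokens) and the two long ones together (2 × 50 = 100), so the short batch could have held many more instances while the long batch is already at the limit.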
A much more exciting thing to do, but also probably close to impossible, is to have the library just figure out how many instances can go in each batch, by examining the computation graph, or something. Not at all sure how to do this.