beacon-biosignals / OndaBatches.jl

Local and distributed batch loading for Onda datasets
MIT License

implement balanced + full coverage batching scheme #6

Open kleinschmidt opened 2 years ago

kleinschmidt commented 2 years ago

We would like a batching scheme that balances batches across label classes AND ensures that a whole epoch covers the entire dataset. This is a lot more complicated than weighted/pseudorandom batching, since you have to keep track of which labeled segments you've already handled. One strategy would be to generate all possible batch items up front and shuffle them, but this could be a big footgun if there are MANY possible batch items. One way to cut down on the number of batch items is to use partially overlapping windows of a fixed duration, and keep an index from label classes to windows.
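To make the windows-plus-index idea concrete, here is a rough sketch in Python (not Julia, and not the OndaBatches.jl API). Everything in it is an assumption for illustration: the `(recording, start, stop, label)` tuple layout, the function names `make_windows`, `index_by_label`, and `balanced_epoch`, integer timestamps, and the choice to oversample smaller classes so that batches stay balanced while every window is still visited at least once per epoch.

```python
import random
from collections import defaultdict


def make_windows(segments, duration, stride):
    """Cut labeled segments into fixed-duration windows.

    `segments` is a list of (recording, start, stop, label) tuples
    (hypothetical layout). A stride smaller than `duration` gives
    partially overlapping windows.
    """
    if not 0 < stride <= duration:
        raise ValueError("stride must be in (0, duration] to avoid gaps")
    windows = []
    for recording, start, stop, label in segments:
        if stop - start < duration:
            # Assumption: fail loudly rather than silently drop data.
            raise ValueError("segment shorter than one window")
        t = start
        while t + duration <= stop:
            windows.append((recording, t, t + duration, label))
            t += stride
        # Add one final window flush with the end of the segment if the
        # regular stride left the tail uncovered.
        last_start = stop - duration
        if windows[-1][1] < last_start:
            windows.append((recording, last_start, stop, label))
    return windows


def index_by_label(windows):
    """Build the index from label class to that class's windows."""
    index = defaultdict(list)
    for window in windows:
        index[window[3]].append(window)
    return dict(index)


def balanced_epoch(index, batch_size, seed=0):
    """Yield batches that are balanced across classes and that together
    cover every window at least once.

    Each class contributes as many draws as the largest class has
    windows; smaller classes are reshuffled and repeated to fill up.
    """
    rng = random.Random(seed)
    labels = sorted(index)
    per_class = max(len(ws) for ws in index.values())
    draws = {}
    for label in labels:
        ws = index[label]
        order = []
        while len(order) < per_class:
            # Each pass is a full permutation, so the first pass alone
            # already covers every window of this class.
            order.extend(rng.sample(ws, len(ws)))
        draws[label] = order[:per_class]
    # Interleave classes so any run of consecutive items is balanced.
    items = [draws[label][i] for i in range(per_class) for label in labels]
    for i in range(0, len(items), batch_size):
        batch = items[i:i + batch_size]
        rng.shuffle(batch)  # avoid a fixed class order inside a batch
        yield batch


segments = [
    ("rec1", 0, 10, "wake"),
    ("rec1", 10, 40, "sleep"),
    ("rec2", 0, 11, "wake"),
]
windows = make_windows(segments, duration=4, stride=3)
index = index_by_label(windows)
batches = list(balanced_epoch(index, batch_size=4))
```

The trade-off in this sketch is that windows from smaller classes appear more than once per epoch. It also still materializes every window in memory, so it only helps with the footgun to the extent that the stride keeps the window count manageable.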

This scheme is a nice-to-have, since it's very fiddly and requires a lot of validation to make sure it's correct and performant. For the time being, the pseudorandom/online batching strategy works well enough for the kinds of models we're training.