DataHaskell / dh-core

Functional data science

Cross validation layer #42


o1lo01ol1o commented 5 years ago

Looking over the Dataloader code, I immediately thought about bringing in a private dataset to play with some Haskell code. That made me wonder whether anyone has thought about adding a cross-validation layer on top of it. Some canonical datasets come with predefined splits (test, train, validation), but for others one would need to define them.

It would be nice if there were some code that could partition a given dataset according to k-fold and leave-p-out schemes. For time-series datasets, you would also have to make sure that the partitions respect the temporal ordering.
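To make this concrete, here is a minimal sketch (not actual dh-core code; the module and function names are made up) of index-level splitters that operate on a Vector Int. The leave-p-out version is simplified to a rolling window rather than the full combinatorial scheme:

module CrossVal where

import           Data.Vector (Vector)
import qualified Data.Vector as V

-- Partition an index vector into k (train, test) pairs, one per fold;
-- the last fold absorbs any remainder.
kFolds :: Int -> Vector Int -> [(Vector Int, Vector Int)]
kFolds k ixs =
  [ (V.take lo ixs V.++ V.drop hi ixs, V.slice lo (hi - lo) ixs)
  | f <- [0 .. k - 1]
  , let lo = f * foldSize
        hi = if f == k - 1 then n else lo + foldSize
  ]
  where
    n        = V.length ixs
    foldSize = n `div` k

-- Rolling-window stand-in for leave-p-out: every contiguous block of p
-- indices becomes a test set once.
leavePOut :: Int -> Vector Int -> [(Vector Int, Vector Int)]
leavePOut p ixs =
  [ (V.take i ixs V.++ V.drop (i + p) ixs, V.slice i p ixs)
  | i <- [0 .. V.length ixs - p]
  ]

-- Time-series split: train strictly precedes test, so the temporal
-- ordering is respected.
timeSeriesSplit :: Int -> Vector Int -> (Vector Int, Vector Int)
timeSeriesSplit cut ixs = (V.take cut ixs, V.drop cut ixs)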

stites commented 5 years ago

Yeah! Cross-validation is an excellent next step. When working on #22, I was trying to get a rough lay of the land and didn't want to overcomplicate the PR. Toy CV benchmarks like MNIST and the CIFARs come pre-split into test and train sets, so I opted to avoid the scope creep.

I was hoping that all of the partitioning schemes would operate on Vector Int indices that get passed into Dataloaders. The idea was that, given a Dataset, someone could write a function:

splits
  :: Vector Int     -- ^ dataset's index
  -> testspec       -- ^ TBD
  -> trainspec      -- ^ TBD
  -> (Vector Int, Vector Int)  -- ^ a test and train split of the indexes

These Vector Int splits could then be passed into a Dataloader's shuffle field, which just uses Data.Vector.backpermute under the hood.
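For illustration, a rough sketch of that wiring (holdout is a hypothetical split helper, and selecting rows by index is just a backpermute, as it would be inside the Dataloader):

import           Data.Vector (Vector)
import qualified Data.Vector as V

-- Hypothetical hold-out split over a dataset's index vector.
holdout :: Double -> Vector Int -> (Vector Int, Vector Int)
holdout ratio ixs = V.splitAt cut ixs
  where cut = floor (ratio * fromIntegral (V.length ixs))

-- Selecting the rows for one split is a backpermute over the index vector.
selectRows :: Vector a -> Vector Int -> Vector a
selectRows = V.backpermute

demo :: (Vector Char, Vector Char)
demo =
  let rows          = V.fromList "abcdefghij"                 -- toy "dataset"
      (train, test) = holdout 0.8 (V.enumFromN 0 (V.length rows))
  in (selectRows rows train, selectRows rows test)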

I didn't have time to follow up on this, but I was also thinking it might be nice to refactor Datasets to expose a unified streaming API, and to have the Dataloader alone handle transforms and shuffling (which might change the API a smidge).
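Purely as an illustration of that split of responsibilities (none of these types are dh-core's actual API), the shape could be something like:

import           Data.Vector (Vector)
import qualified Data.Vector as V

-- A Dataset only knows how to produce its rows; a real version would
-- stream them rather than load everything into memory.
data Dataset a = Dataset
  { rowCount   :: Int
  , streamRows :: IO (Vector a)
  }

-- The Dataloader layers index selection (a shuffle or a CV split) and a
-- per-row transform on top of whatever the Dataset yields.
data Dataloader a b = Dataloader
  { shuffle   :: Vector Int   -- e.g. one half of a (train, test) split
  , transform :: a -> b
  }

runLoader :: Dataloader a b -> Dataset a -> IO (Vector b)
runLoader dl ds = do
  rows <- streamRows ds
  pure (V.map (transform dl) (V.backpermute rows (shuffle dl)))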