Open stsievert opened 6 years ago
I think the Criteo example in #295 is a good example of an extremely large dataset. I see the MNIST-like as being more computational intensive and less memory intensive. MNIST is only 44MB, but takes some fairly complicated computation to learn (neural nets, auto-differentation, etc).
By some measure, we should be able to provide good performance on classic datasets to show that the pipes are in place to scale to larger examples.
The classic overused dataset everyone uses is MNIST. I'd rather see fashion-MNIST or EMNIST.
Good data loading utilities for all of these datasets are with PyTorch via torchvision.datasets.