dask / dask-ml

Scalable Machine Learning with Dask
http://ml.dask.org
BSD 3-Clause "New" or "Revised" License

Implement examples with popular datasets #321

Open stsievert opened 6 years ago

stsievert commented 6 years ago

We should be able to show good performance on classic datasets, which would demonstrate that the pipes are in place to scale to larger examples.

MNIST is the classic, overused dataset. I'd rather see Fashion-MNIST or EMNIST.

PyTorch provides good data-loading utilities for all of these datasets via torchvision.datasets.
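As a rough sketch of what that loading step might look like (not a proposed implementation), the snippet below pulls Fashion-MNIST through torchvision.datasets and wraps it in Dask arrays. The helper names `flatten_shape` and `load_fashion_mnist`, the `root` directory, and the `chunk_rows` value are all made up for illustration, and it assumes both torchvision and dask are installed.

```python
def flatten_shape(image_shape):
    """Return (n_samples, n_features) for a stack of images."""
    n_samples, *pixel_dims = image_shape
    n_features = 1
    for d in pixel_dims:
        n_features *= d
    return n_samples, n_features


def load_fashion_mnist(root="data", chunk_rows=10_000):
    """Hypothetical helper: Fashion-MNIST training set as Dask arrays."""
    # Imported lazily so flatten_shape works without these packages.
    import dask.array as da
    from torchvision.datasets import FashionMNIST

    # Downloads the dataset into `root` on first use.
    train = FashionMNIST(root, train=True, download=True)
    images = train.data.numpy()     # uint8 pixels, one 28x28 image per row
    labels = train.targets.numpy()  # integer class labels

    # Flatten each image into a feature vector and scale to [0, 1].
    n_samples, n_features = flatten_shape(images.shape)
    X = images.reshape(n_samples, n_features).astype("float32") / 255.0

    # Chunk along the sample axis only (chunk size is an arbitrary choice).
    X = da.from_array(X, chunks=(chunk_rows, n_features))
    y = da.from_array(labels, chunks=chunk_rows)
    return X, y
```

Calling `X, y = load_fashion_mnist()` would then give arrays that can be handed to any estimator that accepts Dask arrays. The data fits in memory here, so the chunking is only meant to exercise the same code path a larger dataset would use.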

stsievert commented 6 years ago

I think the Criteo example in #295 is a good example of an extremely large dataset. I see the MNIST-like datasets as being more computationally intensive and less memory intensive. MNIST is only 44MB, but takes some fairly complicated computation to learn (neural nets, auto-differentiation, etc.).