athewsey commented 3 years ago

The current MNIST challenge downloads the source data in array format and then unpacks it into folders of images, before reading them back in.

Running the algorithm on folders-of-images is good for transferring the learning to real-world use cases, but the conversion process can be a bit confusing.

If we instead used s3://fast-ai-imageclas/mnist_png.tgz from FastAI on the AWS Open Data registry, then we could:

- Potentially reduce download time / increase availability, as the dataset is already S3-hosted
- Directly use their folders-of-PNGs format - no conversion needed
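
For illustration, a minimal sketch of what the simplified data step could look like. This is not the workshop's actual code. It assumes the bucket allows anonymous (unsigned) reads, and that the archive unpacks to `mnist_png/training/<digit>/*.png` and `mnist_png/testing/<digit>/*.png`; the function names and local paths are hypothetical.

```python
import tarfile
from collections import Counter
from pathlib import Path


def download_archive(dest: Path,
                     bucket: str = "fast-ai-imageclas",
                     key: str = "mnist_png.tgz") -> Path:
    """Fetch the archive from S3 (assumes the bucket permits unsigned reads)."""
    # Imported lazily so the helpers below can be used without boto3 installed.
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
    dest.parent.mkdir(parents=True, exist_ok=True)
    s3.download_file(bucket, key, str(dest))
    return dest


def extract_archive(archive: Path, target_dir: Path) -> Path:
    """Unpack the .tgz into target_dir and return target_dir."""
    target_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(target_dir)
    return target_dir


def count_images_per_class(split_dir: Path) -> dict:
    """Count PNG files in each class sub-folder of one split (e.g. training)."""
    counts = Counter(png.parent.name for png in split_dir.glob("*/*.png"))
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Hypothetical local paths; the folder layout below is an assumption.
    archive = download_archive(Path("data/mnist_png.tgz"))
    root = extract_archive(archive, Path("data"))
    print(count_images_per_class(root / "mnist_png" / "training"))
```

With this approach the extracted folders can be passed straight to whatever reads folders-of-images, so the array-to-PNG conversion step is not needed.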

athewsey commented 2 years ago

As of the above PRs, this should be addressed in TensorFlow but not yet in the PyTorch alternatives

athewsey commented 2 years ago

#24 and #25 ported this update over to PyTorch on 11th June - closing this issue as done.

aws-samples / sagemaker-101-workshop

Simplify MNIST challenge via AWS Open Data #4
