Open carmocca opened 2 years ago
MNIST and TrialMNIST should be cached in our CI, so there is no internet connectivity issue...
@carmocca I see that the failing ones (in the conda CI, e.g. this workflow) are the following:
tests/helpers/datasets.py::tests.helpers.datasets.TrialMNIST
tests/helpers/test_datasets.py::test_pickling_dataset_mnist[TrialMNIST-args1]
tests/helpers/test_models.py::test_models[None-BasicGAN]
tests/models/test_horovod.py::test_horovod_multi_optimizer
and I think we ~~can~~ should remove
tests/helpers/datasets.py::tests.helpers.datasets.TrialMNIST
tests/helpers/test_datasets.py::test_pickling_dataset_mnist[TrialMNIST-args1] # which does nothing really
and replace `MNIST` with `RandomDataset` in the rest of the tests.
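As a rough sketch of that swap (assuming `RandomDataset` is exported from `tests.helpers`; the shapes and loader settings below are illustrative, not taken from any specific test):

```python
from torch.utils.data import DataLoader

from tests.helpers import RandomDataset  # existing in-memory test helper

# Instead of MNIST(root=..., download=True), which hits the network (or a
# local cache), build the loader from random tensors: no I/O at all.
train_loader = DataLoader(RandomDataset(size=32, length=64), batch_size=4)

for batch in train_loader:
    assert batch.shape == (4, 32)  # 64 samples of size 32, in batches of 4
```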
> MNIST and TrialMNIST should be cached in our CI, so there is no internet connectivity issue...
@Borda Seems like they are not cached in the conda workflow: https://github.com/PyTorchLightning/pytorch-lightning/actions/runs/1681919780/workflow
> there is no internet connectivity issue...
Sure, but it adds complexity and time. None of these ordinary tests actually needs the real data for its purpose, so if we can make them simpler, it is worth it to me.
This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, PyTorch Lightning Team!
Proposed refactor
Some of our tests use the `MNIST` and `TrialMNIST` classes, which download the MNIST data off the internet (or from a cache):
https://github.com/PyTorchLightning/pytorch-lightning/blob/948cfd24de4f64a2980395581f15544e5e37eab0/tests/helpers/datasets.py#L25
https://github.com/PyTorchLightning/pytorch-lightning/blob/948cfd24de4f64a2980395581f15544e5e37eab0/tests/helpers/datasets.py#L134
Motivation
We should avoid this to reduce the inherent flakiness of network access.
Pitch
If a test uses these classes, do one of 3 options:
1. Replace them with `RandomDataset`.
2. Replace them with a `RandomMNIST` class which would "mock" the actual MNIST dataset but with random data.
3. Move the test to `tests/benchmarks`.
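For option 2, a minimal sketch of what such a mock could look like (the `RandomMNIST` class below is hypothetical; it only mirrors the `(image, target)` interface and the 1x28x28 sample shape of the real dataset):

```python
import torch
from torch.utils.data import Dataset


class RandomMNIST(Dataset):
    """Hypothetical drop-in stand-in for MNIST.

    Same sample shapes and label range as the real dataset, but generated
    in memory, so tests never touch the network or the filesystem.
    """

    def __init__(self, num_samples: int = 100):
        self.data = torch.randn(num_samples, 1, 28, 28)
        self.targets = torch.randint(0, 10, (num_samples,))

    def __getitem__(self, idx):
        return self.data[idx], self.targets[idx]

    def __len__(self):
        return len(self.data)
```

Because the tensors live entirely in memory, such a class also pickles cleanly, which would keep tests like `test_pickling_dataset_mnist` trivial to port.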
Additional context
Master is currently blocked due to errors while reading the MNIST data zipfiles.
cc @borda @akihironitta