Downloading mnist

ashlaban commented 5 years ago

While setting up DeepDIVA on a new machine (so this is possibly related to using torch@1.1), we ran into an issue downloading the MNIST dataset.

$ python3 util/data/get_a_dataset.py --dataset mnist --output-folder dataset/mnist
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to dataset/mnist/MNIST/raw/train-images-idx3-ubyte.gz
100.1%Extracting dataset/mnist/MNIST/raw/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to dataset/mnist/MNIST/raw/train-labels-idx1-ubyte.gz
113.5%Extracting dataset/mnist/MNIST/raw/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to dataset/mnist/MNIST/raw/t10k-images-idx3-ubyte.gz
100.4%Extracting dataset/mnist/MNIST/raw/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to dataset/mnist/MNIST/raw/t10k-labels-idx1-ubyte.gz
180.4%Extracting dataset/mnist/MNIST/raw/t10k-labels-idx1-ubyte.gz
Processing...
Done!
Traceback (most recent call last):
  File "util/data/get_a_dataset.py", line 297, in <module>
    getattr(sys.modules[__name__], args.dataset)(args)
  File "util/data/get_a_dataset.py", line 46, in mnist
    'training.pt'))
  File "/home/ashlaban/.local/lib/python3.6/site-packages/torch/serialization.py", line 382, in load
    f = open(f, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: 'dataset/mnist/processed/training.pt'

vinaychandranp commented 5 years ago

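Recent versions of torchvision changed where the intermediate files are stored: they now go under an extra MNIST/ subfolder (dataset/mnist/MNIST/raw in the log above) instead of directly under the output folder, which is where the script still looks. The quick fix is to replace the following lines in the script with: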

get_a_dataset.py#L44-L49

    train_data, train_labels = torch.load(os.path.join(args.output_folder,
                                                       'MNIST',
                                                       'processed',
                                                       'training.pt'))
    test_data, test_labels = torch.load(os.path.join(args.output_folder,
                                                     'MNIST',
                                                     'processed',
                                                     'test.pt'))

get_a_dataset.py#L70-L71

    shutil.rmtree(os.path.join(args.output_folder, 'MNIST', 'raw'))
    shutil.rmtree(os.path.join(args.output_folder, 'MNIST', 'processed'))
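
If the script has to keep working with both the old and the new torchvision layout, one option is to probe the disk instead of hard-coding either path. The following is only a minimal sketch, not part of get_a_dataset.py: the helper name find_processed_dir and its dataset_dir parameter are hypothetical, and it assumes the only two layouts are the ones seen in this thread (<output-folder>/processed and <output-folder>/MNIST/processed).

```python
import os


def find_processed_dir(output_folder, dataset_dir="MNIST"):
    """Return the folder containing training.pt, whichever layout is on disk.

    Hypothetical helper: checks the newer nested layout first, then the
    older flat layout, and fails loudly if neither exists.
    """
    candidates = [
        # Newer torchvision: <output_folder>/MNIST/processed
        os.path.join(output_folder, dataset_dir, "processed"),
        # Older torchvision: <output_folder>/processed
        os.path.join(output_folder, "processed"),
    ]
    for candidate in candidates:
        if os.path.isfile(os.path.join(candidate, "training.pt")):
            return candidate
    raise FileNotFoundError(
        "training.pt not found in any of: " + ", ".join(candidates)
    )
```

The two torch.load calls could then join 'training.pt' and 'test.pt' onto find_processed_dir(args.output_folder). The cleanup step would need the same treatment for the raw folder.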

ashlaban commented 5 years ago

Yeah, this is precisely the fix we applied on our side :)

In particular, the change seems to have happened somewhere between torchvision-0.2 and torchvision-0.2.2.post3 (torchvision-0.2 is confirmed to work).

I checked the conda requirements file thoroughly but could not find a pinned version for torchvision. Maybe you would want to pin the version to avoid these incompatibilities in the future?

Also, if you could provide a pipenv/pip requirements file that'd be super!
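
Until a version is pinned, the script could at least report which layout to expect from the installed torchvision. This is a rough sketch under stated assumptions: the function names are made up for illustration, the table encodes only the two versions mentioned in this thread (0.2 as flat, 0.2.2.post3 as nested, the latter inferred from the failing setup), and every other version is deliberately reported as unknown because the exact release that changed the layout was not identified here.

```python
import re

# Layouts reported in this thread, keyed by (release tuple, post number).
# Anything not listed here is treated as "unknown" on purpose.
REPORTED_LAYOUTS = {
    ((0, 2, 0), 0): "flat",    # torchvision-0.2: <output>/processed
    ((0, 2, 2), 3): "nested",  # torchvision-0.2.2.post3: <output>/MNIST/processed
}


def parse_version(text):
    """Parse strings like '0.2' or '0.2.2.post3' into (release, post)."""
    match = re.match(r"^(\d+(?:\.\d+)*)(?:\.post(\d+))?", text)
    if match is None:
        raise ValueError("unrecognised version string: %r" % text)
    release = tuple(int(part) for part in match.group(1).split("."))
    # Pad short releases so that '0.2' and '0.2.0' compare equal.
    release = release + (0,) * (3 - len(release))
    post = int(match.group(2)) if match.group(2) else 0
    return release, post


def reported_layout(version_text):
    """Return 'flat', 'nested', or 'unknown' for a torchvision version string."""
    return REPORTED_LAYOUTS.get(parse_version(version_text), "unknown")
```

In the script this would be called with torchvision.__version__; an 'unknown' result is a hint to check the folder layout by hand rather than proof that something is broken.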

DIVA-DIA / DeepDIVA

Downloading mnist #7