COCO dataset may be corrupted

🐛 Bug

I am trying to run the visual bert example with VQA2, but one of the dataset archives seems to be corrupted. I can download test2015.tar.gz, trainval2014.tar.gz and coco_val2017.tar.gz, but the download of coco_train2017.tar.gz always fails. This is not a bug per se, but since the data seems to be hosted specifically for mmf, I thought opening this issue would be the best way to report the problem. It seems that either the header is corrupted or part of the data is missing.

Here's the error I always get:

RuntimeWarning: Received less data than specified in Content-Length header for https://dl.fbaipublicfiles.com/mmf/data/datasets/coco/defaults/features/coco_train2017.tar.gz. There may be a download problem.
Downloading coco_train2017.tar.gz:  32%|█████████████████████████████████████████▍                                                                                        | 31.7G/99.4G [4:25:53<9:28:08, 1.99MB/s]
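To help narrow down whether this is a truncated transfer or a bad Content-Length header, here is a minimal sketch of how the partial file could be compared against the advertised size and resumed with an HTTP Range request. It uses only the Python standard library and is not part of mmf. The helper names (`range_header`, `is_complete`, `remote_size`, `resume_download`) are hypothetical, and it assumes the server answers HEAD requests and honours Range requests, which I have not verified for this host.

```python
import os
import urllib.request

# URL taken from the warning above.
URL = ("https://dl.fbaipublicfiles.com/mmf/data/datasets/coco/"
       "defaults/features/coco_train2017.tar.gz")


def range_header(local_path):
    """Build a Range header that resumes after the bytes already on disk."""
    have = os.path.getsize(local_path) if os.path.exists(local_path) else 0
    return {"Range": f"bytes={have}-"} if have else {}


def is_complete(local_path, expected_size):
    """True when the local file has exactly the advertised number of bytes."""
    return os.path.exists(local_path) and os.path.getsize(local_path) == expected_size


def remote_size(url):
    """Ask the server for Content-Length (assumes HEAD is supported)."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        return int(resp.headers["Content-Length"])


def resume_download(url, local_path, chunk_size=1 << 20):
    """Append the missing bytes to local_path, or restart if Range is ignored."""
    req = urllib.request.Request(url, headers=range_header(local_path))
    with urllib.request.urlopen(req) as resp:
        # 206 means the server honoured the Range header; a plain 200 means
        # it is sending the whole file again, so overwrite from the start.
        mode = "ab" if resp.status == 206 else "wb"
        with open(local_path, mode) as out:
            while chunk := resp.read(chunk_size):
                out.write(chunk)
```

A caller would check `is_complete(path, remote_size(URL))` first and call `resume_download(URL, path)` only when that is false, since requesting a range that starts at the end of the file normally yields an HTTP 416 error. If the file never reaches the advertised size after repeated resumes, that would point at the hosted archive or its header rather than at the local connection.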

Command

$ MMF_CACHE_DIR=/backup/mmf mmf_run config=mmf/projects/visual_bert/configs/vqa2/defaults.yaml run_type=train_val dataset=vqa2 model=visual_bert

Expected behavior

Download the datasets. :)

Environment

PyTorch version: 1.9.0+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: version 3.10.2
Libc version: glibc-2.27

Python version: 3.9 (64-bit runtime)
Python platform: Linux-4.15.0-175-generic-x86_64-with-glibc2.27

Versions of relevant libraries:
[pip3] numpy==1.21.4
[pip3] pytorch-lightning==1.6.0.dev0
[pip3] torch==1.9.0
[pip3] torchaudio==0.9.0
[pip3] torchmetrics==0.7.3
[pip3] torchtext==0.5.0
[pip3] torchvision==0.10.0
[conda] Could not collect
