🐛 Bug
I am trying to run the VisualBERT example with VQA2, but one of the dataset archives seems to be corrupted. I can download test2015.tar.gz, trainval2014.tar.gz and coco_val2017.tar.gz, but I always get an error with coco_train2017.tar.gz. This is not a bug per se, but since the data seems to be hosted specifically for MMF, I thought opening this issue would be the best way to report the problem. It seems that either the header is corrupted or part of the data is missing.
Here's the error I always get:
RuntimeWarning: Received less data than specified in Content-Length header for https://dl.fbaipublicfiles.com/mmf/data/datasets/coco/defaults/features/coco_train2017.tar.gz. There may be a download problem.
Downloading coco_train2017.tar.gz: 32%|█████████████████████████████████████████▍ | 31.7G/99.4G [4:25:53<9:28:08, 1.99MB/s]
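As a possible workaround while the hosted file is checked, resuming the download instead of restarting it might get past the truncation. The sketch below is not part of MMF. It assumes the server honours HTTP Range requests, which I have not verified, and `range_header` and `resume_download` are hypothetical helper names used only for illustration.

```python
import os
import urllib.error
import urllib.request

# URL copied from the RuntimeWarning above.
URL = (
    "https://dl.fbaipublicfiles.com/mmf/data/datasets/coco/"
    "defaults/features/coco_train2017.tar.gz"
)


def range_header(bytes_on_disk: int) -> dict:
    """Return a Range header requesting everything after the bytes already saved."""
    if bytes_on_disk > 0:
        return {"Range": f"bytes={bytes_on_disk}-"}
    return {}


def resume_download(url: str, dest: str, chunk_size: int = 1 << 20) -> bool:
    """Append missing bytes to dest; return True if the size matches Content-Length."""
    have = os.path.getsize(dest) if os.path.exists(dest) else 0
    request = urllib.request.Request(url, headers=range_header(have))
    try:
        response = urllib.request.urlopen(request)
    except urllib.error.HTTPError as err:
        # 416 (Range Not Satisfiable) usually means nothing is left to fetch.
        # Treating that as "complete" is an assumption of this sketch.
        if err.code == 416:
            return True
        raise
    with response:
        # 206 means the Range request was honoured. Anything else (typically
        # 200) means the whole file is being sent again, so start from zero.
        resumed = response.status == 206
        if not resumed:
            have = 0
        length = response.headers.get("Content-Length")
        expected_total = have + int(length) if length is not None else None
        with open(dest, "ab" if resumed else "wb") as out:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
    # Without a Content-Length there is nothing to verify against.
    return expected_total is not None and os.path.getsize(dest) == expected_total
```

Calling `resume_download(URL, "coco_train2017.tar.gz")` repeatedly until it returns `True` should pick up where the previous attempt stopped. If it keeps stalling at the same offset, that would suggest the hosted file itself is truncated rather than the connection being dropped.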
Command
Expected behavior
Download the datasets. :)
Environment
PyTorch version: 1.9.0+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: version 3.10.2
Libc version: glibc-2.27

Python version: 3.9 (64-bit runtime)
Python platform: Linux-4.15.0-175-generic-x86_64-with-glibc2.27

Versions of relevant libraries:
[pip3] numpy==1.21.4
[pip3] pytorch-lightning==1.6.0.dev0
[pip3] torch==1.9.0
[pip3] torchaudio==0.9.0
[pip3] torchmetrics==0.7.3
[pip3] torchtext==0.5.0
[pip3] torchvision==0.10.0
[conda] Could not collect