frank-xwang / CLD-UnsupervisedLearning

[CVPR 2021] Code release for "Unsupervised Feature Learning by Cross-Level Instance-Group Discrimination."
MIT License

CIFAR Download Error #8

cliangyu closed this issue 3 years ago

cliangyu commented 3 years ago

I am trying to run bash scripts/cifar/train_cifar10_moco_cld.sh, but I hit a CIFAR download error:

(simclr-1) liangyu@sphadmin-G560-V5:/space/liangyu/workspace/jhu/code/CLD-UnsupervisedLearning$ bash run.sh
[05/14 09:09:56 moco+cld]: Full config saved to checkpoint/cifar10/MoCo+CLD/resnet18/lr0.03-bs256-cldT0.2-nceT0.07-clusters200-lambda0.8-cosine-weightDecay8e-4-fp16-add_erasing-AugPlus-kMeans-ncek12288-bslr0.03-normlinear/config.json
==> Preparing data..
Traceback (most recent call last):
  File "train_cifar_moco_cld.py", line 366, in <module>
    main(opt)
  File "train_cifar_moco_cld.py", line 165, in main
    train_loader, test_loader, ndata = get_dataloader(args, add_erasing=args.erasing, aug_plus=args.aug_plus)
  File "/space/liangyu/workspace/jhu/code/CLD-UnsupervisedLearning/datasets/dataloader.py", line 81, in get_dataloader
    trainset = datasets.CIFAR10Instance(root='./data/CIFAR-10', train=True, download=True, transform=transform_train, two_imgs=args.two_imgs, three_imgs=args.three_imgs)
  File "/space/liangyu/workspace/jhu/code/CLD-UnsupervisedLearning/datasets/cifar.py", line 11, in __init__
    super(CIFAR10Instance, self).__init__(root=root, train=train, download=download, transform=transform)
  File "/home/liangyu/anaconda3/envs/simclr-1/lib/python3.7/site-packages/torchvision/datasets/cifar.py", line 65, in __init__
    self.download()
  File "/home/liangyu/anaconda3/envs/simclr-1/lib/python3.7/site-packages/torchvision/datasets/cifar.py", line 143, in download
    download_and_extract_archive(self.url, self.root, filename=self.filename, md5=self.tgz_md5)
  File "/home/liangyu/anaconda3/envs/simclr-1/lib/python3.7/site-packages/torchvision/datasets/utils.py", line 316, in download_and_extract_archive
    download_url(url, download_root, filename, md5)
  File "/home/liangyu/anaconda3/envs/simclr-1/lib/python3.7/site-packages/torchvision/datasets/utils.py", line 124, in download_url
    url = _get_redirect_url(url, max_hops=max_redirect_hops)
  File "/home/liangyu/anaconda3/envs/simclr-1/lib/python3.7/site-packages/torchvision/datasets/utils.py", line 75, in _get_redirect_url
    with urllib.request.urlopen(urllib.request.Request(url, headers=headers)) as response:
  File "/home/liangyu/anaconda3/envs/simclr-1/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/home/liangyu/anaconda3/envs/simclr-1/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/home/liangyu/anaconda3/envs/simclr-1/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/home/liangyu/anaconda3/envs/simclr-1/lib/python3.7/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/home/liangyu/anaconda3/envs/simclr-1/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/home/liangyu/anaconda3/envs/simclr-1/lib/python3.7/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 500: Internal Server Error
Killing subprocess 50818
Traceback (most recent call last):
  File "/home/liangyu/anaconda3/envs/simclr-1/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/liangyu/anaconda3/envs/simclr-1/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/liangyu/anaconda3/envs/simclr-1/lib/python3.7/site-packages/torch/distributed/launch.py", line 340, in <module>
    main()
  File "/home/liangyu/anaconda3/envs/simclr-1/lib/python3.7/site-packages/torch/distributed/launch.py", line 326, in main
    sigkill_handler(signal.SIGTERM, None)  # not coming back
  File "/home/liangyu/anaconda3/envs/simclr-1/lib/python3.7/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler
    raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/liangyu/anaconda3/envs/simclr-1/bin/python', '-u', 'train_cifar_moco_cld.py', '--local_rank=0', '--dataset', 'cifar10', '--num-workers', '4', '--batch-size', '256', '--nce-t', '0.07', '--nce-k', '12288', '--base-learning-rate', '0.03', '--lr-scheduler', 'cosine', '--warmup-epoch', '5', '--weight-decay', '8e-4', '--cld_t', '0.2', '--save-freq', '100', '--three-imgs', '--use-kmeans', '--num-iters', '5', '--Lambda', '0.8', '--normlinear', '--aug-plus', '--erasing', '--clusters', '200', '--save-dir', 'checkpoint/cifar10/MoCo+CLD/resnet18/lr0.03-bs256-cldT0.2-nceT0.07-clusters200-lambda0.8-cosine-weightDecay8e-4-fp16-add_erasing-AugPlus-kMeans-ncek12288-bslr0.03-normlinear']' returned non-zero exit status 1.

frank-xwang commented 3 years ago

I re-ran the code and re-downloaded the CIFAR-10 data; the download succeeded and the code worked well on my server. I am using Python==3.7.9, pytorch==1.6.0+cu101 and torchvision==0.7.0+cu101. You can try upgrading or downgrading pytorch/torchvision to the versions I am using. In addition, you may want to double-check whether the machine has run out of disk space or has no Internet access. Because the code that downloads the CIFAR data is called from pytorch/torchvision, and we haven't changed any of the data-downloading code, we may not be able to provide much support. If you still get the error after changing the pytorch/torchvision version and carefully checking everything, I am sorry about that; I suggest you open an issue directly in the official pytorch repository: https://github.com/pytorch/pytorch.
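
For anyone hitting the same traceback: it ends in HTTP Error 500, which is a response from the server side, so retrying the download a little later may be worth a try alongside the checks above. Below is a rough sketch of those checks, not something that ships with this repository. The helper names `free_gib` and `retry_on_http_error` are hypothetical and made up for illustration; only the standard library is used, and the commented-out usage assumes torchvision's `datasets.CIFAR10` with the same `root` as in the traceback.

```python
import shutil
import time
import urllib.error


def free_gib(path="."):
    """Return the free disk space at `path` in GiB (hypothetical helper)."""
    return shutil.disk_usage(path).free / 2**30


def retry_on_http_error(fn, attempts=3, delay=1.0, sleep=time.sleep):
    """Call fn(); on an HTTP 5xx error, wait and try again (hypothetical helper).

    The wait doubles after each failed attempt. Errors below 500 and the
    final failed attempt are re-raised unchanged.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except urllib.error.HTTPError as err:
            if err.code < 500 or attempt == attempts:
                raise
            sleep(delay * 2 ** (attempt - 1))


# Possible usage (assumes torchvision is installed; not executed here):
# from torchvision import datasets
# print("free space (GiB):", free_gib("."))
# retry_on_http_error(
#     lambda: datasets.CIFAR10(root="./data/CIFAR-10", train=True, download=True),
#     attempts=5,
#     delay=10.0,
# )
```

If the retries keep failing with the same status code, the problem is likely outside this repository's code, as noted above.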