XiaohangZhan / deocclusion

Code for our CVPR 2020 work.
Apache License 2.0

please:subprocess.CalledProcessError: Command #29

Closed by zhenghan408 4 years ago

zhenghan408 commented 4 years ago

More details: subprocess.CalledProcessError: Command '['/home/lc/anaconda3/envs/deo/bin/python', '-u', 'main.py', '--local_rank=0', '--config', './config.yaml', '--launcher', 'pytorch']' returned non-zero exit status 2.

XiaohangZhan commented 4 years ago

What is your command? Please provide the full log.

zhenghan408 commented 4 years ago

Traceback (most recent call last):
  File "/home/lc/anaconda3/envs/deo/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/lc/anaconda3/envs/deo/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/lc/anaconda3/envs/deo/lib/python3.7/site-packages/torch/distributed/launch.py", line 235, in <module>
    main()
  File "/home/lc/anaconda3/envs/deo/lib/python3.7/site-packages/torch/distributed/launch.py", line 231, in main
    cmd=process.args)
subprocess.CalledProcessError: Command '['/home/lc/anaconda3/envs/deo/bin/python', '-u', 'main.py', '--local_rank=0', '--config', 'experiments/COCOA/pcnet_m/config.yaml', '--launcher', 'pytorch']' returned non-zero exit status 1.

zhenghan408 commented 4 years ago

Thanks a lot for your reply.

XiaohangZhan commented 4 years ago

Sorry, I still cannot locate the issue from this limited information. Could you please provide your script, the environment, and a full log?

zhenghan408 commented 4 years ago

Sorry, the above problem has been solved. But when I ran the second step, sh experiments/COCOA/pcnet_c/train.sh, I encountered other problems, as follows:

/media/lc/软件/zh/deocclusion-master/experiments/COCOA/pcnet_c/config.yaml
main.py:15: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  config = yaml.load(f)
=> loading checkpoint '/media/lc/软件/zh/deocclusion-master/pretrains/partialconv_input_ch4.pth'
Traceback (most recent call last):
  File "main.py", line 49, in <module>
    main(args)
  File "main.py", line 30, in main
    trainer = Trainer(args)
  File "/media/lc/软件/zh/deocclusion-master/trainer.py", line 61, in __init__
    args.model, load_pretrain=args.load_pretrain, dist_model=True)
  File "/media/lc/软件/zh/deocclusion-master/models/partial_completion_content_cgan.py", line 53, in __init__
    self.criterion = InpaintingLoss(backbone.VGG16FeatureExtractor()).cuda()
  File "/media/lc/软件/zh/deocclusion-master/models/backbone/pconv_unet.py", line 36, in __init__
    vgg16 = models.vgg16(pretrained=True)
  File "/home/lc/anaconda3/envs/deo/lib/python3.7/site-packages/torchvision/models/vgg.py", line 144, in vgg16
    return _vgg('vgg16', 'D', False, pretrained, progress, **kwargs)
  File "/home/lc/anaconda3/envs/deo/lib/python3.7/site-packages/torchvision/models/vgg.py", line 92, in _vgg
    progress=progress)
  File "/home/lc/anaconda3/envs/deo/lib/python3.7/site-packages/torch/hub.py", line 434, in load_state_dict_from_url
    return torch.load(cached_file, map_location=map_location)
  File "/home/lc/anaconda3/envs/deo/lib/python3.7/site-packages/torch/serialization.py", line 387, in load
    return _load(f, map_location, pickle_module, **pickle_load_args)
  File "/home/lc/anaconda3/envs/deo/lib/python3.7/site-packages/torch/serialization.py", line 564, in _load
    magic_number = pickle_module.load(f, **pickle_load_args)
EOFError: Ran out of input
Traceback (most recent call last):
  File "/home/lc/anaconda3/envs/deo/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/lc/anaconda3/envs/deo/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/lc/anaconda3/envs/deo/lib/python3.7/site-packages/torch/distributed/launch.py", line 235, in <module>
    main()
  File "/home/lc/anaconda3/envs/deo/lib/python3.7/site-packages/torch/distributed/launch.py", line 231, in main
    cmd=process.args)
subprocess.CalledProcessError: Command '['/home/lc/anaconda3/envs/deo/bin/python', '-u', 'main.py', '--local_rank=0', '--config', '/media/lc/软件/zh/deocclusion-master/experiments/COCOA/pcnet_c/config.yaml', '--launcher', 'pytorch', '--load-pretrain', '/media/lc/软件/zh/deocclusion-master/pretrains/partialconv_input_ch4.pth']' returned non-zero exit status 1.

Thanks!!!

XiaohangZhan commented 4 years ago

This might be due to an incomplete VGG pretrained file. During distributed training, when multiple processes download the same file over the network at the same time, the file can end up corrupted. There are two possible solutions. First, you could manually download the VGG pretrained file to torch's default checkpoint location, typically ~/.cache/torch. Second, you could run this training command with 1 GPU first; after the VGG file has been downloaded, re-run it with multiple GPUs.
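Before re-downloading, it may help to confirm that the cached file really is damaged. The sketch below is not part of this repository. It uses only the standard library and rests on two assumptions: that the download cache lives somewhere under ~/.cache/torch (the exact subdirectory differs between torch versions), and that cached files follow the torch.hub naming convention in which the part after the last hyphen is the leading hex digits of the file's SHA256 (for example vgg16-397923af.pth). The helper name find_suspect_checkpoints is hypothetical.

```python
import hashlib
import re
from pathlib import Path

# Assumed naming convention: "<name>-<first 8+ hex digits of SHA256>.<ext>".
HASH_SUFFIX = re.compile(r"-([0-9a-f]{8,})\.[A-Za-z0-9]+$")


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large checkpoints are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_suspect_checkpoints(cache_dir):
    """Return (path, reason) pairs for .pth files that look truncated or corrupted."""
    suspects = []
    for path in sorted(Path(cache_dir).rglob("*.pth")):
        if path.stat().st_size == 0:
            # Unpickling an empty file fails with "EOFError: Ran out of input".
            suspects.append((path, "empty file"))
            continue
        match = HASH_SUFFIX.search(path.name)
        if match is None:
            continue  # no hash in the file name, nothing to compare against
        if not sha256_of(path).startswith(match.group(1)):
            suspects.append((path, "SHA256 prefix mismatch"))
    return suspects
```

Calling find_suspect_checkpoints(Path.home() / ".cache" / "torch") lists the files worth deleting; once they are removed, either of the two solutions above should fetch a clean copy.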

zhenghan408 commented 4 years ago

OK, I will try it. Thanks so much!!!

zhenghan408 commented 4 years ago

Thanks so much, I solved the problem!!!