Tencent / Real-SR

Real-World Super-Resolution via Kernel Estimation and Noise Injection
Apache License 2.0

Error in multi-GPU training #7

Open wytcsuch opened 4 years ago

wytcsuch commented 4 years ago

When I train with multiple GPUs, the following error occurs:

    File "/wytdata/Real-SR/codes/models/SRGAN_model.py", line 74, in __init__
      model = create_model(opt)
    File "/wytdata/Real-SR/codes/models/__init__.py", line 14, in create_model
      m = M(opt)
    File "/wytdata/Real-SR/codes/models/SRGAN_model.py", line 74, in __init__
      device_ids=[torch.cuda.current_device()])
    File "/opt/conda/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 238, in __init__
      "DistributedDataParallel is not needed when a module "
    AssertionError: DistributedDataParallel is not needed when a module doesn't have any parameter that requires a gradient.

The cause appears to be that the VGG parameters are frozen, so none of them requires a gradient. Is there a solution?

    if self.cri_fea:  # load VGG perceptual loss
        self.netF = networks.define_F(opt, use_bn=False).to(self.device)
        if opt['dist']:
            self.netF = DistributedDataParallel(self.netF, device_ids=[torch.cuda.current_device()])  # error occurs here
        else:
            self.netF = DataParallel(self.netF)
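One possible workaround, sketched here under assumptions rather than taken from the repository: since a frozen VGG has no gradients to synchronize, it can simply be left unwrapped in the distributed case. The helper names `has_trainable_params` and `wrap_for_training` are hypothetical, and torch is imported lazily so that the frozen-module path does not depend on it.

```python
def has_trainable_params(module):
    """Return True if at least one parameter of `module` requires a gradient."""
    return any(p.requires_grad for p in module.parameters())


def wrap_for_training(module, dist):
    """Hypothetical helper: wrap `module` for multi-GPU use.

    In distributed mode, a module whose parameters are all frozen is returned
    unwrapped, because DistributedDataParallel asserts that at least one
    parameter requires a gradient.
    """
    if dist:
        if not has_trainable_params(module):
            # Nothing to synchronize: each process keeps its own frozen copy.
            return module
        import torch
        from torch.nn.parallel import DistributedDataParallel
        return DistributedDataParallel(
            module, device_ids=[torch.cuda.current_device()]
        )
    from torch.nn.parallel import DataParallel
    return DataParallel(module)
```

In the snippet above, the inner `if`/`else` would then become `self.netF = wrap_for_training(self.netF, opt['dist'])`. Any code that later reaches into `self.netF.module` would need to handle the unwrapped case, which this sketch does not cover.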

Jinbo-Hu commented 2 years ago

Did you solve this problem? I have the same issue.