facebookresearch / pytorch_GAN_zoo

A mix of GAN implementations including progressive growing
BSD 3-Clause "New" or "Revised" License

Error while training StyleGAN with CIFAR-10 #124

Open hexiangdong2017 opened 3 years ago

hexiangdong2017 commented 3 years ago

    (fb_gan_zoo) root@f56c103c5607:~/pytorch_GAN_zoo# python train.py StyleGAN -c config_cifar10.json --restart -n cifar10
    Setting up a new session...
    Running StyleGAN
    size 10
    50000 images found
    AC-GAN classes :
    {'Main': {'order': 0, 'values': ['horse', 'deer', 'automobile', 'cat', 'frog', 'ship', 'airplane', 'truck', 'dog', 'bird']}}

    size 10
    50000 images found
    50000 images detected
    size (8, 8)
    50000 images found
    Changing alpha to 0.000
    /root/pytorch_GAN_zoo/models/base_GAN.py:278: UserWarning: This overload of add is deprecated:
            add(Number alpha, Tensor other)
    Consider using one of the following signatures instead:
            add(Tensor other, *, Number alpha) (Triggered internally at /pytorch/torch/csrc/utils/python_arg_parser.cpp:882.)
      avgp.mul(0.999).add_(0.001, p.data)
    Traceback (most recent call last):
      File "train.py", line 137, in <module>
        GANTrainer.train()
      File "/root/pytorch_GAN_zoo/models/trainer/progressive_gan_trainer.py", line 235, in train
        status = self.trainOnEpoch(dbLoader, scale,
      File "/root/pytorch_GAN_zoo/models/trainer/gan_trainer.py", line 486, in trainOnEpoch
        allLosses = self.model.optimizeParameters(inputs_real,
      File "/root/pytorch_GAN_zoo/models/base_GAN.py", line 249, in optimizeParameters
        self.classificationPenalty(predFakeD,
      File "/root/pytorch_GAN_zoo/models/base_GAN.py", line 563, in classificationPenalty
        loss.backward(retain_graph=True)
      File "/root/anaconda3/envs/fb_gan_zoo/lib/python3.8/site-packages/torch/tensor.py", line 221, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph)
      File "/root/anaconda3/envs/fb_gan_zoo/lib/python3.8/site-packages/torch/autograd/__init__.py", line 130, in backward
        Variable._execution_engine.run_backward(
    RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [512, 512]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
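The hint at the end of the error message can be tried on a small case before touching the training script. The following is a minimal sketch, not code from this repository: the function name `backward_error_with_anomaly_detection` and the tensors in it are made up for illustration. It edits a tensor in place after autograd has saved it, which triggers the same class of `RuntimeError`. With anomaly detection enabled, PyTorch additionally prints the traceback of the forward operation whose gradient could not be computed.

```python
import torch


def backward_error_with_anomaly_detection():
    """Trigger an in-place autograd error and return its message (None if no error)."""
    w = torch.randn(3, 3, requires_grad=True)
    x = torch.randn(2, 3)
    # Context-manager form of torch.autograd.set_detect_anomaly(True).
    with torch.autograd.detect_anomaly():
        y = (x @ w).tanh()  # tanh keeps its output for the backward pass
        y.add_(1.0)         # in-place edit of that saved output
        try:
            y.sum().backward()
        except RuntimeError as err:
            return str(err)
    return None


if __name__ == "__main__":
    print(backward_error_with_anomaly_detection())
```

Anomaly detection slows training noticeably, so it is usually enabled only while hunting for the offending operation.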

hexiangdong2017 commented 3 years ago

@likethesky @Celebio @teytaud @colesbury

varshakishore commented 3 years ago

Were you able to fix this error? @hexiangdong2017

CyberKing0514 commented 3 years ago

pytorch_GAN_zoo/models/networks/styleGAN.py

modify line 158 to

    self.mean_w.data = self.gamma_avg * self.mean_w.data + (1-self.gamma_avg) * mapping.mean(dim=0, keepdim=True)

mhaines94108 commented 2 years ago

CyberKing's fix works. Written with the multiplication operators in the right places, the line is:

        self.mean_w.data = self.gamma_avg * self.mean_w.data + (1 - self.gamma_avg) * mapping.mean(
            dim=0, keepdim=True)
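One plausible reading of why this helps (an assumption, not something confirmed in this thread): if the running mean is updated as an ordinary tensor expression, it keeps autograd history from earlier iterations. A later `loss.backward(retain_graph=True)` then walks back into those old graphs, whose weights the optimizer has since changed in place. Writing through `.data` keeps the running mean out of the graph. The toy model below is a sketch of that pattern only; `ToyMapping`, `use_data_update`, `train_steps` and the 0.7 truncation factor are hypothetical names and values, not code from `styleGAN.py`.

```python
import torch
import torch.nn as nn


class ToyMapping(nn.Module):
    """Hypothetical stand-in for a mapping network that tracks a running mean."""

    def __init__(self, dim=4, gamma_avg=0.9, use_data_update=True):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, dim))
        self.gamma_avg = gamma_avg
        self.use_data_update = use_data_update
        self.register_buffer("mean_w", torch.zeros(1, dim))

    def forward(self, z):
        mapping = self.net(z)
        batch_mean = mapping.mean(dim=0, keepdim=True)
        if self.use_data_update:
            # Form suggested in the thread: write through .data so the buffer
            # carries no autograd history from this or earlier steps.
            self.mean_w.data = (self.gamma_avg * self.mean_w.data
                                + (1 - self.gamma_avg) * batch_mean)
        else:
            # Assumed problematic form: mean_w becomes a graph node that
            # chains back through every previous forward pass.
            self.mean_w = (self.gamma_avg * self.mean_w
                           + (1 - self.gamma_avg) * batch_mean)
        # Truncation-style use of the running mean in the output.
        return self.mean_w + 0.7 * (mapping - self.mean_w)


def train_steps(use_data_update, steps=3):
    """Run a few SGD steps and return the final running mean."""
    torch.manual_seed(0)
    model = ToyMapping(use_data_update=use_data_update)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    for _ in range(steps):
        loss = model(torch.randn(8, 4)).pow(2).mean()
        opt.zero_grad()
        loss.backward(retain_graph=True)  # same flag as in the traceback
        opt.step()                        # updates the weights in place
    return model.mean_w


if __name__ == "__main__":
    for flag in (False, True):
        try:
            train_steps(flag)
            print(f"use_data_update={flag}: finished")
        except RuntimeError as err:
            print(f"use_data_update={flag}: RuntimeError: {str(err)[:70]}")
```

Wrapping the update in `torch.no_grad()` and using `self.mean_w.copy_(...)` would be another way to keep the buffer out of the graph; the sketch keeps the `.data` form because that is what the thread reports as working.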