pandasfang opened this issue 6 years ago
Hi @pandasfang,
In optimizerG = optim.Adam(netG.parameters(), lr=opt.lr, betas=(opt.beta1, 0.999)), we tell the optimizer that it only needs to update the parameters of the generator. That is, although netD will receive gradients, it won't be updated by optimizerG, so we don't have to detach it.
Now you may have another question: why do we call detach in the line output = netD(fake.detach())? The answer is that calling detach here is not strictly necessary for correctness.
Consider the following example, a very simple auto-encoder.
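The point that an optimizer only steps the parameters it was constructed with can be checked directly. Below is a minimal sketch (not code from the thread; the two one-layer Linear modules are stand-ins for netG and netD):

```python
# Hypothetical minimal check: an optimizer only updates the parameters
# it was given, even if other modules also received gradients during
# backward().
import torch
import torch.nn as nn
import torch.optim as optim

g = nn.Linear(1, 1)   # stand-in for netG
d = nn.Linear(1, 1)   # stand-in for netD
opt_g = optim.Adam(g.parameters(), lr=1e-1)

out = d(g(torch.ones(1, 1)))
out.sum().backward()            # both g and d receive gradients here

g_before = g.weight.clone()
d_before = d.weight.clone()
opt_g.step()                    # only g's parameters move

assert not torch.equal(g.weight, g_before)   # g was updated
assert torch.equal(d.weight, d_before)       # d is untouched
```

So even though d's parameters hold gradients after backward(), opt_g.step() never touches them.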
import torch
import torch.nn as nn
import torch.optim as optim
from torch.autograd import Variable

fc1 = nn.Linear(1, 2)
fc2 = nn.Linear(2, 1)
opt1 = optim.Adam(fc1.parameters(), lr=1e-1)
opt2 = optim.Adam(fc2.parameters(), lr=1e-1)

x = Variable(torch.FloatTensor([5]))

# First pass: no detach, so gradients flow back into fc1 as well
z = fc1(x)
x_p = fc2(z)
cost = (x_p - x) ** 2

opt1.zero_grad()
opt2.zero_grad()
cost.backward()
for n, p in fc1.named_parameters():
    print(n, p.grad.data)
for n, p in fc2.named_parameters():
    print(n, p.grad.data)

print('=' * 48)

# Second pass: detach z, so no gradients reach fc1
opt1.zero_grad()
opt2.zero_grad()
z = fc1(x)
x_p = fc2(z.detach())
cost = (x_p - x) ** 2
cost.backward()
for n, p in fc1.named_parameters():
    print(n, p.grad.data)
for n, p in fc2.named_parameters():
    print(n, p.grad.data)
The output would be:
weight
12.0559
-8.3572
[torch.FloatTensor of size 2x1]
bias
2.4112
-1.6714
[torch.FloatTensor of size 2]
weight
-33.5588 -19.4411
[torch.FloatTensor of size 1x2]
bias
-9.9940
[torch.FloatTensor of size 1]
================================================
weight
0
0
[torch.FloatTensor of size 2x1]
bias
0
0
[torch.FloatTensor of size 2]
weight
-33.5588 -19.4411
[torch.FloatTensor of size 1x2]
bias
-9.9940
[torch.FloatTensor of size 1]
You can see that detaching the output of fc1 has no influence on the gradients of fc2. Once we know the gradients are unaffected, we can simply use optimizerD
(which only updates the parameters of the discriminator) to update netD without worrying about the generator, even when we don't detach. However, not detaching incurs extra computational cost, because autograd also computes gradients for the parts you don't need.
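The computational point can be made concrete: with detach, autograd skips the generator side entirely and never even allocates gradients for it. A small sketch (my own illustration, reusing the fc1/fc2 names from the example above):

```python
# A hypothetical check of the computational point: with detach, no
# gradients are computed for the generator-side parameters at all.
import torch
import torch.nn as nn

fc1 = nn.Linear(1, 2)   # plays the role of the generator
fc2 = nn.Linear(2, 1)   # plays the role of the discriminator
x = torch.ones(1, 1)

out = fc2(fc1(x).detach())
out.sum().backward()

assert fc1.weight.grad is None        # autograd skipped fc1 entirely
assert fc2.weight.grad is not None    # fc2 still gets its gradients
```

Without the detach, fc1.weight.grad would be populated too, which is exactly the wasted work the thread mentions.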
Thanks
I think it's a good question, and you can verify whether what I said is right (maybe I am wrong, because I am still learning too :) ).
If possible, please keep this thread open; I think it would be helpful for people who want to know more about detach.
You are also very welcome to discuss this with me.
Thanks
Soumith's reply in this thread might also clarify things a little bit: https://github.com/pytorch/examples/issues/116
Hi @yyrkoon27,
In this case, it's right. In a VAE-GAN, however, the detach call may be needed for correctness if you use, for example, opt1 = optim.RMSprop(G.parameters(), lr=1e-1), where G consists of an encoder and a decoder.
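One concrete way the missing detach can cause a correctness problem (my own illustration, not the exact VAE-GAN case): without detach, the discriminator's backward pass leaves gradients on the generator's parameters, and those leak into the generator's next update unless zero_grad() is called in between.

```python
# Illustration of gradient leakage without detach: the
# "discriminator" loss deposits gradients on the generator's
# parameters, which then accumulate with the generator's own loss.
import torch
import torch.nn as nn

g = nn.Linear(1, 1)   # stand-in generator
d = nn.Linear(1, 1)   # stand-in discriminator
x = torch.ones(1, 1)

d(g(x)).sum().backward()          # "discriminator" loss, no detach
leaked = g.weight.grad.clone()    # gradient that leaked into g

g(x).sum().backward()             # "generator" loss, no zero_grad first

# g's gradient is now the leaked part plus its own gradient (which is
# x = 1 for this linear layer), not the generator gradient alone
assert torch.allclose(g.weight.grad, leaked + 1.0)
```

With g(x).detach() in the first backward pass, leaked would never exist, so forgetting zero_grad() could not mix the two losses in this way.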
Dear TA:
In Lab 3-2, why don't we need to detach the discriminator when we backpropagate through the generator?