Zhendong-Wang / Diffusion-GAN

Official PyTorch implementation for paper: Diffusion-GAN: Training GANs with Diffusion
MIT License

Possible typo in the adjusting formula in the paper? #14

Open ariel415el opened 1 year ago

ariel415el commented 1 year ago

Hi, in the paper you use the following formula for the adjusting variable r_d:

r_d = E_{y,t}[ sign(D(y, t) - 0.5) ]

But in the StyleGAN-ADA paper you're referring to, the formula is:

r_t = E[ sign(D_train) ]

Did you indeed mean to refer to r_t from StyleGAN-ADA?

Subtracting 0.5 from the discriminator output usually makes no sense, since the values are not restricted to [0, 1]; as shown in the StyleGAN-ADA paper, they are simply symmetrical around 0.

Am I missing something?

Zhendong-Wang commented 1 year ago

They are actually the same. Since we wrote our paper starting from the vanilla GAN objective, where the discriminator output is a probability in [0, 1], we added the -0.5 in the paper for consistency. In the implementation, we follow StyleGAN-ADA and the discriminator outputs logits, which can be negative, hence we use the sign of the raw logits directly, without the -0.5 offset.
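A minimal sketch of why the two forms agree (the tensor values below are made up for illustration): sigmoid(x) > 0.5 exactly when x > 0, so subtracting 0.5 from a probability output gives the same sign as the raw logit.

```python
import torch

# Hypothetical discriminator logits for a batch of (diffused) real images.
logits = torch.tensor([-2.3, -0.1, 0.4, 1.7])
probs = torch.sigmoid(logits)  # vanilla-GAN style outputs in [0, 1]

# Paper notation: discriminator outputs probabilities, so subtract 0.5.
r_d_from_probs = (probs - 0.5).sign().mean()

# Implementation (StyleGAN-ADA style): discriminator outputs logits.
r_d_from_logits = logits.sign().mean()

# Both give the same adjusting variable, since sigmoid(x) > 0.5 <=> x > 0.
assert torch.allclose(r_d_from_probs, r_d_from_logits)
print(r_d_from_probs)  # tensor(0.)  (two positives, two negatives)
```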

shoh4486 commented 1 year ago

@Zhendong-Wang Hello again, and I have a few questions here:

(1) If I use r_d with an LSGAN loss, i.e. the discriminator D learns to regress to 0.0 / 1.0 without a final activation layer (it just outputs a raw value):

```python
import torch
import torch.nn as nn


class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        self.mynet = MyNetwork()  # has NO final activation layer such as sigmoid

    def forward(self, img, t):
        batch_size, H, W = img.size(0), img.size(-2), img.size(-1)
        # Broadcast the diffusion time step t into an extra input channel.
        t = torch.ones(batch_size, 1, H, W, device=img.device) * t.view(-1, 1, 1, 1)
        x = torch.cat((img, t), dim=1)  # Is this correct?

        x = self.mynet(x)

        return x  # raw output, without a final activation layer
```

```python
loss = nn.MSELoss()

# ... extra code omitted
real_diffused, t = diffusion(real)
d_real = discriminator(real_diffused, t)

# ... extra code omitted
d_loss_real = loss(d_real, torch.ones_like(d_real))
d_loss_fake = loss(d_fake, torch.zeros_like(d_fake))

r_d = (d_real.detach() - 0.5).sign().mean()
```

In this case, is `r_d = (d_real.detach() - 0.5).sign().mean()` correct?

(2) In the above code block, is introducing the time step t by concatenation, just like concatenating a condition, okay? Or would a trainable structure, such as a fully-connected linear layer or a single convolutional layer, be required?

Thank you !! :-)

Zhendong-Wang commented 1 year ago

(1) If you use sigmoid as the final activation function to rescale the output to [0, 1], then you should use `r_d = (d_real.detach() - 0.5).sign().mean()`.
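For context on what r_d is then used for: the diffusion intensity (the maximum time step T) is adjusted adaptively based on r_d. Here is a minimal sketch of such an update, assuming a target value d_target and a step size that are illustrative defaults rather than the repository's exact settings:

```python
def adjust_T(T, r_d, d_target=0.6, step=1, T_min=10, T_max=1000):
    # Increase T when the discriminator looks overconfident on reals (r_d above target),
    # decrease it otherwise, and keep T inside a valid range.
    T += step if r_d > d_target else -step
    return max(T_min, min(T, T_max))

# e.g. T = adjust_T(T, r_d.item()) every few training iterations
```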

(2) We use concat and it works in our case, but I think a better architecture design could help more here.
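If you do want to try a trainable alternative to plain concatenation, one possible pattern (a sketch under assumed names, not the repository's implementation; MyNetwork and the embedding size are hypothetical) is to map t through a small learned embedding and broadcast it as extra feature channels:

```python
import torch
import torch.nn as nn


class TimestepConditionedDiscriminator(nn.Module):
    """Sketch: condition on the diffusion time step t via a learned embedding."""

    def __init__(self, img_channels=3, t_embed_dim=8):
        super().__init__()
        # Small trainable mapping for the scalar time step.
        self.t_embed = nn.Sequential(
            nn.Linear(1, t_embed_dim),
            nn.SiLU(),
            nn.Linear(t_embed_dim, t_embed_dim),
        )
        # Hypothetical backbone that accepts the extra embedding channels.
        self.mynet = MyNetwork(in_channels=img_channels + t_embed_dim)

    def forward(self, img, t):
        b, _, h, w = img.shape
        # (B,) -> (B, t_embed_dim), then broadcast to (B, t_embed_dim, H, W).
        e = self.t_embed(t.float().view(b, 1))
        e = e.view(b, -1, 1, 1).expand(-1, -1, h, w)
        x = torch.cat((img, e), dim=1)
        return self.mynet(x)  # raw logits / regression output
```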