YuvalNirkin / fsgan

FSGAN - Official PyTorch Implementation
https://nirkin.com/fsgan
Creative Commons Zero v1.0 Universal

Why discriminator is not conditioned on generator input? #45

Closed bomb2peng closed 4 years ago

bomb2peng commented 4 years ago

Hi, thanks for your great work. Conventionally, pix2pix discriminators are conditioned on generator inputs, right? It seems your training code in version 1 only takes generated image as input to D. What is your insight and did you try conventional conditional-GAN?

YuvalNirkin commented 4 years ago

You're mistaken; the discriminator is conditioned on both the real images (the generator's inputs) and the predicted images in both versions of the code.

bomb2peng commented 4 years ago

I am reading train_reenactment.py in v1 code.

# Fake Detection and Loss
img_pred_pyd = img_utils.create_pyramid(img_pred, len(img1))
pred_fake_pool = D([x.detach() for x in img_pred_pyd])
loss_D_fake = criterion_gan(pred_fake_pool, False)

# Real Detection and Loss
pred_real = D(img2)
loss_D_real = criterion_gan(pred_real, True)

loss_D_total = (loss_D_fake + loss_D_real) * 0.5

The generator takes as input a real image from a different frame and a landmark heatmap, but I do not see D take these as input in the code above. Am I missing something?

YuvalNirkin commented 4 years ago

The discriminator takes the target image (img2) as the real image input; it has the same distribution as the generator's input image (img1).

bomb2peng commented 4 years ago

Maybe I did not describe my question clearly. Let's say img_pred = G(img1, landmark2), and the reconstruction loss wants img_pred to be close to img2. I think the conventional pix2pix network has a discriminator that takes in (img1, landmark2, img_pred) as a fake sample and (img1, landmark2, img2) as a real sample. Here (img1, landmark2) are inputs to G and also conditioning inputs to D. In your code, it's D(img_pred) vs D(img2), right? By this, I mean your discriminator "is not conditioned on generator input". Hope I made my question clear.
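To illustrate the difference I mean, here is a minimal PyTorch sketch (not the FSGAN code; the toy discriminator, tensor shapes, and channel counts are made up for illustration). The only change between the two variants is whether the generator's inputs are concatenated with the judged image along the channel dimension:

```python
import torch
import torch.nn as nn

# Hypothetical toy PatchGAN-style discriminator; the input channel
# count depends on whether we condition on the generator's inputs.
def make_patch_d(in_channels):
    return nn.Sequential(
        nn.Conv2d(in_channels, 64, 4, stride=2, padding=1),
        nn.LeakyReLU(0.2),
        nn.Conv2d(64, 1, 4, stride=2, padding=1),  # patch-wise real/fake scores
    )

img1 = torch.randn(2, 3, 64, 64)       # generator's source image
landmark2 = torch.randn(2, 3, 64, 64)  # target landmark heatmap (toy encoding)
img_pred = torch.randn(2, 3, 64, 64)   # stand-in for G(img1, landmark2)

# Unconditional D, as in the v1 snippet above: sees only the judged image.
d_uncond = make_patch_d(3)
fake_score = d_uncond(img_pred)

# Conditional (pix2pix-style) D: the generator's inputs are concatenated
# with the judged image, so D can also score input/output consistency.
d_cond = make_patch_d(3 + 3 + 3)
fake_score_cond = d_cond(torch.cat([img1, landmark2, img_pred], dim=1))

print(fake_score.shape, fake_score_cond.shape)  # both (2, 1, 16, 16)
```

The same concatenation would be applied to the real sample, i.e. D would score torch.cat([img1, landmark2, img2], dim=1) as real.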

YuvalNirkin commented 4 years ago

I am not aware that this kind of formulation has been proposed in the pix2pix or pix2pixHD papers. Please point me to the exact lines in the papers if you have seen it. Anyway, I think this misses the point of the discriminator.

bomb2peng commented 4 years ago

In the pix2pix paper, please see Figure 2: D takes in both the generated (or real) image and the corresponding edge image. Please also refer to Equation (1). By doing this, D judges not only the realism of the generated image G(x) but also the consistency between x and G(x). pix2pixHD builds on pix2pix, so it has a similar conditional discriminator; conditional GANs generally work this way. Anyway, since your final result is quite good, this conditional GAN loss may not be that important in this specific scenario.
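For reference, the conditional objective in Equation (1) of the pix2pix paper has the form below, where x is the conditioning input (here it would be img1 and landmark2), y the real target (img2), and z the generator's noise:

```latex
\mathcal{L}_{cGAN}(G, D) =
  \mathbb{E}_{x,y}\left[\log D(x, y)\right]
  + \mathbb{E}_{x,z}\left[\log\left(1 - D(x, G(x, z))\right)\right]
```

The key point is that D receives x in both terms, so it is scoring the pair (input, image) rather than the image alone.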

YuvalNirkin commented 4 years ago

I see, this might be worth trying, thank you!