L1 and L2 losses produce blurry results on image reconstruction problems. [Autoencoding beyond pixels using a learned similarity metric.]
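A toy illustration (not from the paper) of why pixel-wise regression blurs: when several ground-truth outputs are plausible for the same input, the L2-optimal prediction is the mean of the modes (a blur), while the L1-optimal prediction is the median. The pixel values below are made up for the demo.

```python
import numpy as np

# Three equally likely "ground truth" pixel values for the same input
# (a multimodal target, e.g. an edge that could land on either pixel).
targets = np.array([0.0, 0.0, 1.0])

candidates = np.linspace(0, 1, 101)
l2 = [np.mean((targets - c) ** 2) for c in candidates]
l1 = [np.mean(np.abs(targets - c)) for c in candidates]

best_l2 = candidates[int(np.argmin(l2))]  # the mean ~1/3: an averaged, "blurry" value
best_l1 = candidates[int(np.argmin(l1))]  # the median 0.0: one of the sharp modes
print(best_l2, best_l1)
```

The adversarial loss in the paper sidesteps this averaging by penalizing outputs that a discriminator can tell apart from real images, rather than penalizing distance to a single target.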
At inference time, they run the generator net in exactly the same manner as during the training phase. This differs from the usual protocol in that they apply dropout at test time, and they apply batch normalization using the statistics of the test batch, rather than aggregated statistics of the training batch.
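A minimal numpy sketch (not the authors' code) of the batch-norm detail above: normalizing with the current test batch's statistics versus with stored training statistics. With test-batch statistics the output is exactly zero-mean, unit-variance per feature; with a batch size of 1 this reduces to what is often called instance normalization.

```python
import numpy as np

def batchnorm(x, gamma=1.0, beta=0.0, running=None, eps=1e-5):
    """Normalize features x of shape (batch, features).

    running=None   -> use the statistics of the current (test) batch,
                      as pix2pix does at inference.
    running=(m, v) -> use aggregated training statistics (the usual protocol).
    """
    if running is None:
        mean, var = x.mean(axis=0), x.var(axis=0)
    else:
        mean, var = running
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=(8, 4))  # a "test batch"

y = batchnorm(x)                 # test-batch statistics: centered output
z = batchnorm(x, running=(0.0, 1.0))  # stale training stats: offset preserved
print(y.mean(axis=0), z.mean(axis=0))
```

Using test-batch statistics keeps the generator stochastic in the same way it was during training, which is why the paper's outputs vary with the batch contents.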
[paper] && [code]
Authors:
Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros
Berkeley AI Research (BAIR) Laboratory, UC Berkeley
Highlight
This paper treats image-to-image translation as a problem in which the input and output differ in surface appearance but are both renderings of the same underlying structure. The results suggest that conditional adversarial networks are a promising approach for many image-to-image translation tasks, especially those involving highly structured graphical outputs.
They use a "U-Net"-based architecture as the generator and for the discriminator they use a convolutional "PatchGAN" classifier, which only penalizes structure at the scale of patches. The discriminator tries to classify if each NxN patch in an image is real or fake.
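The patch size N of a PatchGAN is just the receptive field of the discriminator's final conv unit, so it can be computed from kernel sizes and strides. A sketch, assuming the layer configuration of the 70x70 PatchGAN commonly used with pix2pix (three stride-2 and two stride-1 conv layers, all with 4x4 kernels):

```python
# (kernel, stride) per conv layer; this configuration is an assumption
# matching the commonly used 70x70 PatchGAN, not taken from this note.
layers = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]

# Standard receptive-field recurrence: each layer widens the field by
# (kernel - 1) input pixels per unit of the current effective stride.
rf, jump = 1, 1
for k, s in layers:
    rf += (k - 1) * jump
    jump *= s

print(rf)  # -> 70: each discriminator output classifies one 70x70 patch
```

Because each output unit only sees an NxN patch, the discriminator models high-frequency local texture, while the L1 term in the full objective handles low-frequency correctness.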
The experiments cover a variety of tasks and datasets, including: