denabazazian opened this issue 3 years ago
Hi. You can try the optimization-based method proposed in the original StyleGAN2 paper; an unofficial implementation is here: https://github.com/rosinality/stylegan2-pytorch/blob/master/projector.py
@bryandlee Many thanks for your reply. I have tried to use the projector code from StyleGAN2, but the `latent_in` from that code is aligned with the generated projection of the input image. Does that mean I should modify lines #170 and #173 to get `latent_in` directly from the input image, regardless of `sample_noise` and `latent_mean`? Or am I missing something?
Hi, I don't quite get what you mean by "getting the `latent_in` directly from the input image regardless of `sample_noise` and `latent_mean`". The code finds the latent vectors and noises that can be fed into the generator to produce the closest projection of a given input image.
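To make the projection idea concrete, here is a minimal sketch of that optimization loop. It uses a tiny linear network as a stand-in for the frozen pretrained generator (the real code would use the StyleGAN2 generator and a perceptual loss instead of plain MSE); `latent_in` is the optimized variable, as in `projector.py`.

```python
import torch

torch.manual_seed(0)

# Toy stand-in for a frozen pretrained generator (latent -> "image").
g = torch.nn.Sequential(torch.nn.Linear(8, 16), torch.nn.Tanh())
for p in g.parameters():
    p.requires_grad_(False)

target = torch.randn(1, 16)  # stand-in for the input image to project
latent_in = torch.randn(1, 8, requires_grad=True)  # the optimized latent
opt = torch.optim.Adam([latent_in], lr=0.05)

first_loss = None
for step in range(200):
    opt.zero_grad()
    # Reconstruction loss between the generated projection and the target.
    loss = torch.nn.functional.mse_loss(g(latent_in), target)
    if first_loss is None:
        first_loss = loss.item()
    loss.backward()
    opt.step()

final_loss = torch.nn.functional.mse_loss(g(latent_in), target).item()
```

After the loop, `g(latent_in)` is the generator's closest reproduction of the target, which is exactly why the projection is only as faithful as the generator's latent space allows.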
Yes, the projector code generates the closest projection of a given input image, but the problem is that in most cases the viewpoint and some features of the input image are changed, so the semantic segmentation result does not correspond to the input image. In the Supplementary Material of the paper, it is written that the input image is fed into a Pix2Pix encoder to construct a pixel-wise representation. I am just wondering if there is any further implementation or explanation regarding that. Thanks.
I see. The "auto-shot segmentation" part of the paper is not implemented, but you can sample image-label pairs from the few-shot model and use them to train any semantic segmentation model. I'll let you know if I have a chance to do it.
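The suggestion above (sample image-label pairs from the few-shot model, then train any segmentation network on them) can be sketched as follows. `sample_pair` is a hypothetical stand-in: in practice it would run the GAN plus the few-shot label branch; here it returns a toy image with a trivially derived label map, and a 1x1-conv classifier stands in for the downstream segmentation model.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for sampling from the few-shot model: in practice
# this would return a generated image and its predicted label map.
def sample_pair():
    img = torch.rand(1, 3, 16, 16)
    label = (img[:, :1] > 0.5).long().squeeze(1)  # toy 2-class label map
    return img, label

# Any off-the-shelf segmentation model works; a 1x1-conv per-pixel
# classifier keeps this sketch self-contained.
seg = torch.nn.Conv2d(3, 2, kernel_size=1)
opt = torch.optim.Adam(seg.parameters(), lr=0.1)

for _ in range(200):
    img, label = sample_pair()
    opt.zero_grad()
    loss = torch.nn.functional.cross_entropy(seg(img), label)
    loss.backward()
    opt.step()

# Evaluate on a fresh sampled pair.
img, label = sample_pair()
acc = (seg(img).argmax(1) == label).float().mean().item()
```

Because the trained segmentation network takes a raw image as input, it sidesteps the projection problem entirely: real test images never need to be inverted into the GAN's latent space.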
I am wondering how I can evaluate the model on a real image instead of an image generated by StyleGAN. The input image usually needs to be embedded into the latent space of the GAN by a latent optimizer, so that the generator reproduces the input image and a representation can be extracted from it. However, I cannot find this latent optimizer in the code. Did you feed an input image into Pix2Pix's encoder and use activation maps from all convolutional layers of the generator (decoder) to construct a pixel-wise representation? Would it be possible to release the code for testing input images?
Thanks for your great work!