genforce / interfacegan

[CVPR 2020] Interpreting the Latent Space of GANs for Semantic Face Editing
https://genforce.github.io/interfacegan/
MIT License

Why the latent code from GAN inversion methods can be manipulated by the boundary #43

Closed WJ-Lai closed 4 years ago

WJ-Lai commented 4 years ago

Hi, thanks for sharing this great work!

I'm trying to edit a new face. In https://github.com/genforce/interfacegan/issues/30, it is suggested to first use https://github.com/Puzer/stylegan-encoder to obtain the latent code of the new face in W+ space. However, that latent code has shape (18, 512), and the 18 layers have different values.

What confuses me is:

  1. The shape of "stylegan_ffhq_age_w_boundary.npy" is (1, 512), so if a (1, 512) boundary is used to edit an (18, 512) latent code, every layer is shifted by the same offset. But the 18 layers of the (18, 512) latent code do not mean the same thing, because their values are different.

     Why can we use a (1, 512) boundary to edit an (18, 512) latent code? Why does it still work?

  2. If the 18 layers of the (18, 512) latent code have different values, wouldn't it be more reasonable to train an (18, 512) boundary (which also has different values for its 18 layers)?

  3. In your paper, you also run experiments on real images. Which latent space does your StyleGAN encoder produce: Z, W, or W+? If your latent code has shape (18, 512), do the 18 layers have different values?

Thank you!

ShenYujun commented 4 years ago
  1. Yes, you are right. You can manipulate all layers with the same boundary. The reason is that during the training of StyleGAN, the $w$ code, which is mapped from the $z$ code, is repeated 18 times before being fed into the different layers.

  2. Training 18 different boundaries, one per layer, is also a feasible solution. However, the training data (the $w$ code) is identical for all of these layers (as mentioned above, StyleGAN repeats the $w$ code 18 times). A more reasonable solution is to manipulate only some of the layers (e.g., only layers 0-3 for pose). Please see HiGAN for more details.

  3. I use W+, which has a different $w$ for each layer.
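
For readers who want to try points 1 and 2 numerically, here is a minimal NumPy sketch. It is not the repository's code: `edit_wplus` is a hypothetical helper, and the random arrays are stand-ins for a real inverted W+ code and for a boundary loaded from a file such as "stylegan_ffhq_age_w_boundary.npy". It only illustrates that a (1, 512) boundary broadcasts over an (18, 512) code, and how an edit could be restricted to a subset of layers.

```python
import numpy as np

NUM_LAYERS, LATENT_DIM = 18, 512


def edit_wplus(wplus, boundary, distance, layers=None):
    """Move a W+ code along a boundary direction (hypothetical helper).

    wplus:    (18, 512) latent code, one w vector per layer.
    boundary: (1, 512) direction, e.g. the normal of an attribute hyperplane.
    distance: signed step size along the boundary.
    layers:   iterable of layer indices to edit; None edits every layer.
    """
    edited = wplus.copy()
    if layers is None:
        # Broadcasting adds the same (1, 512) offset to each of the 18 rows.
        edited += distance * boundary
    else:
        # Only the selected rows are shifted; the other layers stay untouched.
        idx = list(layers)
        edited[idx] += distance * boundary
    return edited


rng = np.random.default_rng(0)
# Stand-in for a W+ code returned by an encoder (assumption, not real data).
wplus = rng.standard_normal((NUM_LAYERS, LATENT_DIM))
# Stand-in for a loaded boundary, normalized to unit length.
boundary = rng.standard_normal((1, LATENT_DIM))
boundary /= np.linalg.norm(boundary)

all_layers = edit_wplus(wplus, boundary, distance=3.0)
coarse_only = edit_wplus(wplus, boundary, distance=3.0, layers=range(4))
```

Here `all_layers` corresponds to editing every layer with the same boundary (point 1), while `coarse_only` shifts only layers 0-3, as in the pose example from point 2. Which layers suit which attribute is an empirical question; the choice of `range(4)` simply mirrors the example given above.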