betterze / StyleSpace

StyleSpace Analysis: Disentangled Controls for StyleGAN Image Generation

S space #16

Closed PangziZhang523 closed 2 years ago

PangziZhang523 commented 2 years ago

Thanks for the interesting work. I would like to ask how you obtained the S space. The paper says it is obtained with the In-Domain GAN method, but from reading the In-Domain GAN paper I only see how to get the W space. Is the S space obtained from the W space, or is it a separate space?

woctezuma commented 2 years ago

Not sure if relevant, but you can find the following in the original StyleGAN paper:

A Style-Based Generator Architecture for Generative Adversarial Networks Tero Karras (NVIDIA), Samuli Laine (NVIDIA), Timo Aila (NVIDIA) https://arxiv.org/abs/1812.04948 https://github.com/NVlabs/stylegan

[Figure 1 of the paper and its accompanying text]

PangziZhang523 commented 2 years ago

Thank you for the answer. My understanding is that the 'style' in StyleGAN refers to the style parameters in AdaIN, which is not the same as the latent space in StyleSpace.

betterze commented 2 years ago

The way to transfer the w+ space to the s space can be found here.

Each w in the w+ space (18 w vectors) is passed through an affine transformation (A in this figure) to get the s parameters. The detailed mapping between w+ and s can be found in Table 2 of our supplementary material.
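As a rough illustration (not the exact code in this repo), each layer's A is just a learned fully-connected layer applied to the corresponding w vector; the class and variable names below are made up for the sketch, and in the real generator the style dimension of each layer equals that layer's channel count:

```python
import torch
import torch.nn as nn

# Sketch of the per-layer affine transform "A": maps a 512-d w vector to that
# layer's style vector s, which then modulates the layer's convolution.
class AffineToStyle(nn.Module):
    def __init__(self, w_dim: int = 512, style_dim: int = 512):
        super().__init__()
        self.fc = nn.Linear(w_dim, style_dim)  # the "A" block in the figure

    def forward(self, w: torch.Tensor) -> torch.Tensor:
        # w: (batch, w_dim) -> s: (batch, style_dim)
        return self.fc(w)

# w_plus: (batch, 18, 512); layer i applies its own affine A_i to w_plus[:, i]
affines = nn.ModuleList([AffineToStyle() for _ in range(18)])
w_plus = torch.randn(1, 18, 512)
s_codes = [A(w_plus[:, i]) for i, A in enumerate(affines)]  # 18 style vectors
```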

PangziZhang523 commented 2 years ago

Thanks for the answer, I think I now understand the relationship between s and w. So https://github.com/betterze/StyleSpace/blob/main/manipulate.py#L229 can also transfer w to s, right? Since they are all affine transformations, but w is a 1x512 vector while w+ is 18 w vectors. Another question: how does inversion obtain s in the comparison in Figure 18? If it goes from w to s, the results for w and s should be the same, because there is no editing involved.

betterze commented 2 years ago
  1. The function can transfer w+ to s. To get w+ from w, just repeat w 18 times, so the shape goes from (1, 512) to (1, 18, 512). In this way, we can transfer w -> w+ -> s (see the sketch after this list).

  2. In Figure 18, we compare different spaces for inversion. Given the same input image, we invert it to the w, w+, or s space separately. The information flows from image to latent space (image -> s -> w+), rather than from latent space to image (w+ -> s -> image). Since there are affine transformations from w+ to s, inversion to w+ is more constrained than inversion to s, so the s reconstruction will be slightly better than the w+ reconstruction.
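Here is a minimal sketch of step 1 (illustrative only): going from w to w+ is just a repeat along the layer axis, after which the per-layer affine transforms give s.

```python
import numpy as np

# w: a single latent code of shape (1, 512) from the mapping network.
w = np.random.randn(1, 512).astype(np.float32)

# Broadcast the same w to all 18 layers to get a w+ code of shape (1, 18, 512);
# feeding this w+ through the per-layer affine layers then yields the s code.
w_plus = np.tile(w[:, np.newaxis, :], (1, 18, 1))
print(w_plus.shape)  # -> (1, 18, 512)
```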

PangziZhang523 commented 2 years ago

Thank you for the answer. How did you obtain the s code directly through inversion in Figure 18? Something is still unclear here; could you please explain it again? That is, the image -> s step.

betterze commented 2 years ago

In Figure 18, the latent codes are obtained through latent optimization. Here is the standard latent optimization code from the StyleGAN2 paper (Section 5). It optimizes in the W space; we modify this code to optimize in the S or W+ space instead.

If you want to do latent optimization in the S space, we suggest using the PyTorch implementation, which is easier to play with.
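A minimal sketch of what the modified loop looks like when optimizing directly in the S space; `generator_from_styles` and `perceptual_loss` below are placeholder names for a pretrained synthesis network that renders from per-layer style codes and an LPIPS-style distance, not actual APIs from this repo:

```python
import torch

def invert_to_s(target, generator_from_styles, perceptual_loss, s_init,
                steps=1000, lr=0.01):
    # Optimize the per-layer style codes so the rendered image matches the target.
    s = [si.clone().requires_grad_(True) for si in s_init]
    opt = torch.optim.Adam(s, lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        recon = generator_from_styles(s)                       # render image from s
        loss = perceptual_loss(recon, target) + ((recon - target) ** 2).mean()
        loss.backward()
        opt.step()
    return [si.detach() for si in s]
```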

PangziZhang523 commented 2 years ago

Thanks a lot for the answers. There is also a problem when looking for the channel of a specific attribute. I only used a male classifier to get the attribute channel, which is different from your gender channel (9, 6). I used 50 positive images, and the layer and channel I found were (8, 223). But this is not the channel that determines gender. [attached image: ranked channel results]

betterze commented 2 years ago

In the paper, we claim that using just 20-30 example images, we can achieve a top-5 accuracy higher than 90%. It is the top-5 accuracy, rather than the top-1 accuracy.

For the case you show above, the target channel 9_6 is at rank 5. So for top-5 accuracy, it is a success; for top-1 accuracy, it is a failure.
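As an illustrative sketch (simplified, not the exact detection code in this repo): normalize the positive examples' style codes by the population mean and standard deviation, average them, and rank channels by magnitude; the target channel counts as found if it lands in the top 5.

```python
import numpy as np

def rank_channels(s_positive, s_mean, s_std, top_k=5):
    # s_positive: (n_examples, n_channels) flattened style codes of positive images
    # s_mean, s_std: (n_channels,) statistics over a large random sample
    normalized = (s_positive - s_mean) / (s_std + 1e-8)
    relevance = np.abs(normalized.mean(axis=0))
    return np.argsort(relevance)[::-1][:top_k]   # indices of the top-k channels
```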