eladrich / pixel2style2pixel

Official Implementation for "Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation" (CVPR 2021) presenting the pixel2style2pixel (pSp) framework
https://eladrich.github.io/pixel2style2pixel/
MIT License

About `encoder_type` #103

Closed · liuliuliu11 closed this issue 3 years ago

liuliuliu11 commented 3 years ago

According to the paper, the main innovation of this work is an encoder backbone with a feature pyramid that generates three levels of feature maps, from which styles are extracted using a simple intermediate network (map2style). Looking at the code, however, this innovation is only used when the parameter encoder_type is set to GradualStyleEncoder. Do the other encoder_type values conflict with this innovation?

yuval-alaluf commented 3 years ago

If I understood correctly, then you are correct. To train pSp with the FPN encoder, you should set encoder_type to GradualStyleEncoder. I am not entirely sure what you mean by "whether other parameters conflict with the innovation point," though.
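For anyone following along, the map2style idea can be sketched roughly as follows. This is a simplified, self-contained version of what a GradualStyleBlock does, not the repo's exact code (the real block uses an equalized-learning-rate linear layer, and one such block is instantiated per style input):

```python
import math
import torch
import torch.nn as nn

class Map2StyleSketch(nn.Module):
    """Simplified sketch of pSp's map2style idea (cf. GradualStyleBlock in
    models/encoders/psp_encoders.py): strided convolutions downsample a
    feature map to 1x1, then a linear layer emits one 512-d style vector."""
    def __init__(self, in_channels, style_dim=512, spatial=16):
        super().__init__()
        num_pools = int(math.log2(spatial))  # halvings needed to reach 1x1
        layers = [nn.Conv2d(in_channels, style_dim, 3, stride=2, padding=1),
                  nn.LeakyReLU()]
        for _ in range(num_pools - 1):
            layers += [nn.Conv2d(style_dim, style_dim, 3, stride=2, padding=1),
                       nn.LeakyReLU()]
        self.convs = nn.Sequential(*layers)
        # The repo uses an equalized-lr linear layer here; a plain
        # nn.Linear keeps this sketch self-contained.
        self.linear = nn.Linear(style_dim, style_dim)

    def forward(self, x):
        x = self.convs(x)           # (B, style_dim, 1, 1)
        x = x.flatten(start_dim=1)  # (B, style_dim)
        return self.linear(x)       # one style vector per image

# e.g., a 16x16 feature map from the FPN -> one 512-d style vector
block = Map2StyleSketch(in_channels=512, spatial=16)
styles = block(torch.randn(2, 512, 16, 16))  # shape: (2, 512)
```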

liuliuliu11 commented 3 years ago

When encoder_type is set to BackboneEncoderUsingLastLayerIntoW or BackboneEncoderUsingLastLayerIntoWPlus, the FPN innovation is not used. In other words, the innovation is only present in GradualStyleEncoder.

yuval-alaluf commented 3 years ago

Correct. The other encoder types are provided for completeness and for the reproducibility of the ablation studies presented in the paper.
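For context, the three encoder types differ mainly in the shape of the code they hand to StyleGAN. A shapes-only sketch (the tensors below are random placeholders standing in for encoder outputs, not actual encoder code; 18 style inputs correspond to a 1024x1024 generator):

```python
import torch

n_styles = 18  # style inputs of a 1024x1024 StyleGAN2
batch = 4

# GradualStyleEncoder (the paper's FPN encoder): one map2style
# vector per style input, extracted from three feature-map levels.
w_plus_fpn = torch.randn(batch, n_styles, 512)   # W+ code

# BackboneEncoderUsingLastLayerIntoW (ablation): a single 512-d
# code from the backbone's last feature map; the generator then
# broadcasts it across all style inputs.
w_single = torch.randn(batch, 512)               # W code
w_broadcast = w_single.unsqueeze(1).repeat(1, n_styles, 1)

# BackboneEncoderUsingLastLayerIntoWPlus (ablation): 18 codes
# predicted jointly from the last feature map alone, without
# the multi-scale FPN features.
w_plus_last = torch.randn(batch, n_styles, 512)  # W+ code
```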

liuliuliu11 commented 3 years ago

I would like to ask further: the StyleGAN implementation referenced is https://github.com/rosinality/stylegan2-pytorch. The Generator module in that code includes a mapping network (from z to w) and a synthesis network (from w to image). In pSp's code, for each of the three encoder_type values, does the encoder output that is fed into StyleGAN pass through both the mapping network and the synthesis network?

yuval-alaluf commented 3 years ago

The FPN-based pSp encoder takes an image as input and returns a latent code of size 18x512. These 18 style vectors are fed into the 18 style inputs of StyleGAN to generate the output image.

Notice that this code does not go through the mapping network. In psp.py we have:

https://github.com/eladrich/pixel2style2pixel/blob/0c83c42a913adc42d0ba0dabfa7d5b25b8f10ffd/models/psp.py#L90-L94

Note that input_code will be False, so input_is_latent will be set to True. Then, in the Generator code we define the mapping network as:

https://github.com/eladrich/pixel2style2pixel/blob/0c83c42a913adc42d0ba0dabfa7d5b25b8f10ffd/models/stylegan2/model.py#L380-L387

When we call the forward function after encoding the image we have:

https://github.com/eladrich/pixel2style2pixel/blob/0c83c42a913adc42d0ba0dabfa7d5b25b8f10ffd/models/stylegan2/model.py#L482-L483

Since input_is_latent=True, we do not go through the mapping network.
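To summarize the control flow, the relevant branch looks roughly like this. It is a condensed paraphrase of the linked lines with a stand-in mapping network, not a verbatim copy of the repo's Generator:

```python
import torch
import torch.nn as nn

class GeneratorSketch(nn.Module):
    """Condensed paraphrase of the rosinality-style Generator logic
    linked above; only the mapping-network branch is shown."""
    def __init__(self, style_dim=512, n_mlp=8):
        super().__init__()
        # stand-in for the real 8-layer mapping network (z -> w)
        self.style = nn.Sequential(
            *[nn.Linear(style_dim, style_dim) for _ in range(n_mlp)])

    def forward(self, styles, input_is_latent=False):
        if not input_is_latent:
            # inputs are z codes: push them through the mapping network
            styles = [self.style(s) for s in styles]
        # inputs are now treated as w / w+ codes and would be fed
        # directly to the synthesis network (omitted in this sketch)
        return styles

g = GeneratorSketch()
codes = torch.randn(2, 18, 512)          # pSp encoder output (W+)
out = g([codes], input_is_latent=True)   # mapping network is skipped
```

So when pSp calls the generator with input_is_latent=True, the encoder's 18x512 code is consumed as a W+ latent directly and only ever passes through the synthesis network.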

liuliuliu11 commented 3 years ago

thanks a lot