Closed: liuliuliu11 closed this issue 3 years ago.
If I understood correctly, then you are correct. To train pSp with the FPN encoder, you should set `encoder_type` to `GradualStyleEncoder`.
I am not entirely sure what you are referring to with the point "whether other parameters conflict with the innovation point."
When `encoder_type` is `BackboneEncoderUsingLastLayerIntoW` or `BackboneEncoderUsingLastLayerIntoWPlus`, the innovation cannot be used. In other words, the innovation is only reflected in `GradualStyleEncoder`.
Correct. The other encoder types are provided for completeness and for the reproducibility of the ablation studies presented in the paper.
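For reference, the dispatch on `encoder_type` can be sketched in plain Python (a simplified stand-in; the real `set_encoder` in psp.py constructs the corresponding encoder modules rather than returning descriptions):

```python
# Hedged sketch of the encoder_type dispatch in psp.py; the real code
# instantiates the matching module from models.encoders.psp_encoders.
def describe_encoder(encoder_type):
    encoders = {
        # FPN backbone: styles extracted from three feature-map levels
        # (the paper's contribution)
        'GradualStyleEncoder': 'feature-pyramid encoder, styles from coarse/medium/fine maps',
        # Ablation variants: everything is derived from the backbone's last layer
        'BackboneEncoderUsingLastLayerIntoW': 'single 512-d W vector from the last layer',
        'BackboneEncoderUsingLastLayerIntoWPlus': 'W+ styles, all from the last feature map',
    }
    if encoder_type not in encoders:
        raise ValueError(f'{encoder_type} is not a valid encoder')
    return encoders[encoder_type]
```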
I would like to ask further: the StyleGAN implementation is taken from https://github.com/rosinality/stylegan2-pytorch. Its `Generator` module includes both the mapping network (from Z to W) and the synthesis network (from W to image). In pSp's code, for each of the three values of `encoder_type`, does the encoder output fed into StyleGAN go through both the mapping network and the synthesis network?
The FPN-based pSp encoder takes an image as input and returns a vector of size 18x512. This vector is fed into the 18 inputs of StyleGAN to generate the output image. Notice that this vector does not go through the mapping layer.
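The 18 here comes from the StyleGAN2 architecture at 1024x1024 output resolution: two style inputs per synthesis resolution level. A quick sanity check on the shapes (plain Python, purely illustrative):

```python
import math

# Number of StyleGAN2 style inputs for a given output resolution:
# two per synthesis level, i.e. 2*log2(res) - 2 (18 at 1024x1024).
def n_style_inputs(resolution):
    return int(math.log2(resolution)) * 2 - 2

# The pSp encoder returns one 512-d style per input layer,
# an 18x512 "codes" tensor for a 1024x1024 generator.
codes_shape = (n_style_inputs(1024), 512)
assert codes_shape == (18, 512)
```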
In psp.py we have:
https://github.com/eladrich/pixel2style2pixel/blob/0c83c42a913adc42d0ba0dabfa7d5b25b8f10ffd/models/psp.py#L90-L94
Note that `input_code` will be `False` and so `input_is_latent` will be set to `True`.
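The flag logic in those lines can be sketched in plain Python (a hedged sketch; in the real code the flag is passed on to `self.decoder`):

```python
# Hedged sketch of the flag logic in pSp's forward pass: the encoder outputs
# are already latents (W+), so input_code stays False and input_is_latent
# is simply its negation.
def input_is_latent_flag(input_code=False):
    return not input_code

# pSp then calls, roughly:
#   images, latents = self.decoder([codes],
#                                  input_is_latent=input_is_latent_flag(), ...)
```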
Then, in the `Generator` code we define the mapping network as:
https://github.com/eladrich/pixel2style2pixel/blob/0c83c42a913adc42d0ba0dabfa7d5b25b8f10ffd/models/stylegan2/model.py#L380-L387
When we call the `forward` function after encoding the image, we have:
https://github.com/eladrich/pixel2style2pixel/blob/0c83c42a913adc42d0ba0dabfa7d5b25b8f10ffd/models/stylegan2/model.py#L482-L483
Since `input_is_latent=True`, we do not go through the mapping network.
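The effect of that branch can be sketched with a minimal stand-in (not the real `Generator`): when `input_is_latent` is `False` each style is pushed through the mapping network, and when `True` the styles are used directly as W vectors.

```python
# Minimal stand-in for the Generator.forward branch in
# models/stylegan2/model.py: mapping is only applied to raw Z inputs.
def forward_styles(styles, mapping, input_is_latent=False):
    if not input_is_latent:
        styles = [mapping(s) for s in styles]
    return styles

# With input_is_latent=True the mapping function is never called:
latents = forward_styles(['w_vector'], mapping=lambda s: 'mapped',
                         input_is_latent=True)
assert latents == ['w_vector']  # mapping network bypassed
```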
Thanks a lot!
According to the narrative in the paper, the innovation of this work is the following: an encoder backbone with a feature pyramid, generating three levels of feature maps from which styles are extracted using a simple intermediate network (map2style). According to the code, this innovation is only used when `encoder_type=GradualStyleEncoder`. Do the other parameter values conflict with the innovation point?