NVlabs / stylegan3

Official PyTorch implementation of StyleGAN3

Project real image - inversion quality is worse than in StyleGAN2 #54

Open oferidan1 opened 2 years ago

oferidan1 commented 2 years ago

Hi, thanks for sharing the StyleGAN3 project! I've tried doing image inversion (real image to StyleGAN3 latent) using the updated projector.py code from StyleGAN2-ADA-PyTorch, taken from https://github.com/NVlabs/stylegan3/issues/35 (I used an FFHQ-based model); a rough sketch of how I call it is below the example image. However, the inverted image is a bit different from the input image, while the same inversion process over a StyleGAN2-ADA-PyTorch model results in a better inversion. In addition, the optimization loss in StyleGAN3 is much higher than in StyleGAN2 (30 vs 0.3). Is this expected? If so, can you please explain? See the example result below.

Thanks in advance, Ofer

image
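For reference, here is roughly how I'm invoking the projector. This is a sketch, not my exact code: it assumes the modified projector.py from #35 keeps the same project() interface as the original StyleGAN2-ADA-PyTorch version, and the network pickle path and image filename are placeholders.

```python
import numpy as np
import PIL.Image
import torch

import dnnlib
import legacy
from projector import project  # the modified projector.py from #35

device = torch.device('cuda')
network_pkl = 'stylegan3-r-ffhq-1024x1024.pkl'  # placeholder path to the FFHQ model

with dnnlib.util.open_url(network_pkl) as fp:
    G = legacy.load_network_pkl(fp)['G_ema'].requires_grad_(False).to(device)

# Load the real image, resize to the generator resolution, keep the [0, 255] range.
target = PIL.Image.open('real_face.png').convert('RGB')
target = target.resize((G.img_resolution, G.img_resolution), PIL.Image.LANCZOS)
target = torch.tensor(np.array(target).transpose(2, 0, 1), device=device)

# Optimize a latent that reproduces the target; returns the per-step W trajectory.
projected_w_steps = project(G, target=target, num_steps=1000, device=device, verbose=True)

# Synthesize the final inversion for comparison against the input image.
synth = G.synthesis(projected_w_steps[-1].unsqueeze(0), noise_mode='const')
```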

betterze commented 2 years ago

same question

superSayianNathanjg commented 2 years ago

CrossDressGan3

PDillis commented 2 years ago

Try anchoring the latent space to w_avg, so that you make things easier for the projector/G itself:

```python
if hasattr(G.synthesis, 'input'):  # make it general so you can also use StyleGAN2's pkl files
    shift = G.synthesis.input.affine(G.mapping.w_avg.unsqueeze(0))
    G.synthesis.input.affine.bias.data.add_(shift.squeeze(0))
    G.synthesis.input.affine.weight.data.zero_()
```

Add this before the call to the project() function in the code you linked. To get better results, you could also project in W+ (they didn't invent it; it's just the first bibliographical reference I found for it). I added both of these to my projection code, modified from StyleGAN2-ADA-PyTorch. From my experiments on faces, projecting in W+ yields the best results.
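If it helps to see the W+ idea in isolation, here is a minimal, self-contained sketch (not my actual projection code): it uses a plain pixel loss as a stand-in for the projector's VGG16/LPIPS perceptual loss and noise regularization, and assumes target is a [1, C, H, W] tensor in roughly [-1, 1].

```python
import copy
import torch
import torch.nn.functional as F

def project_wplus(G, target, *, num_steps=1000, lr=0.1, device='cuda'):
    """Optimize one latent per synthesis layer (W+) instead of a single broadcast w."""
    G = copy.deepcopy(G).eval().requires_grad_(False).to(device)
    target = target.to(device)

    # Start every layer at w_avg and let the layers drift apart during optimization.
    w_avg = G.mapping.w_avg.detach()                                    # [w_dim]
    w_plus = w_avg[None, None, :].repeat(1, G.mapping.num_ws, 1).clone()
    w_plus.requires_grad_(True)                                         # [1, num_ws, w_dim]

    opt = torch.optim.Adam([w_plus], lr=lr)
    for _ in range(num_steps):
        synth = G.synthesis(w_plus, noise_mode='const')                 # [1, C, H, W], ~[-1, 1]
        loss = F.mse_loss(synth, target)   # pixel loss as a stand-in; the real projector uses
                                           # a VGG16/LPIPS perceptual loss plus noise regularization
        opt.zero_grad(set_to_none=True)
        loss.backward()
        opt.step()
    return w_plus.detach()                 # feed this to G.synthesis to render the inversion
```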

For comparison's sake, here are some results of projecting in W+ (StyleGAN2/FFHQ-256, target image, and StyleGAN3/FFHQU-256, respectively, with --init-lr=0.5 and --num-steps=1000):

image

I would be remiss if I didn't note that accurately projecting a real image into the latent space was not the goal of StyleGAN2 (as per Section 5 and Figure 9 of the paper). Indeed, the authors wanted to make it easy to attribute a fake image to its source GAN, not to make it easy to project any real image into W. This line of thought continued with StyleGAN2-ADA, and I'd be willing to bet it also carried over to StyleGAN3.

Most likely, you will need to find other, better ways to project images into the latent space: using a different feature extractor (or all the layers of VGG16), a higher learning rate, a larger number of steps, projecting in W+, etc. Hope this helps.
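To make the "all the layers of VGG16" idea a bit more concrete, here is one way it could look, built on torchvision's VGG16 rather than the vgg16.pt feature network the projector downloads. It's a sketch of the idea, not the repo's own extractor; you would swap it in for the single LPIPS feature call in projector.py.

```python
import torch
import torch.nn.functional as F
import torchvision

class MultiLayerVGG16(torch.nn.Module):
    """Feature extractor that taps several VGG16 ReLU layers instead of a single embedding."""
    def __init__(self, taps=(3, 8, 15, 22, 29)):  # relu1_2, relu2_2, relu3_3, relu4_3, relu5_3
        super().__init__()
        weights = torchvision.models.VGG16_Weights.IMAGENET1K_V1  # torchvision >= 0.13; older: pretrained=True
        self.blocks = torchvision.models.vgg16(weights=weights).features.eval()
        for p in self.blocks.parameters():
            p.requires_grad_(False)
        self.taps = set(taps)
        self.register_buffer('mean', torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1))
        self.register_buffer('std', torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1))

    def forward(self, x):                        # x: NCHW, RGB, range [0, 255] like the projector uses
        x = (x / 255.0 - self.mean) / self.std   # ImageNet normalization expected by VGG16
        feats = []
        for i, layer in enumerate(self.blocks):
            x = layer(x)
            if i in self.taps:
                feats.append(x)
        return feats

def multilayer_perceptual_loss(vgg, synth_images, target_images):
    # Sum of per-layer feature distances; per-layer weights are another knob worth tuning.
    return sum(F.mse_loss(fs, ft) for fs, ft in zip(vgg(synth_images), vgg(target_images)))
```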

oferidan1 commented 2 years ago

Hi Diego, thanks for your reply. I tried your latent optimization code and it gives better results on the StyleGAN3 model than the original StyleGAN2-ADA projector; W+ is indeed better. Still, when running the original NVIDIA projector from StyleGAN2-ADA over a StyleGAN2 model (which uses a much simpler optimization), the inversion is more realistic: for example, the eyes stay blue instead of turning brown.

image

The authors of StyleGAN3 made many changes to their model that might make inversion harder, such as reducing the mapping network from 8 to 2 layers, replacing the original constant input with Fourier features, etc.

Thanks, Ofer
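In case it's useful for anyone hitting the same thing, those two differences are easy to confirm on a loaded pickle. A small sketch, assuming the attribute names from the official code (the pickle path is a placeholder):

```python
import dnnlib
import legacy

with dnnlib.util.open_url('stylegan3-t-ffhq-1024x1024.pkl') as fp:  # placeholder path
    G = legacy.load_network_pkl(fp)['G_ema']

print('mapping layers :', G.mapping.num_layers)       # 2 in the released StyleGAN3 configs vs. 8 in StyleGAN2
if hasattr(G.synthesis, 'input'):                      # StyleGAN3: Fourier-feature input layer
    print('synthesis input:', type(G.synthesis.input).__name__)
else:                                                  # StyleGAN2: synthesis starts from a learned constant
    print('synthesis input: learned constant')
```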