eladrich / pixel2style2pixel

Official Implementation for "Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation" (CVPR 2021) presenting the pixel2style2pixel (pSp) framework
https://eladrich.github.io/pixel2style2pixel/
MIT License

Question about encoding image #284

Closed Eric07110904 closed 2 years ago

Eric07110904 commented 2 years ago

Thanks for your awesome work! I have a question about GAN inversion. I used pSp to do GAN inversion in the anime domain (512×512, 300k images) with a pre-trained anime StyleGAN2 (512×512). After training for 100,000 iterations with batch_size=4, I observed two problems:

  1. The detailed structure of the anime face is missing (it seems my model didn't capture parts such as the mouth and winking eyes).

  2. The output is blurred.

Do you have any suggestions for solving these two problems? I am wondering whether some parameter is set incorrectly, whether I should train for more iterations, or whether I should add the w_norm loss. Thanks for your reply! (A sketch of the kind of training command I mean is below.)
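
For reference, a hedged sketch of what such a training run might look like. Flag names follow the repo's `scripts/train.py`; the dataset type, paths, and loss weights here are illustrative placeholders, not the exact values used above:

```bash
# Hedged sketch of a pSp training command for a 512x512 anime dataset.
# Flag names follow scripts/train.py; dataset type, paths, and loss
# weights are illustrative placeholders.
# --w_norm_lambda enables the w-norm loss mentioned above, which pulls
# the predicted latents toward the average latent code.
python scripts/train.py \
    --dataset_type=anime_encode \
    --exp_dir=experiments/anime_inversion \
    --batch_size=4 \
    --output_size=512 \
    --stylegan_weights=pretrained_models/anime_stylegan2.pt \
    --start_from_latent_avg \
    --lpips_lambda=0.8 \
    --l2_lambda=1 \
    --w_norm_lambda=0.005
```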

yuval-alaluf commented 2 years ago

Using the ID loss is a bit strange here, since it was trained on real face images and your anime dataset is out of domain for it. Other than that, it could simply be that pSp is unable to fully capture all the details here. You could try more advanced encoders such as ReStyle and HyperStyle, or optimization-based approaches like PTI.
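
As a hedged sketch of the first suggestion: the repo also provides a MoCo-based similarity loss intended for non-face domains, which can stand in for the facial ID loss. Values below are illustrative, and the flag semantics follow my reading of the README:

```bash
# Hedged sketch: swapping the facial ID loss for the MoCo-based
# similarity loss that the pSp repo provides for non-face domains.
# Per the README, id_lambda should be set to 0 when moco_lambda is used;
# dataset type, paths, and values are illustrative placeholders.
python scripts/train.py \
    --dataset_type=anime_encode \
    --exp_dir=experiments/anime_inversion_moco \
    --batch_size=4 \
    --output_size=512 \
    --stylegan_weights=pretrained_models/anime_stylegan2.pt \
    --id_lambda=0 \
    --moco_lambda=0.5
```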