eladrich / pixel2style2pixel

Official Implementation for "Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation" (CVPR 2021) presenting the pixel2style2pixel (pSp) framework
https://eladrich.github.io/pixel2style2pixel/
MIT License

Encoding my own images #45

Closed yagizcolak closed 3 years ago

yagizcolak commented 3 years ago

Hello, newbie here. I opened your project in Colab but couldn't manage to encode my own images. I tried manually uploading my images to the "inversion_images" folder, but got some undesirable results. I believe the reason is that the images need to be aligned and resized before being fed to pSp. So, how can I achieve this?

yuval-alaluf commented 3 years ago

Hi @yagizcolak , I assume you are trying to run the encoding in the last section of the notebook (Inversion "In the Wild"). In that case, you are correct that the issue is most likely the alignment (as the images found in inversion_images have already been pre-aligned). However, earlier in the notebook we have a section "Align Image" which will align images. All you need to do is copy over the relevant alignment code to the last section to make sure your custom images will also be aligned.
If this is still unclear let me know!
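
For reference, a minimal sketch of what copying the alignment step over could look like, assuming the `align_face` helper from `scripts/align_all_parallel` used in the "Align Image" section and a local copy of dlib's 68-point landmark model (the file paths here are illustrative):

```python
# Minimal alignment sketch, assuming scripts/align_all_parallel.align_face
# and a downloaded shape_predictor_68_face_landmarks.dat (paths may differ).
import dlib
from scripts.align_all_parallel import align_face

def run_alignment(image_path):
    # dlib's 68-point landmark detector locates the face
    predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")
    # align_face crops and aligns the face in the same way as the training data
    aligned_image = align_face(filepath=image_path, predictor=predictor)
    print("Aligned image has size: {}".format(aligned_image.size))
    return aligned_image

aligned = run_alignment("inversion_images/my_image.jpg")
```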

yagizcolak commented 3 years ago

Thank you for the answer; I managed to do it just fine. I have another question, though: is it possible to play with latent directions with pixel2style2pixel? For example, can I do a smile transformation, a gender transformation, or things like that?

yuval-alaluf commented 3 years ago

Yes, it's possible to play with the latent directions. When running inference with pSp, in addition to saving the reconstructed image, you can add support for saving the latent code of each test image (e.g. as an npy file). I explained how this can be done in #36, so feel free to check out my explanation there.
Once you have these latent codes, you can play around in the latent space using the latent directions of StyleGAN2 models. While we don't provide code for latent space traversal in this repo, there are several works that do this with StyleGAN2 models.
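
A rough sketch of the latent-saving step, assuming the pSp forward pass accepts `return_latents=True` (as discussed in #36); the variable and file names here are illustrative:

```python
# Save a test image's latent code as an .npy file.
# input_tensor: a pre-processed image tensor of shape (3, 256, 256).
import numpy as np
import torch

with torch.no_grad():
    # return_latents=True makes the network also return the W+ latent code
    result_image, latent = net(input_tensor.unsqueeze(0).cuda().float(),
                               randomize_noise=False, return_latents=True)

# latent has shape (1, 18, 512) for a 1024x1024 StyleGAN2 generator
np.save("my_image_latent.npy", latent[0].cpu().numpy())
```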

yagizcolak commented 3 years ago

I managed to save the encoded images in npy format using the numpy.save method. However, when I played with the latent directions of that npy file, I got some ridiculous results (I used pbaylies's repo). Could the reason be that your encoding processes are different? As far as I can see, both my npy files and pbaylies's have the same format. So, do you have any idea why it doesn't work properly? Thanks.

yuval-alaluf commented 3 years ago

The latent space traversal in pbaylies's repository is done using StyleGAN (i.e. StyleGAN1) generators, whereas in our work we use a StyleGAN2 generator to extract the latents. Since these are two different domains, the boundaries used in pbaylies's repository will not work well for performing manipulations on latents extracted using pSp.
If you wish to perform latent space manipulations with latents extracted here, you should find a work that performed these manipulations in the StyleGAN2 latent domain. One such work is GANSpace. There are other newer works, such as StyleSpace, that also use StyleGAN2, but I do not believe code has been released yet.
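
To give an idea of what such a manipulation could look like once you have a direction for the same StyleGAN2 generator (e.g. one found with GANSpace), here is a hedged sketch; `direction.npy` is an assumed (512,) direction vector, and `net.decoder` refers to the StyleGAN2 generator wrapped inside pSp:

```python
# Illustrative sketch: apply a StyleGAN2 latent direction to a W+ code
# recovered by pSp, then decode it with the generator.
import numpy as np
import torch

latent = torch.from_numpy(np.load("my_image_latent.npy")).float().cuda()      # (18, 512)
direction = torch.from_numpy(np.load("direction.npy")).float().cuda()         # (512,)

alpha = 3.0                        # edit strength
edited = latent.clone()
edited[:8] += alpha * direction    # edit only the coarse/medium layers

with torch.no_grad():
    # input_is_latent=True tells the generator the input is already in W+
    image, _ = net.decoder([edited.unsqueeze(0)],
                           input_is_latent=True,
                           randomize_noise=False)
```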

yagizcolak commented 3 years ago

Oh, I see. I tried using InterFaceGAN today, but I came across another issue. My encoded images (the npy files generated by pSp) have a shape of (18, 512), but InterFaceGAN uses encoded images of shape (1, 512). Can I convert these (18, 512) files into (1, 512) files? Would there be any data loss? Thanks again.

yuval-alaluf commented 3 years ago

InterFaceGAN has support for working with latents in W+ (i.e. latents of size (18, 512)). However, getting InterFaceGAN working with StyleGAN2 will require you to train your own semantic boundaries, as they currently support only StyleGAN1 models.
If you have any more questions about getting InterFaceGAN to work with StyleGAN2 or with W+, I recommend asking the authors directly in their repository, as they will be better able to answer your questions and they respond quickly.
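
As a purely illustrative sketch, a (1, 512) boundary can be broadcast across all 18 rows of a W+ latent, assuming the boundary was trained for the same StyleGAN2 generator the latent came from:

```python
# Broadcast a (1, 512) boundary over an (18, 512) W+ latent (file names illustrative).
import numpy as np

latent = np.load("my_image_latent.npy")    # shape (18, 512)
boundary = np.load("my_boundary.npy")      # shape (1, 512)

alpha = 2.0                                # edit strength
edited = latent + alpha * boundary         # NumPy broadcasts the boundary over all 18 rows

np.save("my_image_latent_edited.npy", edited)
```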