eladrich / pixel2style2pixel

Official Implementation for "Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation" (CVPR 2021) presenting the pixel2style2pixel (pSp) framework
https://eladrich.github.io/pixel2style2pixel/
MIT License

Injecting pixel2style2pixel generated dlatent into official stylegan2 projector #86

Closed obravo7 closed 3 years ago

obravo7 commented 3 years ago

I am not sure if this is the right place to ask this, but has anyone tried injecting the generated dlatent from pSp directly into the official stylegan2 projector? I recently tried this, but there seem to be no noticeable changes to the image. Curious if anyone has had a similar outcome (note: I used a custom dataset).

yuval-alaluf commented 3 years ago

Hi @obravo7 , I actually tried this myself and did see an improvement on the latents initialized with pSp. There were several changes that needed to be made to load the latents instead of starting from a random vector, but I was able to see a slight improvement. Not sure what dataset you're using, but did you try testing the projection on a specific image that pSp struggled with?

obravo7 commented 3 years ago

@yuval-alaluf Thank you for your response. It is a dataset that I created myself and unfortunately cannot share the contents of it, so it could just be that I don't have a well trained GAN (actually, this seems to be the case).

I am wondering what changes you made to be able to insert the dlatent into the StyleGAN2 projector? As I understand it, from:

class Projector:
    def __init__(self):
        self.num_steps                  = 1000
        self.dlatent_avg_samples        = 10000
        self.initial_learning_rate      = 0.1
        self.initial_noise_factor       = 0.05
        self.lr_rampdown_length         = 0.25
        self.lr_rampup_length           = 0.05
        self.noise_ramp_length          = 0.75
        self.regularize_noise_weight    = 1e5
        self.verbose                    = False
        self.clone_net                  = True

        self._Gs                    = None
        self._minibatch_size        = None
        self._dlatent_avg           = None
        self._dlatent_std           = None
        self._noise_vars            = None
        self._noise_init_op         = None
        self._noise_normalize_op    = None
        self._dlatents_var          = None
        self._noise_in              = None
        self._dlatents_expr         = None
        self._images_expr           = None
        self._target_images_var     = None
        self._lpips                 = None
        self._dist                  = None
        self._loss                  = None
        self._reg_sizes             = None
        self._lrate_in              = None
        self._opt                   = None
        self._opt_step              = None
        self._cur_step              = None

we should be changing self._dlatent_avg, self._dlatent_std, and self._dlatents_expr, where self._dlatents_expr is the dlatent to be injected from pSp, and the standard deviation and mean are calculated from this variable. Or did I miss something?
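To make the idea concrete, here is a minimal NumPy sketch of how the projector's latent statistics could be derived from a pSp latent instead of from `dlatent_avg_samples` random mapped latents. This is an illustration only, not the actual TensorFlow projector code; the `psp_latent` array stands in for a real saved pSp output, and the assumed shape [n_styles, 512] follows the discussion below.

```python
import numpy as np

# Hypothetical pSp output: one W+ latent of shape [n_styles, 512]
# (random data stands in for a real saved latent here).
n_styles, latent_dim = 18, 512
psp_latent = np.random.randn(n_styles, latent_dim).astype(np.float32)

# The official projector estimates _dlatent_avg / _dlatent_std from
# thousands of random mapped latents. When injecting a pSp latent,
# one can instead seed the optimization with the latent itself and
# compute the statistics directly from it.
dlatent_avg = psp_latent.mean(axis=0, keepdims=True)           # shape [1, 512]
dlatent_std = np.sqrt(((psp_latent - dlatent_avg) ** 2).sum() / n_styles)

print(dlatent_avg.shape)  # (1, 512)
```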

Note: I didn't try injecting latents from pretrained, open-sourced models, such as FFHQ.

yuval-alaluf commented 3 years ago

I actually use rosinality's PyTorch implementation of StyleGAN2 for the optimization, which I find easier to work with than the original TensorFlow code.
I can share what changes I made to rosinality's code and hopefully it can help you out.

First, at line 154, to initialize the starting latent, they have:

with torch.no_grad():
    noise_sample = torch.randn(n_mean_latent, 512, device=device)
    latent_out = g_ema.style(noise_sample)
    latent_mean = latent_out.mean(0)
    latent_std = ((latent_out - latent_mean).pow(2).sum() / n_mean_latent) ** 0.5

I simply changed this to:

# load the saved pSp latent of shape [n_styles, 512]
latent = np.load("latent.npy")
latent_in = torch.from_numpy(latent).unsqueeze(0).cuda()  # -> [1, n_styles, 512]
latent_mean = latent_in.mean(0)
latent_std = ((latent_in - latent_mean).pow(2).sum() / n_mean_latent) ** 0.5

where latent is of size [n_styles, 512] and latent_in should be of size [1, n_styles, 512], if I remember correctly.

I believe everything else was left as is.
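The reshaping step above can be sanity-checked with plain NumPy (`torch.from_numpy(latent).unsqueeze(0)` corresponds to `np.expand_dims` along axis 0; n_styles = 18 for a 1024x1024 generator is an assumption, and the zero array stands in for the saved `latent.npy`):

```python
import numpy as np

# Stand-in for np.load("latent.npy"): a pSp W+ latent, assumed [n_styles, 512].
# n_styles = 18 corresponds to a 1024x1024 StyleGAN2 generator.
latent = np.zeros((18, 512), dtype=np.float32)

# torch's unsqueeze(0) adds the batch dimension; the NumPy
# equivalent is expand_dims along axis 0.
latent_in = np.expand_dims(latent, axis=0)

print(latent.shape)     # (18, 512)
print(latent_in.shape)  # (1, 18, 512)
```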

obravo7 commented 3 years ago

@yuval-alaluf I did the same but with the TensorFlow version. I will try with the PyTorch version, though I expect the same results; more likely I just need to spend more time on my dataset. Thank you for your help!