eladrich / pixel2style2pixel

Official Implementation for "Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation" (CVPR 2021) presenting the pixel2style2pixel (pSp) framework
https://eladrich.github.io/pixel2style2pixel/
MIT License

Why are random vectors injected into the input image? #224

Closed · dongyun-kim-arch closed this issue 3 years ago

dongyun-kim-arch commented 3 years ago

I am trying to extract the latent vector (of shape 18×512) from my input image and want to manipulate the image along certain features such as age or gender.

I managed to get the latent vector of my input image and move this vector in some feature's direction, but I couldn't figure out how to reconstruct an image from the changed vector.

It seems you import the StyleGAN2 generator as a decoder and do something with it, but I am stuck at this point...

Also, in style_mixing.py, why do you inject random vectors into the input image when they are completely unrelated to it? The random vector is used as the current vector (cur_vec), but I couldn't understand the overall workflow...

    for image_idx, input_image in enumerate(input_batch):
        # generate random vectors to inject into input image
        vecs_to_inject = np.random.randn(opts.n_outputs_to_generate, 512).astype('float32')
        multi_modal_outputs = []
        for vec_to_inject in vecs_to_inject:
            cur_vec = torch.from_numpy(vec_to_inject).unsqueeze(0).to("cuda")
            # get latent vector to inject into our input image
            _, latent_to_inject = net(cur_vec,
                                      input_code=True,
                                      return_latents=True)
            # get output image with injected style vector
            res = net(input_image.unsqueeze(0).to("cuda").float(),
                      latent_mask=latent_mask,
                      inject_latent=latent_to_inject,
                      alpha=opts.mix_alpha,
                      resize=opts.resize_outputs)
            multi_modal_outputs.append(res[0])
yuval-alaluf commented 3 years ago

I managed to get the latent vector of my input image and move this vector in some feature's direction, but I couldn't figure out how to reconstruct an image from the changed vector.

You can get the reconstruction by passing the modified latent to the generator. Something like this:

images, _ = self.generator([sample_latents], randomize_noise=False, input_is_latent=True)
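
For your editing use case, the full flow is: encode the image with pSp, edit the W+ code, then decode it with the generator. Below is a minimal sketch, assuming net is a loaded pSp network (its StyleGAN2 generator is the decoder attribute) and edit_direction is a semantic direction you obtained elsewhere (e.g., from InterFaceGAN); edit_direction and edit_strength are illustrative names, not part of the repo:

    import torch

    with torch.no_grad():
        # encode the image to its W+ latent code, shape (1, 18, 512)
        _, latents = net(input_image.unsqueeze(0).to("cuda").float(),
                         return_latents=True)

        # move the code along the chosen semantic direction
        # (edit_direction assumed to be a tensor of shape (512,) or (18, 512))
        edit_strength = 3.0  # illustrative value
        edited_latents = latents + edit_strength * edit_direction.to("cuda")

        # decode the edited code back to an image with the StyleGAN2 generator
        images, _ = net.decoder([edited_latents],
                                input_is_latent=True,
                                randomize_noise=False)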

Also, in style_mixing.py, why do you inject random vectors into the input image when they are completely unrelated to it?

When performing style mixing in the segmentation-to-image task, for example, you want to generate multiple images that all share the same structure (e.g., head shape and hairstyle) but have different color schemes. How do we get these different color schemes? By first randomly sampling many vectors and then mixing them into the latent code of the sketch image in the "fine" input layers of StyleGAN2.

In the snippet you pasted, this is the overall flow:

  1. For each input image:
     a. Randomly sample N vectors (i.e., to generate N style-mixed outputs).
     b. For each random vector v sampled above (see the sketch after this list):
        • Take the latent code of the sketch (for example) and replace its "fine" latent entries with the "fine" latent entries of v.
        • Pass the mixed latent code to SG2 to generate the output image res.
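
Concretely, the mixing in step (b) is roughly what the latent_mask, inject_latent, and alpha arguments in your snippet control: the selected layers of the input's W+ code are replaced by (or blended with) the corresponding layers of the randomly sampled latent. A minimal sketch, assuming W+ codes of shape (1, 18, 512); treating layers 8-17 as the "fine" layers is an illustrative choice, not a fixed rule:

    import torch

    def mix_latents(content_code, injected_code, layers=range(8, 18), alpha=None):
        """Replace (or blend, if alpha is given) the chosen layers of
        content_code with the corresponding layers of injected_code."""
        mixed = content_code.clone()
        for i in layers:
            if alpha is not None:
                mixed[:, i] = alpha * injected_code[:, i] + (1 - alpha) * content_code[:, i]
            else:
                mixed[:, i] = injected_code[:, i]
        return mixed

    # usage sketch: mix the sketch's code with the randomly sampled latent,
    # then decode the result with the StyleGAN2 generator
    # mixed = mix_latents(sketch_latent, latent_to_inject, alpha=opts.mix_alpha)
    # images, _ = net.decoder([mixed], input_is_latent=True, randomize_noise=False)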

Hope this helps

dongyun-kim-arch commented 3 years ago

Thanks for your answer!

I tried images, _ = self.generator(sample_latents, randomize_noise=False, input_is_latent=True) rather than images, _ = self.generator([sample_latents], randomize_noise=False, input_is_latent=True),

and your explanation makes so much sense to me. Thanks again!