eladrich / pixel2style2pixel

Official Implementation for "Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation" (CVPR 2021) presenting the pixel2style2pixel (pSp) framework
https://eladrich.github.io/pixel2style2pixel/
MIT License

How to use generator to generate random image? #12

Closed · zerollzeng closed this issue 3 years ago

zerollzeng commented 3 years ago

Great job, @eladrich! I was wondering how to use your Generator implementation to generate some random images. I found the forward code in models/stylegan2/model.py, and it looks like this:

    def forward(
            self,
            styles,
            return_latents=False,
            return_features=False,
            inject_index=None,
            truncation=1,
            truncation_latent=None,
            input_is_latent=False,
            noise=None,
            randomize_noise=True,
    ):
        if not input_is_latent:
            styles = [self.style(s) for s in styles]

        if noise is None:
            if randomize_noise:
                noise = [None] * self.num_layers
            else:
                noise = [
                    getattr(self.noises, f'noise_{i}') for i in range(self.num_layers)
                ]

I am a little unsure what the styles argument is for. Can I generate noise via make_noise() and then generate a random face image?

Best regards!

yuval-alaluf commented 3 years ago

Hi @zerollzeng, first, please note that the implementation of the generator was taken from rosinality.

With that said, our repo provides code that you can adapt for generating random samples using our pSp network and StyleGAN generator. Specifically, please refer to our inference.py script, where we generate a random latent vector for style mixing here: https://github.com/eladrich/pixel2style2pixel/blob/ac7da535f0dfb88a55666baeba8f0d83827db5ec/scripts/inference.py#L117-L121
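
For reference, the linked lines roughly amount to the following (a paraphrased sketch rather than a verbatim copy of inference.py; net is assumed to be the loaded pSp network):

    import numpy as np
    import torch

    # Sample a random 512-dimensional vector and pass it through pSp with
    # input_code=True so that the network maps it to a latent to inject.
    vec_to_inject = np.random.randn(1, 512).astype('float32')
    _, latent_to_inject = net(torch.from_numpy(vec_to_inject).to("cuda"),
                              input_code=True,
                              return_latents=True)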

Although here we only store latent_to_inject, with a few changes you can also store the face image generated from the random latent vector.

For example, the following code should be a good starting point for your needs:

    import numpy as np
    import torch

    n_images_to_generate = 10
    generated_images = []
    for _ in range(n_images_to_generate):
        # Sample a random 512-dimensional latent code and feed it directly to pSp.
        random_vec = np.random.randn(1, 512).astype('float32')
        random_image, _ = net(torch.from_numpy(random_vec).to("cuda"), input_code=True, return_latents=True)
        generated_images.append(random_image)

Here, we generate random w vectors that are fed into pSp to generate random face images of size 256x256.
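
If you then want to write the generated tensors to disk, a minimal sketch (not part of the repo, and assuming each output tensor has shape (1, 3, H, W) with values in [-1, 1]) could look like:

    from PIL import Image

    for i, img in enumerate(generated_images):
        arr = img[0].detach().cpu().numpy().transpose(1, 2, 0)    # CHW -> HWC
        arr = ((arr + 1) / 2 * 255).clip(0, 255).astype('uint8')  # [-1, 1] -> [0, 255]
        Image.fromarray(arr).save(f'random_face_{i}.png')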

Please let me know if this is clear and if you have further questions.

zerollzeng commented 3 years ago

Oh, thanks for your detailed explanation, I understand it now :smile: I'm reading your paper and code and doing some experiments; your work is fantastic! I think it would be great if we keep this issue open so that we can continue the discussion instead of opening a new issue :+1:

zerollzeng commented 3 years ago

I'm going to train your encoder network from scratch; here are some questions I'm interested in:

  1. What's the difference between GradualStyleEncoder, BackboneEncoderUsingLastLayerIntoW, and BackboneEncoderUsingLastLayerIntoWPlus?
  2. Does num_layers (50, 100, 152) affect the reconstruction result a lot?
yuval-alaluf commented 3 years ago

Thank you for the kind words!
Regarding your questions:

  1. These architectures refer to the official pSp encoder, the W encoder, and the Naive W+ encoder mentioned in the paper, respectively. The pSp architecture is explained in Section 3 of the paper and illustrated in Figure 2. For an explanation of the two latter architectures, please refer to the results in Section 4.1 (StyleGAN Inversion) of the paper, where we discuss the ablation study we performed. A short sketch of how the encoder choice is wired up appears after this list.
  2. In our experiments, we did not change the num_layers from the default value of 50. That is, in all experiments, we use a pretrained IR-SE50 model (which we linked in the README).
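
As a rough sketch of how these pieces fit together (paraphrasing the encoder selection in models/psp.py; the exact signatures in the repo may differ):

    from models.encoders import psp_encoders

    def set_encoder(opts):
        # Each option maps to one of the architectures discussed above;
        # num_layers stays at its default of 50 (pretrained IR-SE50 backbone).
        if opts.encoder_type == 'GradualStyleEncoder':                       # pSp encoder
            return psp_encoders.GradualStyleEncoder(50, 'ir_se', opts)
        elif opts.encoder_type == 'BackboneEncoderUsingLastLayerIntoW':      # W encoder
            return psp_encoders.BackboneEncoderUsingLastLayerIntoW(50, 'ir_se', opts)
        elif opts.encoder_type == 'BackboneEncoderUsingLastLayerIntoWPlus':  # Naive W+ encoder
            return psp_encoders.BackboneEncoderUsingLastLayerIntoWPlus(50, 'ir_se', opts)
        raise ValueError(f'{opts.encoder_type} is not a valid encoder type')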

Please let me know if you have further questions.

yuval-alaluf commented 3 years ago

Closing due to inactivity.