disanda / Deep-GAN-Encoders

GAN encoders in PyTorch that match PGGAN, StyleGAN v1/v2, and BigGAN. The code also integrates implementations of these GANs.

Embedding arbitrary image with BIGGAN encoder #3

Closed: spiegelball closed this issue 2 years ago

spiegelball commented 2 years ago

After reading the paper and playing around with the code, I am wondering whether it is possible to encode arbitrary images with the BigGAN encoder. If I read the code right, you need to feed the conditional vector into the encoder too: https://github.com/disanda/DSE/blob/e56a63d7f0c912799ff9a2a15a095239108a847b/inferE.py#L134

But with BigGAN you only get the conditional vector by inference from the noise and class vectors. So how can one encode images for which this conditional vector is not available?

Edit: const1 is the internal conditional vector of BigGAN, computed from the latent vector z and the class vector at https://github.com/disanda/DSE/blob/e56a63d7f0c912799ff9a2a15a095239108a847b/model/biggan_generator.py#L300 and returned from the model at https://github.com/disanda/DSE/blob/e56a63d7f0c912799ff9a2a15a095239108a847b/inferE.py#L129. So you basically feed the target latent vector into the encoder. This seems wrong. Am I overlooking something?
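For reference, here is a rough sketch of what the linked generator code seems to do (simplified; the dimensions and names are made up, not the repo's exact values):

```python
import torch

# Illustrative dimensions only; not the values used in the repo.
z_dim, class_dim = 128, 128

z = torch.randn(1, z_dim)                    # latent/noise vector
class_embedding = torch.randn(1, class_dim)  # embedded class vector

# Around biggan_generator.py line 300: the latent z is concatenated into
# the conditioning vector, so const1 carries z verbatim.
const1 = torch.cat((z, class_embedding), dim=1)
```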

disanda commented 2 years ago

BigGAN needs z (the latent vector) and const1 (the label/class vector) to generate an image of a specific class. Here, E outputs an imitated z' and const2.

Here, we use const1 and z' to generate the image reconstruction.

So without const (the label/class vector) we cannot generate images with BigGAN. Getting a label (class vector) is the first step of BigGAN generation.
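Schematically, the pipeline is like this (a sketch with dummy stand-ins, just to show the data flow; names and signatures are illustrative, not the exact inferE.py calls):

```python
import torch

# Dummy stand-ins for the real networks, to make the data flow explicit.
def G(z, cond_vector):    # generator: cond_vector modulates the synthesis
    return torch.randn(1, 3, 256, 256)

def E(img, cond_vector):  # encoder: outputs imitated z' and const2
    return torch.randn(1, 128), cond_vector.clone()

z = torch.randn(1, 128)
class_embedding = torch.randn(1, 128)
const1 = torch.cat((z, class_embedding), dim=1)  # built inside the generator

img = G(z, const1)                # generation with z and const1
z_prime, const2 = E(img, const1)  # E imitates z and the conditional vector
img_rec = G(z_prime, const1)      # reconstruction reuses const1 with z'
```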

spiegelball commented 2 years ago

But in your example const1 and const2 contain not only the class vector but also the noise vector z, as they are constructed in the forward pass of the generator (from line 300 on, cond_vector contains the latent code z): https://github.com/disanda/DSE/blob/e56a63d7f0c912799ff9a2a15a095239108a847b/model/biggan_generator.py#L300-L303

Later you feed this tensor into the encoder https://github.com/disanda/DSE/blob/e56a63d7f0c912799ff9a2a15a095239108a847b/inferE.py#L134

This means you provide the encoder not only with the class vector but also with the noise vector z you want to reconstruct. I doubt this is correct.
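To make the concern concrete: if cond_vector is the concatenation of z and the class embedding, then a degenerate encoder could recover z exactly without ever looking at the image (toy dimensions):

```python
import torch

z = torch.randn(1, 128)
class_embedding = torch.randn(1, 128)
cond_vector = torch.cat((z, class_embedding), dim=1)

# A degenerate "encoder" that ignores the image entirely and just slices
# z back out of the conditioning vector it was handed.
z_recovered = cond_vector[:, :128]
assert torch.equal(z_recovered, z)
```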

disanda commented 2 years ago

Sorry for the late reply; work has been busy.

I think you can refer to https://github.com/disanda/DSE/blob/master/model/E/E_BIG.py to understand the BigGAN encoder.

We add a layer-wise cond_vector that is fed into conditional batch norm. That is the same as the BigGAN blocks, which also use a layer-wise cond_vector.

Here, const is the cond_vector in the BigGAN inversion.
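For context, conditional batch norm roughly works like this (a simplified sketch, not the exact E_BIG.py code; the cond_vector is projected to a per-channel gain and bias that modulate the normalized features):

```python
import torch
import torch.nn as nn

class CondBatchNorm2d(nn.Module):
    """Minimal conditional batch norm: the conditioning vector is projected
    to a per-channel gain and bias applied after normalization."""
    def __init__(self, num_features, cond_dim):
        super().__init__()
        self.bn = nn.BatchNorm2d(num_features, affine=False)
        self.gain = nn.Linear(cond_dim, num_features)
        self.bias = nn.Linear(cond_dim, num_features)

    def forward(self, x, cond_vector):
        gamma = 1 + self.gain(cond_vector).unsqueeze(-1).unsqueeze(-1)
        beta = self.bias(cond_vector).unsqueeze(-1).unsqueeze(-1)
        return self.bn(x) * gamma + beta

# Usage: each layer receives the layer-wise cond_vector, mirroring the
# BigGAN generator blocks.
cbn = CondBatchNorm2d(num_features=64, cond_dim=256)
out = cbn(torch.randn(2, 64, 32, 32), torch.randn(2, 256))  # (2, 64, 32, 32)
```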

spiegelball commented 2 years ago

Thanks for replying! :) But the problem is that const is not only the cond_vector from the BigGAN inversion; const also contains the latent vector z (as you can see in line 302). So you provide too much knowledge to the encoder (it should reconstruct z without receiving it as an input in any way).

disanda commented 2 years ago

Although this work was done a long time ago and I have forgotten some of the details, I can share some of the motivation.

We built each encoder layer by imitating the corresponding GAN generator layer, first for PGGAN and the StyleGANs. Since StyleGAN feeds the style latent code into each generator layer, our encoder also outputs the style latent code layer-wise, as sketched below.
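A toy sketch of that layer-wise idea (hypothetical names and sizes, not the repo's actual architecture): the encoder mirrors StyleGAN by emitting one imitated style code per generator layer.

```python
import torch
import torch.nn as nn

class LayerwiseEncoder(nn.Module):
    """Toy encoder with one output head per generator layer, producing a
    w+-style stack of imitated style codes."""
    def __init__(self, num_layers=14, w_dim=512):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
        )
        self.heads = nn.ModuleList(
            [nn.Linear(64 * 16, w_dim) for _ in range(num_layers)]
        )

    def forward(self, img):
        feat = self.backbone(img)
        # One imitated style code per generator layer.
        return torch.stack([head(feat) for head in self.heads], dim=1)

enc = LayerwiseEncoder()
ws = enc(torch.randn(1, 3, 256, 256))  # shape (1, 14, 512)
```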

Finally, we tried BigGAN. As you can see in BigGAN's first block, we imitate it as well: the generator's layer input becomes the encoder's layer output (see https://github.com/disanda/DSE/blob/master/model/biggan_generator.py, lines 295-303).

So the "too much knowledge" was in the BigGAN generator first; we just imitated it and did the inversion task.