spiegelball closed this issue 2 years ago
BigGAN needs z (a latent vector) and const1 (a label/class vector) to generate an image of a specific class. Here, E outputs the imitated z' and const2.
We then use const1 and z' to generate the image reconstruction.
So without a const (label/class vector) we cannot generate images with BigGAN; obtaining a label (class vector) is the first step of BigGAN generation.
But in your example const1 and const2 contain not only the class vector but also the noise vector z, since the cond_vector is constructed in the forward pass of the generator (from line 300 on, the cond_vector contains the latent code z): https://github.com/disanda/DSE/blob/e56a63d7f0c912799ff9a2a15a095239108a847b/model/biggan_generator.py#L300-L303
Later you feed this tensor into the encoder https://github.com/disanda/DSE/blob/e56a63d7f0c912799ff9a2a15a095239108a847b/inferE.py#L134
This means you provide the encoder not only with the class vector but also with the noise vector z that it is supposed to reconstruct. I doubt this is correct.
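To make the concern concrete, here is a toy numpy sketch (all dimensions and the linear "generator" are hypothetical stand-ins, not the repo's actual modules). It shows that an encoder that receives the cond_vector can recover z trivially, without ever looking at the image:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 64))  # hypothetical "generator" weights

def toy_generator(z, class_emb):
    # BigGAN-style conditioning: cond_vector is z concatenated with the
    # class embedding, so it literally contains z.
    cond_vector = np.concatenate([z, class_emb], axis=1)
    x = np.tanh(cond_vector @ W)    # stand-in for the generated image
    return x, cond_vector

def toy_encoder(x, cond_vector, z_dim=128):
    # An encoder that is given cond_vector can "reconstruct" z by slicing
    # it back out, ignoring the image x entirely.
    return cond_vector[:, :z_dim]

z = rng.standard_normal((1, 128))   # latent code the encoder should recover
emb = rng.standard_normal((1, 128)) # class embedding
x, const1 = toy_generator(z, emb)
z_prime = toy_encoder(x, const1)    # exact copy of z, image unused
```

In other words, a perfect reconstruction of z here says nothing about the encoder's ability to invert the image.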
Sorry for the late reply; work has been busy.
Please refer to https://github.com/disanda/DSE/blob/master/model/E/E_BIG.py for the BigGAN encoder.
We feed the cond_vector layer-wise into conditional batch norm, the same way each BigGAN block consumes the layer-wise cond_vector.
Here, const is the cond_vector in the BigGAN inversion.
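For readers unfamiliar with the mechanism: a minimal numpy sketch of conditional batch norm, where the per-sample gain and bias are linear functions of the cond_vector (the weight matrices and dimensions here are hypothetical placeholders, not the repo's parameters):

```python
import numpy as np

def conditional_batchnorm(x, cond_vector, w_gain, w_bias, eps=1e-4):
    """Sketch of conditional batch norm: normalize x over the batch,
    then apply a gain/bias computed linearly from cond_vector."""
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    gain = 1.0 + cond_vector @ w_gain   # scale centered at 1
    bias = cond_vector @ w_bias
    return gain * x_hat + bias

rng = np.random.default_rng(1)
x = rng.standard_normal((4, 32))              # batch of features
cond = rng.standard_normal((4, 256))          # per-sample cond_vector
w_gain = rng.standard_normal((256, 32)) * 0.01
w_bias = rng.standard_normal((256, 32)) * 0.01
y = conditional_batchnorm(x, cond, w_gain, w_bias)
```

This is why feeding cond_vector into each layer injects the conditioning information everywhere in the network.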
Thanks for replying! :) But the problem is that const is not only the cond_vector from the BigGAN inversion; const also contains the latent vector z (as you can see in line 302). So you provide too much knowledge to the encoder (it should reconstruct z without receiving it as input in any way).
Although this work was done a long time ago and I have forgotten some details, I can share some of the motivation.
We built each encoder layer by imitating the corresponding GAN generator layer, first for PGGAN and the StyleGANs. Since StyleGAN's layers take the style latent code as layer-wise input, our encoder's layers output the style latent code layer-wise too.
Finally we tried BigGAN; as you can see in BigGAN's first block, we imitate it as well: what the generator layer takes as input, the encoder layer produces as output (see https://github.com/disanda/DSE/blob/master/model/biggan_generator.py, lines 295-303).
So the "too much knowledge" was in the BigGAN generator first; we simply imitated it and performed the inversion task.
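That mirroring idea can be sketched abstractly (purely illustrative dimensions and functions, not the repo's architecture): each generator stage consumes one latent chunk, so each encoder stage emits one latent chunk.

```python
import numpy as np

N_LAYERS, CHUNK = 3, 20  # hypothetical layer count and chunk size

def toy_layerwise_generator(chunks):
    # Each generator layer takes its own latent chunk as input.
    x = np.zeros(CHUNK)
    for c in chunks:
        x = np.tanh(x + c)
    return x

def toy_layerwise_encoder(x):
    # Mirrored design: each encoder layer outputs one latent chunk.
    chunks = []
    for _ in range(N_LAYERS):
        x = np.tanh(x)
        chunks.append(x.copy())
    return chunks

img = np.random.default_rng(2).standard_normal(CHUNK)
chunks = toy_layerwise_encoder(img)      # layer-wise latent outputs
x_rec = toy_layerwise_generator(chunks)  # layer-wise latent inputs
```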
After reading the paper and playing around with the code, I am wondering whether it is possible to encode arbitrary images with the BigGAN encoder. If I read the code correctly, you need to feed the conditional vector into the encoder too: https://github.com/disanda/DSE/blob/e56a63d7f0c912799ff9a2a15a095239108a847b/inferE.py#L134
But you only obtain the conditional vector by running BigGAN inference on a noise vector and a class vector. So how can one encode images for which this conditional vector is not available?
Edit: Since const1 is the internal conditional vector of BigGAN, which is computed from the latent vector z and the class vector at https://github.com/disanda/DSE/blob/e56a63d7f0c912799ff9a2a15a095239108a847b/model/biggan_generator.py#L300 and returned from the model at https://github.com/disanda/DSE/blob/e56a63d7f0c912799ff9a2a15a095239108a847b/inferE.py#L129, you basically feed the target latent vector into the encoder. This seems wrong. Am I overlooking something?