dorarad / gansformer

Generative Adversarial Transformers
MIT License

Seems the dlatents_in variable in G_synthesis is not updated correctly? #14

Closed AndrewChiyz closed 3 years ago

AndrewChiyz commented 3 years ago

Hi, thank you for sharing the code!

I have been reading the code in network.py recently. It seems the dlatents_in variables for the layers at different resolutions are not updated correctly.

In line 1199, it shows that dlatents_in is a [Bs, latents_num, num_layers, dlatent_size] tensor, where latents_num = k local region latent components + 1 global latent component. https://github.com/dorarad/gansformer/blob/556bbdde59b17abeadd55d61da80050828f04c4b/training/network.py#L1199
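For concreteness, here is a minimal NumPy sketch of that shape; the sizes and the position of the global component are just assumptions for illustration, not values taken from the code:

```python
import numpy as np

# Toy sizes, chosen only to make the shape concrete (assumed, not from the repo).
batch_size   = 4
k            = 16           # number of local region latent components
latents_num  = k + 1        # k local components + 1 global component
num_layers   = 14           # total synthesis layers
dlatent_size = 512

# dlatents_in: one latent vector per component, per synthesis layer.
dlatents_in = np.random.randn(batch_size, latents_num, num_layers, dlatent_size)

# Conceptual split into local and global parts (which index holds the global
# component is an assumption here, not taken from the repository code).
local_latents = dlatents_in[:, :k]    # [batch_size, k, num_layers, dlatent_size]
global_latent = dlatents_in[:, k:]    # [batch_size, 1, num_layers, dlatent_size]
print(local_latents.shape, global_latent.shape)
```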

In lines 1420-1422, the dlatents at the 4x4 scale are updated. https://github.com/dorarad/gansformer/blob/556bbdde59b17abeadd55d61da80050828f04c4b/training/network.py#L1420-L1422

Since dlatents is initialized to None at the beginning (line 1394), the global and local latent codes are extracted using layer_idx inside the layer function (see line 1256 and lines 1260-1261). https://github.com/dorarad/gansformer/blob/556bbdde59b17abeadd55d61da80050828f04c4b/training/network.py#L1394 https://github.com/dorarad/gansformer/blob/556bbdde59b17abeadd55d61da80050828f04c4b/training/network.py#L1256 https://github.com/dorarad/gansformer/blob/556bbdde59b17abeadd55d61da80050828f04c4b/training/network.py#L1260-L1261

However, for the 8x8 resolution, the dlatents variable updated at 4x4 is fed into the block and then into its two layer structures. Since dlatents is no longer None, the k local region latent codes are not extracted via layer_idx for the 8x8 layers. It seems the dlatents variable updated at the 4x4 layer keeps being injected and updated as the block structures are stacked, and for the other scales the corresponding slices of the dlatents_in tensor are never extracted or updated (see the sketch after the code links below).

https://github.com/dorarad/gansformer/blob/556bbdde59b17abeadd55d61da80050828f04c4b/training/network.py#L1429-L1433

https://github.com/dorarad/gansformer/blob/556bbdde59b17abeadd55d61da80050828f04c4b/training/network.py#L1337-L1345
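To illustrate what I mean, here is a minimal, framework-free sketch of the control flow as I understand it; the function names and sizes are made up for illustration and are not the actual code in network.py:

```python
import numpy as np

def layer(x, dlatents, dlatents_in, layer_idx):
    """Toy stand-in for a synthesis layer (not the actual code in network.py)."""
    if dlatents is None:
        # Only when nothing was handed down do we slice this layer's latents
        # out of the big dlatents_in tensor via layer_idx.
        dlatents = dlatents_in[:, :, layer_idx]    # [batch, latents_num, dlatent_size]
    # ... attention / modulation using dlatents would happen here ...
    new_dlatents = dlatents                        # carried forward to the next layer
    return x, new_dlatents

def block(x, dlatents, dlatents_in, first_layer_idx):
    """Toy stand-in for one resolution block (two layers)."""
    x, dlatents = layer(x, dlatents, dlatents_in, first_layer_idx)
    x, dlatents = layer(x, dlatents, dlatents_in, first_layer_idx + 1)
    return x, dlatents

dlatents_in = np.random.randn(2, 17, 14, 32)       # [batch, k+1, num_layers, dlatent_size]
x = np.zeros((2, 32, 4, 4))
dlatents = None                                    # as at line 1394
for res, first_idx in [("4x4", 0), ("8x8", 2), ("16x16", 4)]:
    x, dlatents = block(x, dlatents, dlatents_in, first_idx)
    # From 8x8 onward, dlatents is no longer None, so the layer_idx slices are skipped.
```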

Is this a bug, or is it an intentional setting in your model? Sorry for writing such a long question.

Thanks a lot! :)

dorarad commented 3 years ago

Hi, thanks for reaching out! :) I hope to get back to you in about two days at most; I will then go over all open issues!

dorarad commented 3 years ago

Thanks very much for the question! This is not a bug but in fact the intended behavior!

We have two modes, simplex and duplex. In the simplex case we simply extract the latents at each scale, as you suggest; we only update the latents in the duplex case, here (by updating new_dlatents): https://github.com/dorarad/gansformer/blob/556bbdde59b17abeadd55d61da80050828f04c4b/training/network.py#L1316

Indeed, in the duplex case we update the dlatents that were extracted at the 4x4 scale and then keep re-feeding and re-updating them through the subsequent scales iteratively, rather than taking the original latents from the other resolutions.
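Roughly, the distinction can be sketched like this (an illustrative toy layer under assumed names, not the actual implementation):

```python
def layer(x, dlatents, dlatents_in, layer_idx, duplex):
    """Toy layer showing the simplex/duplex difference (illustration, not the repo code)."""
    if dlatents is None:
        # Both modes fall back to slicing dlatents_in when nothing was handed down.
        dlatents = dlatents_in[:, :, layer_idx]
    # ... bipartite attention between the image features x and the latents ...
    if duplex:
        new_dlatents = dlatents   # duplex: an (updated) version is carried to the next layer
    else:
        new_dlatents = None       # simplex: the next layer re-reads dlatents_in via layer_idx
    return x, new_dlatents
```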

However, note that in practice the latents for all the scales actually have the very same values, as you can see here: https://github.com/dorarad/gansformer/blob/556bbdde59b17abeadd55d61da80050828f04c4b/training/network.py#L1092-L1094 Therefore it is not an issue that we don't rely on the latents from the other scales in duplex mode, since they have the same values anyway as the latents for the 4x4 scale.
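A minimal NumPy sketch of that tiling, with assumed sizes:

```python
import numpy as np

batch, latents_num, num_layers, dlatent_size = 2, 17, 14, 32   # assumed sizes

# One latent vector per component, with no per-layer dimension yet.
dlatents = np.random.randn(batch, latents_num, dlatent_size)

# Broadcast the same vectors to every synthesis layer by tiling.
dlatents_in = np.tile(dlatents[:, :, np.newaxis], (1, 1, num_layers, 1))

# Every per-layer slice is identical, so using only the 4x4 slice loses nothing.
assert np.array_equal(dlatents_in[:, :, 0], dlatents_in[:, :, 7])
```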

The only exception to that is the case of layer mixing https://github.com/dorarad/gansformer/blob/main/training/network.py#L930, which is a non-essential feature that increases disentanglement but does not improve FID scores. The implementation form in which the same latent vector is broadcast (by tiling) to all the scales, and which might lead one to infer that different scales can carry different latent vectors, is a remnant of the original StyleGAN implementation, where it was meant to support style mixing.
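For reference, a toy sketch of what such per-layer mixing looks like, following the general StyleGAN mixing idea rather than the exact code (the function name, probability value, and crossover rule are assumptions):

```python
import numpy as np

def mix_layers(dlatents_a, dlatents_b, mixing_prob=0.9, rng=np.random):
    """Toy StyleGAN-style mixing: past a random crossover layer, use the second latents.

    dlatents_*: [batch, latents_num, num_layers, dlatent_size]. Illustrative only.
    """
    batch, _, num_layers, _ = dlatents_a.shape
    mixed = dlatents_a.copy()
    for i in range(batch):
        if rng.rand() < mixing_prob:
            crossover = rng.randint(1, num_layers)            # random crossover layer
            mixed[i, :, crossover:] = dlatents_b[i, :, crossover:]
    return mixed
```

This per-layer swap is the reason the latents carry a separate num_layers dimension at all, even though without mixing every layer slice holds the same vectors.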

Hope it helps, please let me know if you have any further questions! :)

AndrewChiyz commented 3 years ago

Thank you very much for the detailed clarification! :)