dorarad / gansformer

Generative Adversarial Transformers

About the layer ordering in the figure #18

Closed ghost closed 3 years ago

ghost commented 3 years ago

Hi,

Thank you for your great work.

I just want to mention that the order of layers in the code seems different from the figure you shared. Shouldn't there be a modulated convolutional layer before and after the attention layer? Based on the code, a synthesizer block consists of: conv0_up layer + attention + noise adding + conv1 + attention + noise adding + ResNet conv.
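For concreteness, here is a minimal PyTorch sketch of that ordering as I read it from the code. Everything here is a hypothetical stand-in (plain conv and attention instead of the repo's modulated versions, and these are not the repo's actual class or function names):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SynthesisBlock(nn.Module):
    """Hypothetical sketch of the layer order described above; plain
    conv/attention stand in for the repo's modulated operations."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv0_up = nn.ConvTranspose2d(in_ch, out_ch, 3, stride=2,
                                           padding=1, output_padding=1)
        self.attn0 = nn.MultiheadAttention(out_ch, num_heads=1, batch_first=True)
        self.conv1 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        self.attn1 = nn.MultiheadAttention(out_ch, num_heads=1, batch_first=True)
        self.skip = nn.Conv2d(in_ch, out_ch, 1)  # residual ("ResNet") conv

    def _attend(self, attn, x, latents):
        # Image features attend to the latents (bipartite attention).
        b, c, h, w = x.shape
        q = x.flatten(2).transpose(1, 2)       # (b, h*w, c)
        y, _ = attn(q, latents, latents)
        return (q + y).transpose(1, 2).reshape(b, c, h, w)

    def forward(self, x, latents):
        skip = F.interpolate(self.skip(x), scale_factor=2)
        x = self.conv0_up(x)                    # conv0_up: upsampling conv
        x = self._attend(self.attn0, x, latents)  # attention
        x = x + 0.01 * torch.randn_like(x)      # noise adding
        x = self.conv1(x)                       # conv1
        x = self._attend(self.attn1, x, latents)  # attention
        x = x + 0.01 * torch.randn_like(x)      # noise adding
        return x + skip                         # ResNet (residual) path

block = SynthesisBlock(64, 64)
out = block(torch.randn(2, 64, 8, 8), torch.randn(2, 16, 64))
print(out.shape)  # -> torch.Size([2, 64, 16, 16])
```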

So why does the figure show (up_sampling x 2) and 2 x resnet convolution after the attention layer?

dorarad commented 3 years ago

Thank you so much for pointing that out!

For the modulated convolution: it is an optional part and isn't used by default (e.g. the CLEVR model doesn't use it). Second, we consider the modulation to be part of the attention layer, which consists of both regional attention (where each latent affects some region) and global attention (where one additional global latent attends to and modulates all the features, to allow for global visual effects over the image such as style or lighting conditions). We will update the paper text to reflect this extension.
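To illustrate that reading, here is a rough, hypothetical sketch of an attention layer combining regional attention with a global modulating latent (the names and module choices are illustrative, not the repo's actual implementation):

```python
import torch
import torch.nn as nn

class LatentAttention(nn.Module):
    """Hypothetical sketch: regional latents each attend to image regions,
    while one extra global latent modulates all features (style-like)."""
    def __init__(self, dim):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)
        self.to_gain = nn.Linear(dim, dim)  # global latent -> per-channel gain

    def forward(self, x, latents):
        # latents: (b, k+1, dim) = k regional latents + 1 global latent
        regional, global_latent = latents[:, :-1], latents[:, -1]
        b, c, h, w = x.shape
        q = x.flatten(2).transpose(1, 2)          # (b, h*w, c)
        y, _ = self.attn(q, regional, regional)   # regional attention
        q = q + y
        gain = 1 + self.to_gain(global_latent)    # (b, c)
        q = q * gain.unsqueeze(1)                 # global modulation of all features
        return q.transpose(1, 2).reshape(b, c, h, w)
```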

For the x2 part, you're totally correct; that's a mistake in the figure. The whole block repeats twice, rather than each operation repeating independently (I meant to validate the figure against the code before uploading the paper to arXiv, but haven't gotten a chance to do it yet because of NeurIPS). I'll update the figure shortly to be consistent with the code and model.
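In code terms, the distinction would look something like this (identity placeholders, purely illustrative):

```python
def upsample(x): return x         # placeholder op
def resnet_conv(x): return x      # placeholder op
def attention_block(x): return x  # placeholder op

def figure_as_drawn(x):
    # Misreading: each operation carries its own x2 annotation.
    x = attention_block(x)
    x = upsample(upsample(x))           # (up_sampling x 2)
    return resnet_conv(resnet_conv(x))  # 2 x resnet convolution

def as_in_the_code(x):
    # Intended meaning: the whole block repeats twice.
    for _ in range(2):
        x = resnet_conv(upsample(attention_block(x)))
    return x
```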