kpmokpmo opened 3 years ago
Hi, thanks for your interest.
In addition, I notice that they only applied normalization to the input data. Our work demonstrates that this operation can be generalized to the latent space.
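For concreteness, here is a minimal sketch of what normalizing in the latent space could look like; the module name `LatentSTNorm` and the `(B, C, N, T)` layout are placeholders for this illustration, not necessarily the exact code in this repo:

```python
import torch
import torch.nn as nn

class LatentSTNorm(nn.Module):
    """Normalize latent features of shape (B, C, N, T) along the spatial (N)
    and temporal (T) axes, then fuse the results with the original features."""
    def __init__(self, channels, eps=1e-5):
        super().__init__()
        self.eps = eps
        # 1x1 conv fuses [original, spatially-normed, temporally-normed] channels
        self.fuse = nn.Conv2d(3 * channels, channels, kernel_size=1)

    def forward(self, x):  # x: (B, C, N, T)
        # spatial normalization: statistics over the node axis N
        s = (x - x.mean(dim=2, keepdim=True)) / (x.var(dim=2, keepdim=True, unbiased=False) + self.eps).sqrt()
        # temporal normalization: statistics over the time axis T
        t = (x - x.mean(dim=3, keepdim=True)) / (x.var(dim=3, keepdim=True, unbiased=False) + self.eps).sqrt()
        return self.fuse(torch.cat([x, s, t], dim=1))
```

The point is that the normalization statistics are computed on hidden features rather than on the raw input, so such a module can be inserted inside the network rather than only at the data-preprocessing stage.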
Thank you for the quick reply! I still want to double-check the design: if the attention block has the following structure:
```python
# S+T norm & concat+conv
x = x + self.drop_path(self.attn(self.norm1(x)))
x = x + self.drop_path(self.mlp(self.norm2(x)))
```
then I think `self.norm1` at least duplicates the role of the S/T norm layer. Please correct me if I shouldn't insert the ST-norm here at all. Many thanks.
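To make the question concrete, here is a rough sketch of the placement I have in mind, reusing the `LatentSTNorm` placeholder sketched above (a toy illustration of my own, not your actual block):

```python
import torch
import torch.nn as nn

class BlockWithSTNorm(nn.Module):
    """Pre-norm transformer block with an ST-norm inserted in front of it."""
    def __init__(self, dim, num_heads=4, mlp_ratio=4):
        super().__init__()
        self.st_norm = LatentSTNorm(dim)   # hypothetical module, sketched in the reply above
        self.norm1 = nn.LayerNorm(dim)     # the LayerNorm that seems to duplicate the ST norm
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio), nn.GELU(), nn.Linear(dim * mlp_ratio, dim)
        )
        self.drop_path = nn.Identity()     # placeholder for stochastic depth

    def forward(self, x):                  # x: (B, C, N, T)
        B, C, N, T = x.shape
        x = self.st_norm(x)                           # S+T norm & concat + conv on the latents
        tokens = x.flatten(2).transpose(1, 2)         # (B, N*T, C) tokens for attention
        h = self.norm1(tokens)                        # re-normalizes the just-ST-normed features
        tokens = tokens + self.drop_path(self.attn(h, h, h, need_weights=False)[0])
        tokens = tokens + self.drop_path(self.mlp(self.norm2(tokens)))
        return tokens.transpose(1, 2).reshape(B, C, N, T)
```

In this arrangement `self.norm1` immediately re-normalizes the features that the ST-norm just produced, which is why it looks redundant to me.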
Hi, thanks for your work.
Just a few quick questions here:
Thank you very much!