Closed thb1314 closed 2 years ago
Hi @thb1314, thanks for your comment. Yes, you are right: the concatenation is not necessary. However, it does not affect performance, and it keeps the final representation similar to that of the DeiT code, so we are keeping this implementation.
Best,
Hugo
As shown in
https://github.com/facebookresearch/deit/blob/main/cait_models.py#L241
is equivalent to
Suppose `x` is a tensor with shape `[B,N,C]`. LayerNorm calculates the `mean` and `std` over the last dimension of the input feature, so the shape of `mean` is `[B,N,1]`: each token is normalized on its own, independently of the `B` and `N` dimensions. Therefore, the `torch.cat` operation does not seem necessary.
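The argument above can be checked with a small, dependency-free sketch. It is not the CaiT code: `layer_norm`, `norm_sequence`, the token values, and `eps=1e-6` are all assumptions made for illustration, and the learnable affine parameters are left out (they act per channel, so they would not change the conclusion). It only shows that normalizing the class token alone gives the same result as concatenating it with the patch tokens, normalizing, and taking token 0.

```python
import math

def layer_norm(token, eps=1e-6):
    """Normalize one token (a list of C floats) over its channel dimension."""
    mean = sum(token) / len(token)
    # Biased variance (divide by C), which is what layer normalization uses.
    var = sum((v - mean) ** 2 for v in token) / len(token)
    return [(v - mean) / math.sqrt(var + eps) for v in token]

def norm_sequence(tokens):
    """Normalize every token of one [N, C] sequence independently."""
    return [layer_norm(t) for t in tokens]

# Hypothetical values, C = 4: one class token and two patch tokens.
cls_token = [0.5, -1.0, 2.0, 0.0]
patch_tokens = [[1.0, 2.0, 3.0, 4.0], [-2.0, 0.0, 1.0, 5.0]]

# Path 1: concatenate along the token dimension, normalize, keep token 0.
with_cat = norm_sequence([cls_token] + patch_tokens)[0]

# Path 2: normalize the class token on its own.
without_cat = layer_norm(cls_token)

same = all(math.isclose(a, b) for a, b in zip(with_cat, without_cat))
print(same)
```

Because the statistics are computed per token, the patch tokens never enter the class token's mean or variance, which is why the two paths agree.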