facebookresearch / AVT

Code release for ICCV 2021 paper "Anticipative Video Transformer"
Apache License 2.0

The input/output feature dimensions of Transformer Encoder and Causal Transformer Decoder? #41

Open yxgz opened 2 years ago

yxgz commented 2 years ago

Hi, thanks for your great project! I have a question about the input/output feature dimensions of the Transformer Encoder. Section 4.1 of the paper says that both the input and output feature dimensions are 768D; is that right? However, Section 4.4 says that the input feature dimension of the Causal Transformer Decoder is 2048D. What is the output feature dimension of the Causal Transformer Decoder? And is there a dimension conversion (768D -> 2048D) before the Causal Transformer Decoder?

rohitgirdhar commented 2 years ago

Hi, thanks for your interest, and apologies for the delay. You are correct, there is a linear layer that does this mapping; see https://github.com/facebookresearch/AVT/blob/b082c99a6b3104780237022274db74f5c3124cc5/models/base_model.py#L46
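For readers who want to see what such a mapping looks like, here is a minimal PyTorch sketch of a linear layer that projects 768D per-frame features to 2048D. It is not the code from `base_model.py`: the class name `FeatureMapper`, the constant names, and the batch and clip sizes are hypothetical and chosen only for illustration. The two dimensions are the ones quoted in the question.

```python
import torch
import torch.nn as nn

# Dimensions quoted in the question above; the constant names are hypothetical.
BACKBONE_DIM = 768   # per-frame feature size coming out of the encoder
DECODER_DIM = 2048   # feature size expected by the causal decoder


class FeatureMapper(nn.Module):
    """Hypothetical stand-in for the mapping layer; not the AVT class itself."""

    def __init__(self, in_dim: int = BACKBONE_DIM, out_dim: int = DECODER_DIM):
        super().__init__()
        # A single learned affine map applied to the last (feature) dimension.
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, num_frames, in_dim) -> (batch, num_frames, out_dim)
        return self.proj(feats)


mapper = FeatureMapper()
# Assumed toy input: 2 clips, 10 frames each, 768D features per frame.
frame_feats = torch.randn(2, 10, BACKBONE_DIM)
decoder_inputs = mapper(frame_feats)
print(decoder_inputs.shape)  # torch.Size([2, 10, 2048])
```

Because `nn.Linear` acts only on the last dimension, the same projection is applied independently to every frame, so the temporal layout of the sequence is unchanged.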