How are the dimensions aligned during temporal aggregation?

Alibaba-MIIL / STAM

Official implementation of "An Image is Worth 16x16 Words, What is a Video Worth?" (2021 paper)

Apache License 2.0

219 stars, 31 forks

How are the dimensions aligned during temporal aggregation? #13

Closed. unclebuff closed this issue 3 years ago

unclebuff commented 3 years ago

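As I understand it, the reshape of x on line 39 of temporal_aggregation.py should turn (B, N, C) into (nvids, self.clip_length, N*C). How can the last dimension, N*C, still line up with embed_dim in TransformerEncoderLayer? As far as I can tell, after the embedding on line 179 of transformer_model.py the input x already has shape (B, N, C), and it keeps that shape all the way to the temporal aggregation module. I really could not make sense of this part of the code. If the authors see this, I would appreciate an explanation. Thanks.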
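For readers following the shape question, here is a minimal sketch of the two possibilities. It is not the repository's code. It assumes, without confirmation from this thread, that the spatial model hands only one C-dimensional embedding per frame (for example the class token) to the aggregation module, so the tensor reaching the reshape would be (B, C) rather than (B, N, C). All sizes and variable names below are hypothetical.

```python
import torch

# Hypothetical sizes, chosen only for illustration.
nvids, clip_length, n_tokens, embed_dim = 2, 4, 5, 8
batch = nvids * clip_length  # frames of all videos stacked along the batch axis

# Assumed case: one embedding per frame, shape (B, C).
per_frame = torch.randn(batch, embed_dim)
clips = per_frame.view(nvids, clip_length, -1)
print(clips.shape)  # torch.Size([2, 4, 8]) -> last dim equals embed_dim

# Case described in the question: the full token sequence, shape (B, N, C).
all_tokens = torch.randn(batch, n_tokens, embed_dim)
flattened = all_tokens.view(nvids, clip_length, -1)
print(flattened.shape)  # torch.Size([2, 4, 40]) -> last dim is N*C, not embed_dim
```

If the last dimension after the reshape really were N*C, it would not match embed_dim, so comparing these two shapes is a quick way to check which tensor actually reaches the aggregation module.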

unclebuff commented 3 years ago

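Also, why does x.transpose_(1,0) move the batch size to the second position for the attention computation? Attention should be computed across clips, not across the batch, shouldn't it? Any explanation would be appreciated.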
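On the transpose: with its default `batch_first=False`, PyTorch's `nn.TransformerEncoderLayer` expects input shaped (sequence, batch, feature), so attention runs along dimension 0 and the batch sits in dimension 1. The sketch below illustrates that convention only. The sizes, the `nhead` value, and the variable names are hypothetical and not taken from the repository.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
nvids, clip_length, embed_dim = 2, 4, 8  # hypothetical sizes

# Default batch_first=False: input is (sequence, batch, feature).
layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=2)
layer.eval()  # disable dropout so the comparison below is deterministic

x = torch.randn(nvids, clip_length, embed_dim)  # (batch, sequence, feature)

# Replace the frames of video 1 only; video 0 is left untouched.
x_perturbed = x.clone()
x_perturbed[1] = torch.randn(clip_length, embed_dim)

with torch.no_grad():
    out = layer(x.transpose(0, 1))                  # (sequence, batch, feature)
    out_perturbed = layer(x_perturbed.transpose(0, 1))

print(out.shape)  # torch.Size([4, 2, 8])

# Attention mixes positions along dim 0 (frames of one video), never across dim 1.
video0_unchanged = torch.allclose(out[:, 0], out_perturbed[:, 0], atol=1e-5)
video1_changed = not torch.allclose(out[:, 1], out_perturbed[:, 1], atol=1e-5)
print(video0_unchanged, video1_changed)
```

Changing one video leaves the other video's output unaffected, which shows that the layer attends over the frames within each video and treats the second dimension as independent batch entries.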

unclebuff commented 3 years ago

After carefully reading the paper and the TransformerEncoderLayer documentation, my question has been resolved.