SwinTransformer / Video-Swin-Transformer

This is an official implementation of "Video Swin Transformer".
https://arxiv.org/abs/2106.13230
Apache License 2.0

About the factorized spatiotemporal model as in Table 4 #46

Open youthHan opened 2 years ago

youthHan commented 2 years ago

Thank you for your work and the codes.

In addition to the released models and weights, I'm wondering if you could also release the model and pretrained weights for the factorized spatiotemporal attention variant (Video-Swin-T), as discussed in Table 4 of your paper.

taoyang1122 commented 2 years ago

Hi, I am also interested in the factorized model in Table 4. Specifically, is the temporal attention in this model just regular self-attention with random initialization, while only the spatial attention uses shifted-window attention initialized from the pre-trained weights? Thanks!
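
For concreteness, here is a minimal PyTorch sketch of what such a factorized block could look like: window attention within each frame, followed by plain self-attention across frames at each spatial location. The class and parameter names are made up for illustration and this is not the repository's code; in particular, whether the temporal branch is randomly initialized is exactly the open question above.

```python
# Hypothetical sketch of one factorized spatiotemporal block (not the repo's classes):
# spatial (window) attention per frame, then plain temporal self-attention per location.
import torch
import torch.nn as nn


class FactorizedSTBlock(nn.Module):
    def __init__(self, dim, num_heads, window_size=7):
        super().__init__()
        self.window_size = window_size
        # Spatial branch: in the real model this would be the (shifted-)window
        # attention inherited from the pre-trained 2D Swin backbone. A plain
        # multi-head attention over windows stands in for it here.
        self.norm_s = nn.LayerNorm(dim)
        self.spatial_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Temporal branch: regular self-attention over frames, which would be
        # randomly initialized since the image backbone has no counterpart for it.
        self.norm_t = nn.LayerNorm(dim)
        self.temporal_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):
        # x: (B, T, H, W, C); H and W are assumed divisible by window_size.
        B, T, H, W, C = x.shape
        ws = self.window_size

        # --- spatial window attention within each frame ---
        s = x.reshape(B * T, H // ws, ws, W // ws, ws, C)
        s = s.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)  # each window is a sequence
        s = self.norm_s(s)
        s, _ = self.spatial_attn(s, s, s)
        s = s.reshape(B * T, H // ws, W // ws, ws, ws, C)
        s = s.permute(0, 1, 3, 2, 4, 5).reshape(B, T, H, W, C)
        x = x + s

        # --- temporal attention across frames at each spatial location ---
        t = x.permute(0, 2, 3, 1, 4).reshape(B * H * W, T, C)
        t = self.norm_t(t)
        t, _ = self.temporal_attn(t, t, t)
        t = t.reshape(B, H, W, T, C).permute(0, 3, 1, 2, 4)
        x = x + t
        return x


# Example: 8 frames of 56x56 tokens with 96 channels (Video-Swin-T stage-1-like sizes).
blk = FactorizedSTBlock(dim=96, num_heads=3)
out = blk(torch.randn(1, 8, 56, 56, 96))  # -> (1, 8, 56, 56, 96)
```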