caojiezhang / VSR-Transformer

PyTorch implementation of VSR-Transformer

How to implement the Spatial-temporal positional encoding? #5

Open Jacklikesironman opened 3 years ago

Jacklikesironman commented 3 years ago

As shown in the supplementary material of the proposed method, the channel dimension of the feature after the Extractor, to which the positional embedding is added, is 64. However, Subsection 4.1 of the main paper states that the dimension d must be divisible by 3, since the positional encodings of the three dimensions are concatenated to form the final d-channel positional encoding. 64 is not divisible by 3.
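For context, here is my understanding of the construction described in Subsection 4.1, as a minimal sketch (the helper names `sinusoid` and `posenc_3d` are my own, not from this repo): a standard 1D sinusoidal encoding with d/3 channels is computed independently for the frame, height, and width axes, and the three encodings are concatenated along the channel dimension.

```python
import torch

def sinusoid(positions, num_channels):
    # Standard 1D sinusoidal encoding; num_channels must be even
    # (half sine, half cosine).
    freqs = torch.arange(num_channels // 2, dtype=torch.float32)
    inv_freq = 1.0 / (10000 ** (2 * freqs / num_channels))
    angles = positions.float()[:, None] * inv_freq[None, :]  # (N, C/2)
    return torch.cat([angles.sin(), angles.cos()], dim=-1)   # (N, C)

def posenc_3d(t, h, w, d):
    # Each of the three axes (frame, height, width) gets d // 3 channels,
    # which is why d must be divisible by 3.
    assert d % 3 == 0, "d must be divisible by 3"
    c = d // 3
    pe_t = sinusoid(torch.arange(t), c)  # (T, c)
    pe_h = sinusoid(torch.arange(h), c)  # (H, c)
    pe_w = sinusoid(torch.arange(w), c)  # (W, c)
    # Broadcast each 1D encoding over the other two axes and concatenate
    # along the channel dimension.
    return torch.cat([
        pe_t[:, None, None, :].expand(t, h, w, c),
        pe_h[None, :, None, :].expand(t, h, w, c),
        pe_w[None, None, :, :].expand(t, h, w, c),
    ], dim=-1)  # (T, H, W, d)
```

With this construction, `posenc_3d(5, 64, 64, 66)` works (each axis gets 22 channels), but `posenc_3d(5, 64, 64, 64)` fails the divisibility check, which is exactly the inconsistency I am asking about.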

So, how should the spatial-temporal positional encoding be implemented when the feature dimension is 64? I am looking forward to your reply.