gjylt / DoubleAttentionNet

PyTorch implementation of Double Attention Net

Application to temporal data #4

Open JohnMBrandt opened 4 years ago

JohnMBrandt commented 4 years ago

In the original paper, the authors apply the double attention block to video data. From reading the paper, I understand how to apply the block between 2D conv layers, so that higher-level features are weighted and combined with lower-level features.

I can't figure out how this implementation applies to a 5D temporal input -- Batch, Time, Height, Width, Channels. I understand that the first step, feature gathering, involves a dimension reduction via 1x1 convolutions, a softmax, and bilinear pooling. Should the data be reshaped to (B, H, W, C×T)? That seems to be my reading of the paper -- "where each b is a dhw-dimensional row vector" -- it sounds like the output of the gathering stage is d×h×w in size and doesn't incorporate the input channel count, because the conv is 1×1×1.
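For concreteness, here is a minimal sketch of what I have in mind: keep the input as (B, C, T, H, W) and flatten the T×H×W positions into one axis, rather than reshaping to (B, H, W, C×T). This is my own guess based on the paper, not this repo's code; the module name `DoubleAttention3D` and the channel sizes `c_m`/`c_n` are placeholders:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DoubleAttention3D(nn.Module):
    """Sketch of a double attention block for 5D input (B, C, T, H, W).

    All 1x1x1 convolutions, so the attention maps are over the
    T*H*W spatio-temporal positions, matching the paper's
    "dhw-dimensional row vector" reading.
    """
    def __init__(self, in_channels, c_m, c_n):
        super().__init__()
        self.conv_a = nn.Conv3d(in_channels, c_m, kernel_size=1)
        self.conv_b = nn.Conv3d(in_channels, c_n, kernel_size=1)
        self.conv_v = nn.Conv3d(in_channels, c_n, kernel_size=1)
        self.conv_out = nn.Conv3d(c_m, in_channels, kernel_size=1)

    def forward(self, x):
        b, _, t, h, w = x.shape
        n = t * h * w                                  # number of positions
        A = self.conv_a(x).view(b, -1, n)              # (B, c_m, THW)
        B = self.conv_b(x).view(b, -1, n)              # (B, c_n, THW)
        V = self.conv_v(x).view(b, -1, n)              # (B, c_n, THW)

        # Feature gathering: softmax over positions gives c_n attention
        # maps; bilinear pooling yields c_m x c_n global descriptors.
        B = F.softmax(B, dim=-1)
        G = torch.bmm(A, B.transpose(1, 2))            # (B, c_m, c_n)

        # Feature distribution: softmax over the c_n descriptors
        # selects a soft mixture of them at every position.
        V = F.softmax(V, dim=1)
        Z = torch.bmm(G, V)                            # (B, c_m, THW)
        Z = Z.view(b, -1, t, h, w)
        return x + self.conv_out(Z)                    # residual add
```

With an input like `torch.randn(2, 64, 8, 14, 14)` this preserves the (B, C, T, H, W) shape, so the block could be dropped between 3D conv layers without any (B, H, W, C×T) reshape.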

Thoughts?

Leno-B commented 4 years ago

I can't figure out how I should set the softmax dim.
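From my reading of the paper (a guess, not the author's answer), the two softmaxes go over different dims -- the gathering softmax normalizes over the positions, the distribution softmax over the `c_n` maps. A toy sketch with hypothetical sizes:

```python
import torch
import torch.nn.functional as F

b, c_m, c_n, n = 2, 128, 32, 8 * 14 * 14   # n = T*H*W positions (toy sizes)
A = torch.randn(b, c_m, n)
B = torch.randn(b, c_n, n)
V = torch.randn(b, c_n, n)

# Gathering: each of the c_n attention maps should sum to 1 over
# the n positions, so the softmax runs along the position axis.
G = torch.bmm(A, F.softmax(B, dim=-1).transpose(1, 2))   # (b, c_m, c_n)

# Distribution: each position picks a soft mixture of the c_n
# global descriptors, so the softmax runs along the channel axis.
Z = torch.bmm(G, F.softmax(V, dim=1))                    # (b, c_m, n)
```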