ajseo95 / MASN-pytorch

PyTorch implementation of the paper Attend What You Need: Motion-Appearance Synergistic Networks for Video Question Answering
MIT License

About extracting the 2D global feature with ResNet-152 as backbone #10

Open YukiFan opened 1 year ago

YukiFan commented 1 year ago

Thanks for your excellent work. I have one question about extracting the 2D appearance feature. When using ResNet-152 as the backbone, the output of layer4 (before average pooling) has shape [frames, 2048, 7, 7], where frames is the length of a clip. After stacking the clips, I get [T, len, 2048, 7, 7]. Could you share how you process the ResNet-152 output to obtain the appearance feature described in the paper, whose dimension is T × d? Thanks very much.
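To make the shape question concrete, here is a minimal sketch of one common way to reduce a [T, len, 2048, 7, 7] tensor to [T, 2048]: global average pooling over the 7 × 7 spatial grid, then averaging over the frames of each clip. This is an assumption for illustration only, not the authors' confirmed pipeline; the function name `pool_appearance_features` is hypothetical, the input is random data standing in for real layer4 outputs, and it presumes d = 2048. NumPy is used only to keep the example dependency-light.

```python
import numpy as np


def pool_appearance_features(feats: np.ndarray) -> np.ndarray:
    """Reduce stacked layer4 feature maps [T, L, C, H, W] to [T, C].

    Hypothetical helper: spatial global average pooling followed by a
    mean over the frames in each clip. The paper may do something else.
    """
    if feats.ndim != 5:
        raise ValueError(f"expected 5 dimensions [T, L, C, H, W], got {feats.ndim}")
    # Global average pooling over the spatial grid (H, W) -> [T, L, C]
    per_frame = feats.mean(axis=(3, 4))
    # Average the per-frame vectors within each clip -> [T, C]
    return per_frame.mean(axis=1)


# Random stand-in for real features: T=3 clips, 4 frames per clip.
T, L, C, H, W = 3, 4, 2048, 7, 7
rng = np.random.default_rng(0)
feats = rng.standard_normal((T, L, C, H, W)).astype(np.float32)

pooled = pool_appearance_features(feats)
print(pooled.shape)  # (3, 2048)
```

On a PyTorch tensor the same reduction can be written as `feats.mean(dim=(3, 4)).mean(dim=1)`. Whether the frame axis should be averaged, max-pooled, or sampled (for example, keeping only one frame per clip) depends on the authors' choice.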