ZhaofanQiu / pseudo-3d-residual-networks

Pseudo-3D Convolutional Residual Networks for Video Representation Learning
MIT License

about the input #15

Open amywyx opened 6 years ago

amywyx commented 6 years ago

Hi, according to the paper, the input of the model is a 16-frame clip, but most videos are much longer than 16 frames. Do you use a single 16-frame clip to represent the whole video when training the model?

zzy123abc commented 6 years ago

Network Testing. We evaluate the performance of the learnt P3D ResNet by measuring video/clip classification accuracy on the test set. Specifically, we randomly sample 20 clips from each video and adopt a single center crop per clip, which is propagated through the network to obtain a clip-level prediction score. The video-level score is computed by averaging all the clip-level scores of a video.
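To make the quoted test procedure concrete, here is a minimal sketch in plain Python. It is not the authors' code. The names `sample_clip_starts`, `video_level_score`, and `clip_scorer` are hypothetical, and I am assuming clips are drawn uniformly at random and may overlap, since the paper only says "randomly sample 20 clips". `clip_scorer` stands in for a forward pass of the network that returns one score per class.

```python
import random

def sample_clip_starts(num_frames, clip_len=16, num_clips=20, rng=None):
    """Pick random start indices for fixed-length clips inside a video.

    Assumption: starts are uniform and clips may overlap or repeat.
    """
    rng = rng or random.Random()
    # Last valid start so that a full clip still fits; 0 for short videos.
    last_start = max(num_frames - clip_len, 0)
    return [rng.randint(0, last_start) for _ in range(num_clips)]

def video_level_score(frames, clip_scorer, clip_len=16, num_clips=20, rng=None):
    """Average the clip-level class scores into one video-level score."""
    starts = sample_clip_starts(len(frames), clip_len, num_clips, rng)
    # clip_scorer is a placeholder for the network's forward pass.
    clip_scores = [clip_scorer(frames[s:s + clip_len]) for s in starts]
    num_classes = len(clip_scores[0])
    return [
        sum(scores[c] for scores in clip_scores) / len(clip_scores)
        for c in range(num_classes)
    ]
```

Note that a video shorter than `clip_len` frames would yield a shorter clip here; how such videos are handled is not stated in the quoted passage.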

XiongChengxin commented 6 years ago

@zzy123abc Hi, thanks for your reply. Do you know what 'adopt a single center crop per clip' means? Looking forward to your reply!

zzy123abc commented 6 years ago

I think it just means a simple cropping scheme: a single crop is taken from the center of the frames in each clip.
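As a rough illustration of what I mean by a center crop, here is a small sketch in plain Python. The function names are hypothetical, a frame is assumed to be a list of pixel rows, and the crop size is left as a parameter because the thread does not state it.

```python
def center_crop_box(height, width, crop_size):
    """Return (top, left, bottom, right) of a centered square crop."""
    if crop_size > height or crop_size > width:
        raise ValueError("crop_size must fit inside the frame")
    top = (height - crop_size) // 2
    left = (width - crop_size) // 2
    return top, left, top + crop_size, left + crop_size

def center_crop_clip(clip, crop_size):
    """Apply the same centered crop to every frame of a clip.

    Assumption: clip is a list of frames, each a list of pixel rows,
    and all frames share the same height and width.
    """
    height, width = len(clip[0]), len(clip[0][0])
    top, left, bottom, right = center_crop_box(height, width, crop_size)
    return [[row[left:right] for row in frame[top:bottom]] for frame in clip]
```

So "a single center crop per clip" would mean one fixed spatial window per clip, as opposed to averaging over several crops (corners, flips, and so on) of the same clip.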

XiongChengxin commented 6 years ago

@zzy123abc I see, thank you!