Scalsol / mega.pytorch

Memory Enhanced Global-Local Aggregation for Video Object Detection, CVPR2020

Sampling of video frames #37

Closed UmarSpa closed 4 years ago

UmarSpa commented 4 years ago

The ImageSet file (VID_train_15frames) that you use to train the MEGA model contains 15 frames uniformly sampled from each video. Can you give any insight into why you do this?

I can't find anything about this in the paper.

Thanks.

Scalsol commented 4 years ago

Hi, we just follow the frame sampling strategy used in FGFA. Nearly all methods in the video object detection area follow this convention for a fair comparison. In my opinion, this sampling strategy could alleviate the class imbalance problem to some extent: short videos and long videos are treated equally. BTW, I have also tried other sampling strategies (e.g. VID_train_every_10_frames) to train the network but obtained nearly the same performance. There may exist a better sampling strategy, but I haven't found one, so I chose the widely used one.
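
The uniform per-video sampling described above can be sketched as follows. This is a hypothetical illustration, not the actual FGFA/MEGA ImageSet-generation script (which may differ in rounding details); the function name `sample_uniform_frames` is made up for this example.

```python
def sample_uniform_frames(num_frames: int, k: int = 15) -> list[int]:
    """Return k frame indices spread uniformly over [0, num_frames).

    Assumption: samples are placed at the centers of k equal segments;
    the real ImageSet script may use a slightly different scheme.
    """
    if num_frames <= k:
        # Very short video: keep every frame.
        return list(range(num_frames))
    return [int((i + 0.5) * num_frames / k) for i in range(k)]

# A 300-frame video contributes indices 10, 30, 50, ..., 290 --
# the same 15 frames per video regardless of length, which is why
# short and long videos end up equally represented in training.
```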