UmarSpa closed 4 years ago
Hi, we simply follow the frame sampling strategy used in FGFA. Nearly all methods in video object detection do the same, for a fair comparison. In my opinion, this sampling strategy alleviates the class imbalance problem to some extent, since short and long videos are treated equally. BTW, I have also tried other sampling strategies (e.g. VID_train_every_10_frames) to train the network, but obtained nearly the same performance. There may exist a better sampling strategy, but I haven't found one, so I chose the widely used one.
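To make the "treated equally" point concrete, here is a minimal sketch of the two strategies. It is not the script that generated the ImageSet files: the function names, the segment-centre rule for picking indices, and the video lengths are all assumptions for illustration only.

```python
def sample_uniform(num_frames, k=15):
    """Pick up to k frame indices spread evenly across one video."""
    if num_frames <= k:
        return list(range(num_frames))
    # Take the centre of each of k equal-width segments (assumed rule).
    return [int((i + 0.5) * num_frames / k) for i in range(k)]


def sample_every_n(num_frames, stride=10):
    """Pick every stride-th frame index, starting at frame 0."""
    return list(range(0, num_frames, stride))


# Hypothetical video lengths (in frames), for illustration only.
videos = {"short_clip": 60, "long_clip": 1200}

for name, n in videos.items():
    print(name, len(sample_uniform(n)), len(sample_every_n(n)))
```

With a fixed count per video, both clips contribute 15 frames. With a fixed stride, the long clip contributes 20 times as many frames as the short one, so categories that mostly appear in long videos would dominate the training list.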
The ImageSet file (VID_train_15frames) that you use to train the MEGA model contains 15 frames uniformly sampled from each video. Can you please give any insight into why you do this?
I can't find anything about this in the paper.
Thanks.