SJTU-LuHe / TransVOD

The repository is the code for the paper "End-to-End Video Object Detection with Spatial-Temporal Transformers"
Apache License 2.0

Window size of reference frames #41

Open SartisticV opened 8 months ago

SartisticV commented 8 months ago

Hi! I have a question regarding the code. Why does the dataset sample reference frames from the entire video when the number of reference frames is greater than 10? I can't seem to find this in the paper.

https://github.com/SJTU-LuHe/TransVOD/blob/5a4464084b166e40680b8a071d9756f847876acc/datasets/vid_multi.py#L75-L76
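To make the question concrete, here is a minimal sketch of the behaviour as I understand it. It is not the repository's actual code: the function name, the `window` and `threshold` parameters, the local-window branch, and the exclusion of the current frame are all assumptions for illustration.

```python
import random


def sample_reference_frames(frame_ids, current_id, num_ref_frames,
                            window=None, threshold=10, rng=random):
    """Hypothetical sketch of the sampling switch asked about above.

    Assumption: with few reference frames, candidates come from a local
    window around the current frame; once num_ref_frames exceeds the
    threshold, candidates come from the whole video instead.
    """
    if window is None:
        # Assumed default: window radius tied to the number of references.
        window = num_ref_frames

    # Assumption: the current frame is never used as its own reference.
    others = [f for f in frame_ids if f != current_id]

    if num_ref_frames > threshold:
        # Global sampling: every other frame in the video is a candidate.
        candidates = others
    else:
        # Local sampling: only frames within `window` of the current frame.
        candidates = [f for f in others if abs(f - current_id) <= window]

    # Guard against short videos with fewer candidates than requested.
    k = min(num_ref_frames, len(candidates))
    return sorted(rng.sample(candidates, k))


if __name__ == "__main__":
    video = list(range(100))
    print(sample_reference_frames(video, 50, 4, window=5))  # local window
    print(sample_reference_frames(video, 50, 12))           # whole video
```

With 4 reference frames the samples stay within 5 frames of frame 50, while with 12 they can come from anywhere in the video. That switch is the part I could not find described in the paper.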

SartisticV commented 8 months ago

In addition, why is the sampling strategy different during evaluation?