Frame sampling method - GitHub Issues

Elstuhn commented 2 days ago

Hey, thank you so much for your work! I think it's very good, and I really enjoyed reading the paper, so I'd like to run some experiments with it and try out ways to improve or modify it.

I read about the sampling method in the paper: you sample globally (16 frames by default for training). I'd like to change it so that it samples only f frames from behind the key/target frame with a stride of 2. May I know where the sampling logic/code is?
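A minimal sketch of the index arithmetic for the sampling described above, assuming frames are addressed by integer position within a video. The function name `past_frame_indices` and the clamping behaviour at the start of a video are illustrative assumptions, not part of the YOLOV code.

```python
def past_frame_indices(key_idx, num_ref, stride=2):
    """Return the indices of `num_ref` frames behind `key_idx`, oldest first.

    Frames are spaced `stride` apart and the key frame itself is excluded.
    Indices that would fall before the start of the video are clamped to 0
    (an assumption; skipping or padding are equally valid choices).
    """
    # k runs num_ref, ..., 2, 1 so the result is ordered oldest to newest.
    return [max(key_idx - k * stride, 0) for k in range(num_ref, 0, -1)]


if __name__ == "__main__":
    # Key frame 10, three reference frames, stride 2.
    print(past_frame_indices(10, 3, stride=2))
```

Clamping keeps every sequence the same length, which matters if a fixed number of frames per sample is expected, at the cost of repeating the first frame for key frames near the start of a video.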

Is the current sampling method the TrainSampler or VIDBatchSampler in yolox/data/datasets/vid.py?

YuHengsss commented 1 day ago

Yep, please customize this function and pass kwargs to control the sampling method: https://github.com/YuHengsss/YOLOV/blob/5f069b29e201c4e099c7c3827cc6c63823ce3141/yolox/data/datasets/vid.py#L73

Elstuhn commented 1 day ago

Hey, thanks for your quick reply. I've just looked into it. It seems that the function returns a list of lists whose length equals the number of training iterations; each inner list contains randomized global frames, and the intervals depend on whether the mode is random or uniform.

My goal is to get the last few frames before the key/target frame, spaced n frames apart. An example of one inner list returned by photo_to_sequence is ['video1_006306.jpg', 'video1_006315.jpg', 'video1_006324.jpg', 'video1_006333.jpg', 'video1_006342.jpg', 'video1_006351.jpg', 'video1_006360.jpg', 'video1_006369.jpg', 'video1_006378.jpg', 'video1_006387.jpg', 'video1_006396.jpg', 'video1_006405.jpg', 'video1_006414.jpg', 'video1_006423.jpg', 'video1_006432.jpg', 'video1_006441.jpg']. My question is: are all 16 elements reference frames, or is the last element the key/target frame? (I see that during training the inputs have shape (gframe, 3, h, w), so the key/target frame should be one of the 16 elements, right?)
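For reference, the spacing of the example list can be checked by parsing the frame numbers out of the file names. This is a small sketch that assumes names follow the `<video>_<number>.jpg` pattern seen above; `frame_number` is a hypothetical helper, not a YOLOV function.

```python
import re

# The example inner list quoted above.
frames = [
    'video1_006306.jpg', 'video1_006315.jpg', 'video1_006324.jpg', 'video1_006333.jpg',
    'video1_006342.jpg', 'video1_006351.jpg', 'video1_006360.jpg', 'video1_006369.jpg',
    'video1_006378.jpg', 'video1_006387.jpg', 'video1_006396.jpg', 'video1_006405.jpg',
    'video1_006414.jpg', 'video1_006423.jpg', 'video1_006432.jpg', 'video1_006441.jpg',
]


def frame_number(name):
    """Extract the integer frame number from '<video>_<number>.jpg'."""
    match = re.search(r'_(\d+)\.jpg$', name)
    if match is None:
        raise ValueError(f"unexpected frame name: {name}")
    return int(match.group(1))


numbers = [frame_number(name) for name in frames]
# Differences between consecutive frame numbers.
gaps = [later - earlier for earlier, later in zip(numbers, numbers[1:])]

if __name__ == "__main__":
    print(len(frames), set(gaps))
```

Every gap in this particular list is 9, so the 16 frames are evenly spaced over 135 frames of the video.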

If all elements are reference frames, is it possible to get information about the key/target frame inside the photo_to_sequence function, so that I can get frames t-16 to t, where the t-th frame is the key/target frame? (Basically, get the past 16 frames from the key/target frame.)
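One way to make the key/target frame identifiable is to build each sequence so that the key frame is always the last element. The sketch below does this for a single video, assuming `video_frames` is that video's frame names in temporal order. `build_past_sequences` and its parameters are hypothetical and do not mirror the signature of photo_to_sequence; whether YOLOV treats one element as the key frame is a question the thread leaves open.

```python
def build_past_sequences(video_frames, num_frames=16, stride=1):
    """Build one sequence per key frame, with the key frame last.

    For key frame t, the sequence holds the frames at positions
    t - (num_frames - 1) * stride, ..., t - stride, t.
    Positions before the start of the video are clamped to 0 (an assumption),
    so every sequence has exactly `num_frames` elements.
    """
    sequences = []
    for t in range(len(video_frames)):
        # Offsets count down to 0 so the key frame t ends up last.
        idxs = [max(t - k * stride, 0) for k in range(num_frames - 1, -1, -1)]
        sequences.append([video_frames[i] for i in idxs])
    return sequences


if __name__ == "__main__":
    frames = [f"video1_{i:06d}.jpg" for i in range(5)]
    for seq in build_past_sequences(frames, num_frames=3, stride=2):
        print(seq)
```

Note that this returns `num_frames` elements including the key frame; to get 16 past frames plus the key frame, pass `num_frames=17`.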

YuHengsss / YOLOV

Frame sampling method #113