RaivoKoot / Video-Dataset-Loading-Pytorch

Generic PyTorch dataset implementation to load and augment VIDEOS for deep learning training loops.
BSD 2-Clause "Simplified" License

Why subtract 'frames_per_segment' to calculate 'segment_duration' ? #12

Closed Gateway2745 closed 3 years ago

Gateway2745 commented 3 years ago

Hi. Why do you subtract 'frames_per_segment' from 'num_frames' and then divide by 'num_segments' to calculate 'segment_duration'? Can we not directly divide 'num_frames' by 'num_segments' to get the 'segment_duration'? Thanks!

https://github.com/RaivoKoot/Video-Dataset-Loading-Pytorch/blob/97b54d804adfbf0baf8537bfe525f5d36469abd9/video_dataset.py#L155

RaivoKoot commented 3 years ago

Hi. Good question. So, the equation is:

segment_duration = (record.num_frames - self.frames_per_segment + 1) // self.num_segments

If you don't subtract frames_per_segment and add 1, then an IndexOutOfBounds error can occur later.

In the case where frames_per_segment=1, - self.frames_per_segment + 1 obviously makes no difference, because we are subtracting 1 and adding 1. So, this only matters in the case where frames_per_segment > 1.

Some Context

When you use frames_per_segment > 1, what happens is that for each segment, a random start index is sampled, and then starting from each start_index, frames_per_segment consecutive frames are loaded and returned. This function _sample_indices does not return the indices of all frames to be loaded, but only the start index of each segment's frames_per_segment frames.
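To make the distinction between start indices and loaded frames concrete, here is a minimal sketch of the expansion step. The helper name expand_start_indices is hypothetical and not taken from the repository; it only illustrates the behavior described above.

```python
def expand_start_indices(start_indices, frames_per_segment):
    """Turn per-segment start indices into the full list of frame indices.

    Hypothetical helper for illustration only; the repository may perform
    this expansion differently.
    """
    return [
        start + offset
        for start in start_indices
        for offset in range(frames_per_segment)
    ]


# Three segments, two consecutive frames loaded from each start index.
print(expand_start_indices([0, 2, 4], frames_per_segment=2))  # [0, 1, 2, 3, 4, 5]
```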

An Example

num_segments = 3
frames_per_segment = 2
num_frames = 6
frame_indices = [0, 1, 2, 3, 4, 5]

We cannot use 5 as a start index. This is because starting from 5 we would need to take frames_per_segment=2 frames, which would be the two frames at index 5 and 6. Index 6 is out of bounds though. If you do not do - self.frames_per_segment + 1, then the function will sometimes return index 5 as the start index for the third segment. Doing - self.frames_per_segment + 1 is not a perfect solution, but it works.
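The arithmetic can be checked with a small sketch. It assumes each segment's start index is drawn as i * segment_duration plus a random offset in [0, segment_duration), which matches the description in this thread but is an assumption about the exact code. The function name max_last_frame and the corrected flag are made up for this illustration.

```python
def max_last_frame(num_frames, num_segments, frames_per_segment, corrected):
    """Largest frame index that could ever be requested.

    Assumes start_i = i * segment_duration + r with 0 <= r < segment_duration,
    and that segment_duration > 0 (very short videos are not handled here).
    """
    usable = num_frames - frames_per_segment + 1 if corrected else num_frames
    segment_duration = usable // num_segments
    # Last segment's base plus the largest possible random offset.
    max_start = (num_segments - 1) * segment_duration + (segment_duration - 1)
    # The final frame loaded from that start index.
    return max_start + frames_per_segment - 1


# Valid frame indices for num_frames = 6 are 0..5.
print(max_last_frame(6, 3, 2, corrected=False))  # 6, out of bounds
print(max_last_frame(6, 3, 2, corrected=True))   # 3, in bounds
```

With the correction, num_segments * segment_duration is at most num_frames - frames_per_segment + 1, so the last requested frame is at most num_frames - 1.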

Gateway2745 commented 3 years ago

Thank you for the detailed explanation! I understood the purpose of it now but my doubt persists.

In the example you have given, the segments would be [0,1], [2,3] and [4,5]. So, the starting index can be either 0, 2 or 4. However, the result of segment_duration = (record.num_frames - self.frames_per_segment + 1) // self.num_segments gives segment_duration=1. Should this not be 2 (each segment [0,1], [2,3], [4,5] has 2 frames)?

Let's say segment_duration=1. Now, _sample_indices always returns offsets=[0,1,2]. With frames_per_segment = 2, this spans frames [0,1,2,3]. So frames 4 and 5 will never be used. Is this supposed to be how it works?
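This observation can be reproduced with a short simulation. It is a sketch that assumes the sampling rule discussed in this thread (base i * segment_duration plus a random offset below segment_duration); the repository's actual _sample_indices may differ, for example in how it treats very short videos. The function names are hypothetical.

```python
import random


def sample_start_indices(num_frames, num_segments, frames_per_segment):
    """Sample one start index per segment (assumed rule, see lead-in)."""
    segment_duration = (num_frames - frames_per_segment + 1) // num_segments
    if segment_duration <= 0:
        raise ValueError("video too short for this sketch")
    return [
        i * segment_duration + random.randrange(segment_duration)
        for i in range(num_segments)
    ]


def reachable_frames(num_frames, num_segments, frames_per_segment, trials=1000):
    """Collect every frame index that is loaded in at least one trial."""
    seen = set()
    for _ in range(trials):
        for start in sample_start_indices(num_frames, num_segments, frames_per_segment):
            seen.update(range(start, start + frames_per_segment))
    return sorted(seen)


# segment_duration is 1 here, so the random offset is always 0.
print(sample_start_indices(6, 3, 2))  # [0, 1, 2]
print(reachable_frames(6, 3, 2))      # [0, 1, 2, 3]
```

Frames 4 and 5 never appear in the output, which is the behavior described above.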

Thanks again!

RaivoKoot commented 3 years ago

Yes, you are right that this is how it works, even though it is not the best behavior. Because most people only use a single frame per segment, I did not pay much attention to improving this behavior when I adapted this repository from the original code repository, which implemented this sub-optimal behavior. In my own experiments, I also only ever use a single frame per segment.

However, you are very welcome to create a pull request and suggest an improvement for this. If it is suitable, I am happy to merge it!!

Gateway2745 commented 3 years ago

Hi. Thanks for the confirmation! Oh I see, I was not aware that the norm was to use only a single frame per segment. Sure, although it is difficult for me right now, if I come up with a way to improve the current strategy, I'll definitely raise a PR. Thanks again.