Closed kishorepv closed 3 years ago
Hi,
This is not true. I think the confusion starts from this line: https://github.com/epic-kitchens/epic-kitchens-slowfast/blob/217c1d1e3768cd50f10f128189689230da8e2e23/slowfast/datasets/decoder.py#L57. There, clip_size is not the video size but the number of frames that we want to give as input to the model. If you have a look at https://github.com/epic-kitchens/epic-kitchens-slowfast/blob/217c1d1e3768cd50f10f128189689230da8e2e23/slowfast/datasets/frame_loader.py#L35, you will be able to see that this is the clip_size. We want a clip_size of num_samples * sampling_rate frames, which is 64 in our case, and fps / target_fps is there to reduce the clip_size when fps is smaller than target_fps, as we have videos with 60 and others with 50 fps. We set target_fps=60.
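To make the arithmetic above concrete, here is a minimal sketch of the clip_size computation as described in this reply. The function name clip_size_in_frames is hypothetical, and the split of 64 into num_samples=32 and sampling_rate=2 is an assumption for illustration; only the product (64) and target_fps=60 come from the explanation above.

```python
def clip_size_in_frames(num_samples, sampling_rate, fps, target_fps=60.0):
    """Number of source frames a clip spans, rescaled to the video's own fps."""
    # num_samples * sampling_rate is the span at target_fps; the fps / target_fps
    # factor shrinks it for videos recorded at a lower frame rate.
    return num_samples * sampling_rate * fps / target_fps


# Assumed values: 32 samples with a sampling rate of 2 (32 * 2 = 64).
print(clip_size_in_frames(32, 2, fps=60))  # 64.0
print(clip_size_in_frames(32, 2, fps=50))  # about 53.3
```

So a 50 fps video uses a shorter window in frames, which covers roughly the same duration as 64 frames at 60 fps.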
Thank you for clarifying. So, in the code, num_samples frames are sampled from the num_samples * sampling_rate frames extracted from the video starting at the start_frame frame (i.e. starting from the beginning of the clip). Right?
You are correct in the first part: num_samples frames are indeed sampled from num_samples * sampling_rate frames. But as you can see in https://github.com/epic-kitchens/epic-kitchens-slowfast/blob/217c1d1e3768cd50f10f128189689230da8e2e23/slowfast/datasets/decoder.py#L30, start_idx can take random values from 0 to delta = max(video_size - clip_size, 0) during training. Then in https://github.com/epic-kitchens/epic-kitchens-slowfast/blob/217c1d1e3768cd50f10f128189689230da8e2e23/slowfast/datasets/frame_loader.py#L7 we add start_frame to the sampled frame numbers, since we are loading frames from the untrimmed video.
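The sampling described above could be sketched roughly as follows. This is a plain-Python illustration and not the repository's code: the function name sample_frame_numbers is hypothetical, the even spacing is computed by hand, and the clamping step is an assumption of this sketch to keep indices inside the segment when it is shorter than clip_size.

```python
import random


def sample_frame_numbers(video_size, clip_size, num_samples, start_frame, rng=random):
    """Pick num_samples frame numbers from one randomly placed window of clip_size frames."""
    # Largest offset at which a clip_size window still fits inside the segment.
    delta = max(video_size - clip_size, 0)
    # Training-style random placement of the window inside the segment.
    start_idx = rng.uniform(0, delta)
    # Evenly spaced offsets from start_idx to start_idx + clip_size - 1.
    if num_samples == 1:
        offsets = [start_idx]
    else:
        offsets = [
            start_idx + i * (clip_size - 1) / (num_samples - 1)
            for i in range(num_samples)
        ]
    # Assumption of this sketch: truncate to integers and clamp to the segment.
    last = max(video_size - 1, 0)
    offsets = [min(max(int(o), 0), last) for o in offsets]
    # Shift by start_frame because frames are read from the untrimmed video.
    return [start_frame + o for o in offsets]


rng = random.Random(0)  # seeded only to make the example repeatable
frames = sample_frame_numbers(
    video_size=300, clip_size=64, num_samples=32, start_frame=1000, rng=rng
)
print(frames[0], frames[-1])
```

Here video_size is taken to be the length of the segment in frames, so the window starts somewhere between start_frame and start_frame + delta rather than always at start_frame.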
Thanks!
In the dataset, instead of extracting the frames from [start_frame, stop_frame] (in EPIC_100_train.csv), you just use the start_frame. So you sample a clip that starts at start_frame and ends at the end of the video. Is this how you intended to do it, or is it a bug?