epic-kitchens / epic-kitchens-slowfast


Extraction of video clip #4

Closed. kishorepv closed this issue 3 years ago.

kishorepv commented 3 years ago

In the dataset code, instead of extracting the frames in [start_frame, stop_frame] (from EPIC_100_train.csv), you only use start_frame. So you sample a clip that starts at start_frame and ends at the end of the video. Is this intended, or is it a bug?

ekazakos commented 3 years ago

Hi,

This is not the case. I think the confusion comes from this line: https://github.com/epic-kitchens/epic-kitchens-slowfast/blob/217c1d1e3768cd50f10f128189689230da8e2e23/slowfast/datasets/decoder.py#L57

clip_size is not the size of the video but the number of frames we want to feed to the model. If you look at https://github.com/epic-kitchens/epic-kitchens-slowfast/blob/217c1d1e3768cd50f10f128189689230da8e2e23/slowfast/datasets/frame_loader.py#L35 you will see what is passed as clip_size. We want a clip_size of num_samples * sampling_rate frames, which is 64 in our case. The factor fps / target_fps shrinks clip_size when a video's fps is lower than target_fps, because some of our videos are recorded at 60 fps and others at 50 fps. We set target_fps=60, so for a 50 fps video the window is 64 * 50 / 60 ≈ 53.3 frames, which covers the same duration in seconds as 64 frames at 60 fps.
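A minimal sketch of the clip-size arithmetic described above. The helper name `clip_size_in_frames` is hypothetical (not the repository's API), and num_samples=32 with sampling_rate=2 is only an assumed split that yields the 64 frames mentioned; the actual config values may differ.

```python
def clip_size_in_frames(num_samples: int, sampling_rate: int,
                        fps: float, target_fps: float = 60.0) -> float:
    """Number of source frames covered by one input clip (hypothetical helper).

    The nominal window num_samples * sampling_rate is defined at target_fps;
    scaling by fps / target_fps keeps the window's duration in seconds the
    same for 60 fps and 50 fps videos.
    """
    return num_samples * sampling_rate * fps / target_fps


if __name__ == "__main__":
    # Assumed config: num_samples=32, sampling_rate=2 -> 64 frames at 60 fps.
    for fps in (60, 50):
        size = clip_size_in_frames(num_samples=32, sampling_rate=2, fps=fps)
        print(f"{fps} fps -> {size:.2f} frames = {size / fps:.4f} s")
```

Both windows come out at roughly 1.07 s; only the number of source frames differs.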

kishorepv commented 3 years ago

Thank you for clarifying. So, in the code, num_samples frames are sampled from num_samples * sampling_rate frames extracted from the video starting at start_frame (i.e., from the beginning of the action segment). Right?
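The first half of this question, picking num_samples frames out of a window of num_samples * sampling_rate frames, can be sketched as evenly spaced indices over the window. Upstream SlowFast does this with torch.linspace and torch.clamp in its temporal_sampling function, as far as I know; the pure-Python `sample_indices` below is a hypothetical stand-in that avoids the PyTorch dependency, and num_samples=32 is again an assumed value.

```python
from typing import List


def sample_indices(start_idx: float, end_idx: float,
                   num_samples: int, num_frames: int) -> List[int]:
    """Return num_samples evenly spaced frame indices in [start_idx, end_idx].

    Indices are clamped to [0, num_frames - 1] and truncated to ints, so a
    window that overruns a short segment repeats its last valid frame.
    """
    if num_samples == 1:
        positions = [float(start_idx)]
    else:
        span = end_idx - start_idx
        positions = [start_idx + span * i / (num_samples - 1)
                     for i in range(num_samples)]
    return [int(min(max(p, 0), num_frames - 1)) for p in positions]


if __name__ == "__main__":
    # 32 samples from a 64-frame window (assumed num_samples=32, sampling_rate=2).
    idx = sample_indices(0, 63, num_samples=32, num_frames=300)
    print(len(idx), idx[:5], idx[-1])  # prints: 32 [0, 2, 4, 6, 8] 63
```

With a 64-frame window and 32 samples, this keeps roughly every second frame, which is the effect of sampling_rate=2.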

ekazakos commented 3 years ago

You are right about the first part: num_samples frames are indeed sampled from num_samples * sampling_rate frames. However, as you can see in https://github.com/epic-kitchens/epic-kitchens-slowfast/blob/217c1d1e3768cd50f10f128189689230da8e2e23/slowfast/datasets/decoder.py#L30 during training start_idx is drawn at random from 0 to delta = max(video_size - clip_size, 0). Then, in https://github.com/epic-kitchens/epic-kitchens-slowfast/blob/217c1d1e3768cd50f10f128189689230da8e2e23/slowfast/datasets/frame_loader.py#L7 we add start_frame to the sampled frame indices, since we are loading frames from the untrimmed video.
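A rough sketch of the two steps in this reply: choose a window offset inside the segment, then shift it by start_frame into untrimmed-video frame numbers. The training branch follows the reply; the test-time branch (evenly spaced offsets) is an assumption based on upstream SlowFast. Treating video_size as stop_frame - start_frame, the segment values 1000/1300, and the helper `to_untrimmed_window` are all hypothetical, for illustration only.

```python
import random
from typing import Tuple


def get_start_end_idx(video_size: float, clip_size: float,
                      clip_idx: int, num_clips: int) -> Tuple[float, float]:
    """Choose a clip window inside a segment of video_size frames.

    Offsets are relative to the segment, not to the untrimmed video.
    clip_idx == -1 means training (random offset); otherwise this picks the
    clip_idx-th of num_clips evenly spaced offsets (assumed test-time behavior).
    """
    delta = max(video_size - clip_size, 0)
    if clip_idx == -1:
        start_idx = random.uniform(0, delta)  # random temporal jitter
    else:
        start_idx = delta * clip_idx / num_clips
    end_idx = start_idx + clip_size - 1
    return start_idx, end_idx


def to_untrimmed_window(start_frame: int, start_idx: float,
                        end_idx: float) -> Tuple[float, float]:
    """Hypothetical helper: shift a segment-relative window by start_frame."""
    return start_frame + start_idx, start_frame + end_idx


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical row of EPIC_100_train.csv.
    start_frame, stop_frame = 1000, 1300
    video_size = stop_frame - start_frame  # assumption: segment length in frames
    s, e = get_start_end_idx(video_size, clip_size=64, clip_idx=-1, num_clips=1)
    lo, hi = to_untrimmed_window(start_frame, s, e)
    print(f"relative [{s:.1f}, {e:.1f}] -> untrimmed [{lo:.1f}, {hi:.1f}]")
```

When the segment is at least clip_size frames long, the shifted window always stays inside [start_frame, stop_frame). When it is shorter, delta is 0 and the window overruns the segment, so the indices have to be clamped downstream.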

kishorepv commented 3 years ago

Thanks!