epic-kitchens / epic-kitchens-slowfast

How to select correct NUM_FRAMES and SAMPLING_RATE for a different frame rate? #11

Open qwangku opened 2 years ago

qwangku commented 2 years ago

Thanks for sharing this great resource. I just want to confirm whether my understanding of NUM_FRAMES=32 and SAMPLING_RATE=2 is correct. I would like to reduce the "frame rate" for my application to see how much the prediction performance degrades, and I found these two parameters in the yaml file.

For NUM_FRAMES=32, does that mean the dataloader will pick every 32nd frame from the entire video clip if the video was saved as 320 frames? For example, the selected indices would be 1, 32, 64, 96, 128, ..., 320 (32 frames in total).

But if I change to NUM_FRAMES=16, the dataloader will pick every 64th frame, and the selected frames will be 1, 64, 128, 192, ..., 320 (16 frames in total, fewer than 32, but still spanning the entire video [1 ~ 320]). Is my understanding correct?

Also, what does SAMPLING_RATE=2 do in this setup? Could someone provide some guidance on this?

iranroman commented 1 year ago

Hello,

If I've understood the code correctly, at training and validation time, NUM_FRAMES will be the number of consecutive frames that will be taken from a video clip. If SAMPLING_RATE>1, then the actual total number of frames will be NUM_FRAMES*SAMPLING_RATE.

So, as an example, imagine you have a video clip that is 320 frames total. If NUM_FRAMES=32 and SAMPLING_RATE=2, during training and validation the data loader will:

  1. randomly select an index between 0 and 320-(NUM_FRAMES*SAMPLING_RATE) as the start frame, let's say 103 in this example.
  2. select SAMPLING_RATE*NUM_FRAMES consecutive frames starting at frame 103
  3. ~You would get frames 103,104,105,...,166~

Note: all of this assumes that the original video clip has a frame rate of 60 fps.

Maybe someone from the EK team can confirm? @ekazakos

iranroman commented 1 year ago

Actually, sorry, but I forgot one step:

  1. In the example I described, you will actually get NUM_FRAMES uniformly spaced between frame indices 103 and 166.
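
The steps described above, including this correction, can be sketched in a few lines of Python. This is only an illustration of the description in this thread, not the repository's actual loader: the function name `sample_frame_indices` and its `start` argument are hypothetical, and the integer, linspace-style spacing is an assumption about how "uniformly spaced" is rounded.

```python
import random

def sample_frame_indices(total_frames, num_frames, sampling_rate, start=None):
    """Pick num_frames roughly equidistant indices from one chunk of the clip.

    Hypothetical helper: it mirrors the description in this thread,
    not the repository's real data loader.
    """
    span = num_frames * sampling_rate  # consecutive frames covered by the chunk
    if start is None:
        # Random start chosen so that the whole chunk fits inside the clip.
        start = random.randint(0, max(total_frames - span, 0))
    if num_frames == 1:
        return [start]
    # Spread num_frames indices from the first to the last frame of the chunk
    # (assumed rounding: integer floor of an evenly spaced grid).
    offsets = [(i * (span - 1)) // (num_frames - 1) for i in range(num_frames)]
    # Clamp so that clips shorter than the chunk never index past the end.
    return [min(start + off, total_frames - 1) for off in offsets]

indices = sample_frame_indices(320, num_frames=32, sampling_rate=2, start=103)
print(indices[0], indices[-1], len(indices))  # 103 166 32
```

With `start=None` the start frame is drawn at random, which corresponds to step 1 of the example.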

I hope this helps (and that I'm actually not wrong somehow).

ekazakos commented 1 year ago

Hi,

@iranroman's interpretation is perfectly correct. The dataloader samples a chunk of NUM_FRAMES*SAMPLING_RATE consecutive frames. From this chunk, NUM_FRAMES equidistant frames (that is, frames sampled with SAMPLING_RATE) are fed into the model.
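
For the original question about lowering the "frame rate", the arithmetic implied by this description can be written out as a rough sketch. It assumes a 60 fps source, as noted earlier in the thread, and that the loader does no further rescaling for the source frame rate. The helper name `clip_stats` is hypothetical.

```python
def clip_stats(num_frames, sampling_rate, source_fps=60.0):
    """Summarise what one sampled clip covers (hypothetical helper).

    Assumes the chunk spans num_frames * sampling_rate consecutive frames
    and that source_fps is the frame rate of the stored video.
    """
    span_frames = num_frames * sampling_rate
    return {
        "span_frames": span_frames,                   # consecutive frames in the chunk
        "span_seconds": span_frames / source_fps,     # duration the chunk covers
        "effective_fps": source_fps / sampling_rate,  # approximate rate the model sees
    }

for num_frames, sampling_rate in [(32, 2), (16, 4), (16, 2)]:
    print(num_frames, sampling_rate, clip_stats(num_frames, sampling_rate))
```

Under these assumptions, NUM_FRAMES=16 with SAMPLING_RATE=4 covers the same 64-frame span as the default at roughly half the effective frame rate, whereas NUM_FRAMES=16 with SAMPLING_RATE=2 keeps the frame spacing and halves the span.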

deep-saket commented 1 year ago

Hello,

Thanks for the explanation. I'm looking to run inference with this model. Do you have an inference-only script? I would deeply appreciate any help with this.