Closed digbose92 closed 5 years ago
For ResNetCRNN, frames 1 through 29 of each video are used as input.
Thank you @digbose92. @lee-man is right: I used a fixed frame count of 29 (frame numbers 1~29). Sorry I didn't make that very clear; it is described in the repo: the minimal frame number 28 is the consensus of all videos in UCF101. However, I do have code for variable-length input with ResNetCRNN; please give me a week or so to organize it.
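The fixed-window selection described above can be sketched as follows. This is a hedged illustration, not the repo's actual code: the variable names `begin_frame`, `end_frame`, and `skip_frame` mirror the ones mentioned in this thread, but the helper function is hypothetical.

```python
# Hypothetical sketch of the fixed frame-index selection (frames 1~29).
# begin_frame, end_frame, skip_frame mirror the variables discussed above;
# in the fixed-length setup they are never updated per video.
begin_frame, end_frame, skip_frame = 1, 29, 1

def selected_frames(begin=begin_frame, end=end_frame, skip=skip_frame):
    """Return the fixed list of frame indices used for every video."""
    return list(range(begin, end + 1, skip))

frames = selected_frames()
print(len(frames))  # 29 frames: indices 1..29
```

Because the same indices are used for every video, every sample contributes exactly 29 frames regardless of the video's true length.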
Hi @HHTseng, so for all the videos in UCF101, 29 is the minimum number of frames, and each batch has shape (batch_size, 29, embedding_size) when batch_first is used? When batches are created from different videos, only the first 29 frames are used, so videos with more than 29 frames are truncated? And the variables begin_frame, end_frame, and skip_frame are not updated?
Yes, @digbose92, all of your points are correct:
(1) True: each batch has shape (batch_size, 29, embedding_size) when batch_first is used.
(2) Yes: when batches are created from different videos, only the first 29 frames are used.
(3) True: videos with more than 29 frames are truncated.
(4) True: the variables begin_frame, end_frame, and skip_frame are not updated.
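The truncate-then-stack behavior confirmed above can be sketched like this. It is an illustrative example, not the repo's code; the embedding size 512 and the per-video frame counts are made up.

```python
import numpy as np

# Hedged sketch: per-video CNN feature sequences are truncated to the first
# 29 frames and then stacked into a batch of shape
# (batch_size, 29, embedding_size). Names and sizes here are illustrative.
TIME_STEPS, EMBED = 29, 512

rng = np.random.default_rng(0)
# Three videos with different frame counts (all >= 29 in UCF101).
videos = [rng.standard_normal((n, EMBED)) for n in (29, 45, 120)]

# Keep only the first TIME_STEPS frames of each video, then stack.
batch = np.stack([v[:TIME_STEPS] for v in videos])
print(batch.shape)  # (3, 29, 512)
```

Since every sequence is cut to the same length before stacking, no padding is needed in this fixed-length setup.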
Hi @HHTseng thanks for the clarification.
No problem, @digbose92! Thanks for your questions too.
Hi @HHTseng Can we have the code for the videos with variable length/number of frames?
@digbose92 Did you manage to handle the frames of varying length?
Since videos are composed of different numbers of frames, the inputs to the LSTM from the CNN encoder will have different lengths within the same batch. How does the network handle variable-length videos? There is no mention of padding anywhere in the code.
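For reference, one common way to handle variable-length sequences with a PyTorch LSTM is to pad each batch to its longest sequence and use `pack_padded_sequence` so the LSTM skips the padded steps. This is a generic sketch of that technique, not the variable-length code @HHTseng mentions; the feature and hidden sizes are made up.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import (pad_sequence, pack_padded_sequence,
                                pad_packed_sequence)

# Hedged sketch: variable-length CNN feature sequences -> padded batch ->
# packed sequence -> LSTM. Sizes below are illustrative, not from the repo.
EMBED, HIDDEN = 512, 256

# Three videos with different frame counts.
seqs = [torch.randn(n, EMBED) for n in (29, 45, 120)]
lengths = torch.tensor([s.size(0) for s in seqs])

# Pad to the longest sequence in the batch: (3, 120, 512).
padded = pad_sequence(seqs, batch_first=True)
# Pack so the LSTM does not process padded time steps.
packed = pack_padded_sequence(padded, lengths, batch_first=True,
                              enforce_sorted=False)

lstm = nn.LSTM(EMBED, HIDDEN, batch_first=True)
packed_out, (h_n, c_n) = lstm(packed)

# Unpack back to a padded tensor: (3, 120, 256), plus the true lengths.
out, out_lengths = pad_packed_sequence(packed_out, batch_first=True)
print(out.shape)  # (3, 120, 256)
```

With `enforce_sorted=False`, the batch does not need to be pre-sorted by length, and the unpacked output is returned in the original batch order.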