Hi, first of all, thank you for your implementation.
I have a few questions about your training process.
(1) Did you fix the number of frames (clips) at 8? Does that imply that any number larger or smaller than 8 performs worse than 8?
(2) In the training step, do you shuffle the order of the frames (clips)? My feeling is that shuffling the frames is not appropriate, because the frame-related attention parts also learn the order of the frames. However, when instantiating the DataLoader in your 'train.py', you set shuffle to True. So I am wondering whether you shuffled the frames intentionally and whether that leads to better training.
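To make question (2) concrete, here is a minimal pure-Python sketch of the two kinds of shuffling I have in mind. It does not use your code or PyTorch: `dataset`, `sample_level_shuffle`, `frame_level_shuffle`, and `frames_in_order` are hypothetical names, and I am assuming that each dataset item is one clip of 8 frames. As far as I understand, shuffle=True on a PyTorch DataLoader corresponds to the first kind (it randomizes the order in which dataset items are drawn), but I am not sure whether that is all that happens in 'train.py'.

```python
import random

NUM_CLIPS = 4    # hypothetical dataset size, for illustration only
NUM_FRAMES = 8   # frames per clip, as in question (1)

# Each item is one clip: a list of (clip_id, frame_index) pairs in temporal order.
dataset = [[(c, f) for f in range(NUM_FRAMES)] for c in range(NUM_CLIPS)]


def sample_level_shuffle(data, seed=0):
    """Visit whole clips in random order; frames inside each clip stay untouched."""
    rng = random.Random(seed)
    order = list(range(len(data)))
    rng.shuffle(order)
    return [data[i] for i in order]


def frame_level_shuffle(data, seed=0):
    """Permute the frames inside every clip (the case I am worried about)."""
    rng = random.Random(seed)
    shuffled = []
    for clip in data:
        frames = list(clip)  # copy so the original clip is not modified
        rng.shuffle(frames)
        shuffled.append(frames)
    return shuffled


def frames_in_order(clip):
    """True if the frame indices of a clip are still in temporal order."""
    indices = [f for _, f in clip]
    return indices == sorted(indices)


epoch = sample_level_shuffle(dataset)
# Sample-level shuffling never reorders frames within a clip.
print(all(frames_in_order(clip) for clip in epoch))  # prints True
```

If only the first kind of shuffling is applied, the temporal order inside each clip is preserved and my concern would not apply; it would apply only if something like `frame_level_shuffle` happens somewhere in the data pipeline.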
Thank you again :)