I want to ask whether you remember how you implemented the 32x2 setting. In other words, does using 32 frames mean that 64 frames are given to the architecture and the first conv layer has a temporal stride of two, which makes the architecture 32x2? Or do you select 32 frames with a stride of two in the dataloader and feed 32 frames to the architecture?
I selected 32 contiguous frames (stride 1) for evaluation. I didn't try 64 frames with stride 2. I also didn't train the models from scratch; the weights are from this repo.
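To make the difference between the two dataloader-side samplings concrete, here is a minimal sketch in plain Python. The function name `sample_frame_indices`, its parameters, and the 100-frame video length are hypothetical and chosen for illustration only; this is not code from the repo. It covers frame selection only, not the third option of feeding 64 frames to a network whose first conv layer has a temporal stride of two.

```python
def sample_frame_indices(num_frames, clip_len=32, stride=1, start=0):
    """Return `clip_len` frame indices spaced `stride` apart, beginning at `start`.

    Hypothetical helper for illustration; not taken from any repo.
    """
    # Number of source frames the clip covers, first to last index inclusive.
    span = (clip_len - 1) * stride + 1
    if start < 0 or start + span > num_frames:
        raise ValueError("clip does not fit inside the video")
    return list(range(start, start + span, stride))


# 32 contiguous frames (stride 1), as described in the reply above.
contiguous = sample_frame_indices(100, clip_len=32, stride=1)

# 32 frames taken every second frame (stride 2), i.e. the "32x2" sampling.
strided = sample_frame_indices(100, clip_len=32, stride=2)

print(len(contiguous), contiguous[0], contiguous[-1])  # 32 0 31
print(len(strided), strided[0], strided[-1])           # 32 0 62
```

In both cases the network receives 32 frames. With stride 1 the clip covers 32 source frames, while with stride 2 it covers 63, so the temporal context differs even though the input shape is the same.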
Thank you for your good effort.
Thanks in advance.