Hi @Epiphqny,
I have a question regarding your implementation, specifically the way you pass raw data to the backbone. If the input is a video with 36 frames, how are they being processed by a ResNet-50/101?
In the forward method of the BackboneBase class, you pass the tensor list to the backbone, but the backbone seems to expect an input of size [64, 3, 7, 7]. As I understand from the paper, you reshape the videos to [36x300x540], so there should be one more preprocessing step between the raw videos and the backbone.
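To make the question concrete, here is a minimal shape-arithmetic sketch in plain Python. It rests on assumptions that the repository does not confirm: that the 36 frames are stacked along the batch dimension as [36, 3, 300, 540], and that the first convolution uses the standard torchvision ResNet settings (stride 2, padding 3). Note that [64, 3, 7, 7] matches the weight shape of that first convolution (64 output channels, 3 input channels, 7x7 kernel), so it may be a weight shape rather than an input size. The helper name `conv2d_output_shape` is hypothetical.

```python
def conv2d_output_shape(input_shape, weight_shape, stride, padding):
    """Return the output shape of a 2D convolution (no dilation, no groups).

    input_shape:  (N, C_in, H, W)
    weight_shape: (C_out, C_in, kH, kW)
    """
    n, c_in, h, w = input_shape
    c_out, c_w, k_h, k_w = weight_shape
    if c_in != c_w:
        raise ValueError(
            f"input has {c_in} channels but the weight expects {c_w}"
        )
    h_out = (h + 2 * padding - k_h) // stride + 1
    w_out = (w + 2 * padding - k_w) // stride + 1
    return (n, c_out, h_out, w_out)


# Weight shape of the first ResNet convolution (standard torchvision layout).
CONV1_WEIGHT = (64, 3, 7, 7)

# Assumption: the 36 frames act as the batch dimension.
frames_as_batch = (36, 3, 300, 540)
print(conv2d_output_shape(frames_as_batch, CONV1_WEIGHT, stride=2, padding=3))
# -> (36, 64, 150, 270)

# Assumption: a [36, 300, 540] clip passed with frames as channels is rejected.
try:
    conv2d_output_shape((1, 36, 300, 540), CONV1_WEIGHT, stride=2, padding=3)
except ValueError as err:
    print("mismatch:", err)
```

If the frames really are folded into the batch dimension, no 3D convolution would be needed, but I could not find where that reshaping happens.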
Can you shed some light on this extra step?
Thanks!