Open 123liluky opened 5 years ago
The features are computed with a stride, and the deeper the layer, the larger that stride is, so a deep layer yields fewer feature vectors than there are input frames. One option is to combine features from different layers by upsampling the deeper ones (skip connections). You can also interpolate the features between frames.
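A minimal sketch of the interpolation option, assuming the features have already been extracted and spatially pooled into an array of shape `(t, 1024)` with `t < n_frames`. The function name `interpolate_features`, the even placement of feature vectors across the clip, and the random stand-in input are illustrative assumptions and not part of this repository:

```python
import numpy as np


def interpolate_features(features, n_frames):
    """Linearly interpolate (t, c) features along time to (n_frames, c)."""
    features = np.asarray(features, dtype=np.float32)
    t, c = features.shape
    if t == 1:
        # Only one time step: repeat it for every frame.
        return np.repeat(features, n_frames, axis=0)
    # Assumption: the t feature vectors are spread evenly over the clip,
    # with the first and last aligned to the first and last frame.
    src = np.linspace(0, n_frames - 1, num=t)
    dst = np.arange(n_frames)
    out = np.empty((n_frames, c), dtype=np.float32)
    for ch in range(c):
        # np.interp works on 1-D data, so interpolate channel by channel.
        out[:, ch] = np.interp(dst, src, features[:, ch])
    return out


# Stand-in for real network output; the shape (8, 1024) is arbitrary.
coarse = np.random.rand(8, 1024).astype(np.float32)
per_frame = interpolate_features(coarse, n_frames=64)
print(per_frame.shape)  # (64, 1024)
```

The result has one 1024-dimensional vector per frame, but the extra vectors are only blends of their neighbours and carry no new information. Combining shallower layers, as suggested above, is the way to recover finer temporal detail.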
Joao
On Wed, Jun 26, 2019, 8:10 AM 123liluky notifications@github.com wrote:
If n_frames is the number of frames in a video, how can I extract features of size n_frames*1024 with rgb_videos as the input?