Sense-X / UniFormer

[ICLR2022] official implementation of UniFormer
Apache License 2.0

pretrain #41

Closed JinglUO703 closed 2 years ago

JinglUO703 commented 2 years ago

How is the video transformer pre-trained on ImageNet-1K? Isn't the input different? For example, the video model uses a 3D patch embedding, while the image model uses a 2D patch embedding.

Andy1621 commented 2 years ago

Thanks for your question. Yes, the input is different: we simply inflate the 2D kernels to 3D for video input. The code is here: https://github.com/Sense-X/UniFormer/blob/e8024703bffb89cb7c7d09e0d774a0d2a9f96c25/video_classification/slowfast/models/uniformer.py#L387-L421
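For readers who want a rough picture of what "inflating" a kernel means, here is a minimal NumPy sketch. It is an illustration only, not the repository code linked above: the function name `inflate_kernel`, its `center` flag, and the `(out, in, H, W)` weight layout are assumptions made for this example. It shows two common strategies: copying the 2D kernel along a new time axis and dividing by the temporal size, or placing the 2D kernel in the middle time slice and zero-filling the rest.

```python
import numpy as np


def inflate_kernel(weight_2d, time_dim, center=False):
    """Inflate a 2D conv kernel (out, in, H, W) to 3D (out, in, T, H, W).

    Hypothetical helper for illustration; names and layout are assumptions.
    """
    if center:
        # Put the 2D kernel in the middle time slice, zeros elsewhere.
        out_ch, in_ch, h, w = weight_2d.shape
        weight_3d = np.zeros((out_ch, in_ch, time_dim, h, w), dtype=weight_2d.dtype)
        weight_3d[:, :, time_dim // 2] = weight_2d
    else:
        # Repeat the 2D kernel along a new time axis, then rescale by T
        # so the kernel summed over time equals the original 2D kernel.
        weight_3d = np.repeat(weight_2d[:, :, None], time_dim, axis=2) / time_dim
    return weight_3d


# Usage: a 2D patch-embedding kernel with 4 output and 3 input channels.
w2d = np.random.default_rng(0).standard_normal((4, 3, 2, 2))
w3d_avg = inflate_kernel(w2d, time_dim=3)
w3d_center = inflate_kernel(w2d, time_dim=3, center=True)
print(w3d_avg.shape, w3d_center.shape)
```

In both variants the 3D kernel summed over the time axis equals the 2D kernel, so a clip made of identical frames gives the same response as the image model gives on a single frame. That is why image-pretrained weights are a reasonable starting point for video.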

Andy1621 commented 2 years ago

As there is no more activity, I am closing the issue. Don't hesitate to reopen it if necessary.