facebookresearch / mvit

Code Release for MViTv2 on Image Recognition.
Apache License 2.0

MViTv2 on UCF101 and HMDB51 #14

Closed: lovelyczli closed this issue 1 year ago

lovelyczli commented 2 years ago

Thanks for your outstanding work! Both MViT (with pooling) and Swin (with windowed attention) reduce network complexity, which gives me hope of running them on my own machine. I prefer MViT for its simplicity, but I am having great difficulty with my limited GPUs. Could you please help with the following:

  1. An efficient speedup strategy. (MViTv2_S is still too large for me; I mean an efficient strategy for video, if one exists.)
  2. Configs and models for UCF101 and HMDB51. (Perhaps transformers and other new architectures do not work on these tiny datasets, but they are my last hope. BTW, both fine-tuning and training from scratch are crucial for me. The low accuracy on UCF makes me want to cry, but that is still better than waiting several months on K400.)
  3. UCF and HMDB dataloaders. (I defined ucf.py and hmdb.py by reusing almost the same code as kinetics.py; a more advanced dataloader implementation would also be very helpful.)

There seem to be many issues here, and addressing them may need a lot of resources. However, if you have any ideas about any of them, please contact me at 1009440681@qq.com. Looking forward to your reply. LOL @lyttonhao @haooooooqi
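
For point 3 above, one low-effort route is to convert the UCF101 split files into the annotation format a Kinetics-style loader already reads, rather than writing a new dataset class. The following is a minimal sketch under two assumptions that should be checked before use: that the split files follow the official UCF101 layout (`classInd.txt` with lines like `1 ApplyEyeMakeup`, and train/test lists with lines like `ApplyEyeMakeup/v_ApplyEyeMakeup_g08_c01.avi 1`), and that the reused kinetics.py expects one `path label` pair per line with zero-based labels. The function names `load_class_index` and `convert_split` are hypothetical and not part of any repository.

```python
def load_class_index(class_ind_lines):
    """Map class name -> zero-based label from classInd.txt-style lines."""
    mapping = {}
    for line in class_ind_lines:
        line = line.strip()
        if not line:
            continue
        index, name = line.split()
        mapping[name] = int(index) - 1  # assumed: official indices start at 1
    return mapping


def convert_split(split_lines, class_to_label, video_root):
    """Turn UCF101 split lines into 'path label' rows (assumed loader format)."""
    rows = []
    root = video_root.rstrip("/")
    for line in split_lines:
        line = line.strip()
        if not line:
            continue
        # Train lists carry a trailing label, test lists do not, so the
        # label is always derived from the class folder in the path.
        rel_path = line.split()[0]
        class_name = rel_path.split("/")[0]
        rows.append(f"{root}/{rel_path} {class_to_label[class_name]}")
    return rows


if __name__ == "__main__":
    classes = load_class_index(["1 ApplyEyeMakeup", "2 ApplyLipstick"])
    for row in convert_split(
        ["ApplyEyeMakeup/v_ApplyEyeMakeup_g08_c01.avi 1"], classes, "/data/UCF-101"
    ):
        print(row)
```

Deriving the label from the folder name means the same function handles both train and test lists. HMDB51 ships its splits in a different layout, so it would need its own parser.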

lyttonhao commented 1 year ago

Hi,

  1. We have a smaller model, MViTv2_T, in the model zoo. You could also explore different network designs (e.g. reducing network depth/width and increasing pooling size).
  2. We did not run experiments on the UCF101 and HMDB51 datasets, so we don't have the dataloaders.
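
To get a rough feel for point 1, the effect of pooling on attention cost can be estimated with a back-of-the-envelope model. The sketch below is an assumption-laden simplification, not the repository's FLOP counter: it counts only the two matrix products in attention (`Q @ K^T` and `attn @ V`), ignores projections, MLP layers, and the number of heads, and treats `kv_stride` as the total factor by which key/value tokens are reduced. The function name `attention_cost` is hypothetical.

```python
def attention_cost(num_tokens, channels, kv_stride=1):
    """Approximate multiply-adds for one attention layer with pooled keys/values.

    kv_stride is the overall reduction factor applied to the key/value
    sequence (e.g. 4 for 2x2 spatial pooling); 1 means no pooling.
    """
    kv_tokens = max(1, num_tokens // kv_stride)
    # Q @ K^T and attn @ V each cost num_tokens * kv_tokens * channels.
    return 2 * num_tokens * kv_tokens * channels


if __name__ == "__main__":
    tokens = 56 * 56  # example: a 56x56 token grid
    base = attention_cost(tokens, channels=96)
    pooled = attention_cost(tokens, channels=96, kv_stride=4)
    print(f"unpooled: {base:,}  pooled: {pooled:,}  ratio: {base / pooled:.1f}")
```

Under this model the attention cost falls linearly with the key/value reduction factor, while reducing depth or width cuts the cost of every layer, so the two knobs can be combined.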