tranxuantuyen opened this issue 2 years ago
Hi @tranxuantuyen,
I am also interested in using MViT models on the AVA dataset.
I have noticed that there are no AVA-pretrained MViT models available in SlowFast. If you trained from scratch for 30 epochs, that is likely the reason for the low mAP.
Happy to discuss more on this.
@Balakishan77 @tranxuantuyen hi,
I am also trying to reproduce MViT these days. I first tried to pre-train MViT from scratch, but found the reconstruction results are poor, especially the colour.
So I switched to the pre-trained model provided by the authors and tried a sanity check on the K400 dataset. The model I used is MViT-B, i.e., k400_VIT_B_16x4_MAE_PT, which can be found here. Unfortunately, the reconstruction results are still poor; here are some examples:
The input videos are chosen from the validation set of the original K400. However, the reconstruction results shown in the paper look good.
Could you show some video reconstruction results during the pre-training stage? Many thanks.
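One possible cause of washed-out colours worth checking: if the model was pre-trained with per-patch normalised pixel targets (the `norm_pix_loss` option in MAE-style training), the decoder predicts *normalised* patches, and they look like grey noise unless you un-normalise them with the per-patch statistics of the input before visualising. A minimal PyTorch sketch (the function name is illustrative, not part of the SlowFast API):

```python
import torch

def unnormalize_patches(pred, target_patches, eps=1e-6):
    """Invert MAE-style per-patch pixel normalisation for visualisation.

    pred           : (N, L, patch_dim) decoder output, normalised patches
    target_patches : (N, L, patch_dim) patchified *input* video/frames,
                     used to recover the per-patch mean and variance
    """
    mean = target_patches.mean(dim=-1, keepdim=True)
    var = target_patches.var(dim=-1, keepdim=True)
    # With norm_pix_loss, targets were (x - mean) / sqrt(var + eps),
    # so the inverse is pred * sqrt(var + eps) + mean.
    return pred * (var + eps).sqrt() + mean
```

If your pre-training config did not use normalised pixel targets, this step should be skipped; it is only a guess at why the colours look wrong.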
Similar dummy outputs are reported in https://github.com/facebookresearch/SlowFast/issues/668
Hi, thanks for the code base.
I'm trying to reproduce the results on the AVA dataset with the MViT model. I noticed that while the code is available, the config for fine-tuning is not provided. I built a config file from the implementation details reported in the paper but only got around 20 mAP. Am I going about reproducing the results the right way?
Any suggestions or discussions are welcome. Thank you.