gabeur / mmt

Multi-Modal Transformer for Video Retrieval
http://thoth.inrialpes.fr/research/MMT/
Apache License 2.0

About finetuning from a HowTo100M pretrained model on ActivityNet dataset #14

Closed · jpthu17 closed this issue 3 years ago

jpthu17 commented 3 years ago

Thank you for generously sharing your wonderful research results! However, I have some problems reproducing them.

I found that using the ‘HowTo100M_full_train.pth’ checkpoint you provided, I cannot achieve the expected results on the ActivityNet dataset. What should I do if I want to finetune on the ActivityNet dataset? Is there any chance you could release the ‘HowTo100M_full_train.pth’ for the ActivityNet dataset sometime?

Thank you!

gabeur commented 3 years ago

Indeed, the ‘HowTo100M_full_train.pth’ checkpoint I pre-trained after applying Fix1 and Fix3 does not perform as well on ActivityNet as before, because the optimal hyperparameters seem to have changed.

I have experimented with a lower batch size (24) and obtained the results below.

command: python -m train --config configs_pub/eccv20/ActivityNet_val1_trainval_bs.json --load_checkpoint data/checkpoints/HowTo100M_full_train.pth
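In case it helps, here is a minimal sketch of how one could derive a batch-size-24 variant of the published config before launching the command above. The key path used to set the batch size ("data_loader" -> "args" -> "batch_size") is an assumption about the JSON schema, so inspect the file and adjust; only the config and checkpoint paths come from the command itself.

```python
# Sketch only: write a batch-size-24 copy of the published config, then launch
# finetuning from the HowTo100M checkpoint. The key path for the batch size is
# an assumption; check the JSON file for the actual field name.
import json
import shlex
import subprocess

CONFIG = "configs_pub/eccv20/ActivityNet_val1_trainval_bs.json"
CHECKPOINT = "data/checkpoints/HowTo100M_full_train.pth"
OVERRIDE = "configs_pub/eccv20/ActivityNet_val1_trainval_bs24.json"

with open(CONFIG) as f:
    cfg = json.load(f)

# Hypothetical location of the batch-size setting; adjust to the real schema.
cfg.setdefault("data_loader", {}).setdefault("args", {})["batch_size"] = 24

with open(OVERRIDE, "w") as f:
    json.dump(cfg, f, indent=2)

cmd = f"python -m train --config {OVERRIDE} --load_checkpoint {CHECKPOINT}"
subprocess.run(shlex.split(cmd), check=True)
```

The results of this run are: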

ActivityNet_val1_test:
 t2v_metrics/R1/final_eval: 27.191376855806386
 t2v_metrics/R5/final_eval: 59.42647956070775
 t2v_metrics/R10/final_eval: 73.68314012609315
 t2v_metrics/R50/final_eval: 94.30547081553793
 t2v_metrics/MedR/final_eval: 4.0
 t2v_metrics/MeanR/final_eval: 16.801098230628433
 t2v_metrics/geometric_mean_R1-R5-R10/final_eval: 49.19562050180819
 v2t_metrics/R1/final_eval: 27.293064876957494
 v2t_metrics/R5/final_eval: 59.24344112263575
 v2t_metrics/R10/final_eval: 73.3577384584096
 v2t_metrics/R50/final_eval: 93.49196664632906
 v2t_metrics/MedR/final_eval: 4.0
 v2t_metrics/MeanR/final_eval: 18.19646125686394
 v2t_metrics/geometric_mean_R1-R5-R10/final_eval: 49.1337040926001
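For reference, the geometric_mean_R1-R5-R10 values above are just the geometric mean of the R1, R5 and R10 recalls; a quick check in Python using the t2v numbers reported above:

```python
# Recompute the geometric mean of R@1, R@5 and R@10 from the t2v results above.
r1, r5, r10 = 27.191376855806386, 59.42647956070775, 73.68314012609315
geo_mean = (r1 * r5 * r10) ** (1.0 / 3.0)
print(geo_mean)  # ~49.1956, matching t2v_metrics/geometric_mean_R1-R5-R10
```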