cshizhe / hgr_v2t

Code accompanying the paper "Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning".
MIT License

Cannot find "word_embeds.glove42b.th" #7

Closed KunpengLi1994 closed 4 years ago

KunpengLi1994 commented 4 years ago

Hi Shizhe,

Thanks for your great work! I noticed that the training script needs to load a pre-trained model:

--resume_file $resdir/../../word_embeds.glove42b.th

Is this used to initialize the text embedding module?

Besides, I cannot find this file under "MSRVTT/results/RET.released/"; I can only find "MSRVTT/results/RET/word_embeds.glove32b.th". Is there any difference between word_embeds.glove42b.th and word_embeds.glove32b.th? Could you please share "word_embeds.glove42b.th"?

KunpengLi1994 commented 4 years ago

Using "MSRVTT/results/RET/word_embeds.glove32b.th" as --resume_file, I obtain the following results. The results on MSRVTT are much lower than those in the paper, especially R@1 for video-to-text retrieval. Could you please help with this issue? Thanks!

[screenshot: evaluation results]

KunpengLi1994 commented 4 years ago

Full output log is as follows:

[screenshot: full training/evaluation log]

cshizhe commented 4 years ago

> Is there any difference between word_embeds.glove42b.th and word_embeds.glove32b.th? Could you please share "word_embeds.glove42b.th"?

Sorry, this is a typo: the released file is the GloVe-42B embedding used for word initialization.

According to my experiments, it is normal to see performance variance of about ±2 points on the rsum metric. I am not sure why your reproduced performance was lower than expected. I have uploaded my configuration files and a trained model on the MSRVTT dataset at this link (code: f2hm). The performance is as follows:

Model selection criterion: ir1-rsum-geoavgt2v

| checkpoint | ir1 | ir5 | ir10 | imedr | imeanr | cr1 | cr5 | cr10 | cmedr | cmeanr | rsum | geoavgt2v |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| epoch.22.th | 9.00 | 25.93 | 37.01 | 22.00 | 150.65 | 14.58 | 37.29 | 49.30 | 11.00 | 88.58 | 173.12 | 20.52 |

You could compare with the configurations of this model. Hope it helps.
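The recall columns above (R@1/5/10, median rank, mean rank, and their sum) can all be derived from a query-by-candidate similarity matrix. A sketch of that computation, assuming the ground-truth match for query i is candidate i (the column names here are my own shorthand, not the repo's exact logging code):

```python
import numpy as np

def retrieval_metrics(sim):
    """Compute R@1/5/10 and median/mean rank from a similarity matrix.

    sim[i, j] is the similarity between query i and candidate j; the
    ground-truth match for query i is candidate i.
    """
    order = np.argsort(-sim, axis=1)  # candidate indices, best first
    # rank of the ground-truth candidate for each query (0-based)
    ranks = np.where(order == np.arange(len(sim))[:, None])[1]
    return {
        'r1': 100.0 * np.mean(ranks < 1),
        'r5': 100.0 * np.mean(ranks < 5),
        'r10': 100.0 * np.mean(ranks < 10),
        'medr': float(np.median(ranks)) + 1,
        'meanr': float(np.mean(ranks)) + 1,
    }

def rsum(t2v_sim, v2t_sim):
    # rsum totals the six recall numbers over both retrieval directions
    return sum(retrieval_metrics(s)[k]
               for s in (t2v_sim, v2t_sim)
               for k in ('r1', 'r5', 'r10'))
```

With a perfect diagonal similarity matrix, every recall is 100 and rsum is 600; the ±2-point variance mentioned above is relative to this summed scale.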

KunpengLi1994 commented 4 years ago

Got it. Thanks for the reply and the pre-trained model. I have checked that the configurations are the same. After training one more time, I was able to obtain better results:

number of resumed variables: 27

Model selection criterion: cr5-cmedr-rsum

| checkpoint | ir1 | ir5 | ir10 | imedr | imeanr | cr1 | cr5 | cr10 | cmedr | cmeanr | rsum |
|---|---|---|---|---|---|---|---|---|---|---|---|
| epoch.21.th | 8.88 | 25.98 | 36.65 | 22.00 | 148.33 | 14.28 | 36.62 | 48.73 | 11.00 | 87.82 | 171.14 |

Therefore, it seems necessary to run the training multiple times to obtain the best results.
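That multi-run strategy can be sketched as a small driver loop that trains once per seed and keeps the checkpoint with the best rsum. Here `train_and_eval` is a hypothetical stand-in for invoking the repo's training script, not an actual function it provides:

```python
def pick_best_run(train_and_eval, seeds):
    """Train once per seed and keep the checkpoint with the highest rsum.

    `train_and_eval(seed)` is assumed to run one full training and
    return a (checkpoint_path, rsum) pair for that seed.
    """
    best = None
    for seed in seeds:
        ckpt, score = train_and_eval(seed)
        if best is None or score > best[1]:
            best = (ckpt, score)
    return best
```

Given the ±2-point rsum variance reported above, selecting over a handful of seeds is a reasonable way to recover the paper's numbers.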