BruceW91 / CVSE

The official source code for the paper Consensus-Aware Visual-Semantic Embedding for Image-Text Matching (ECCV 2020)

Train from scratch #2

Closed: detectiveli closed this issue 4 years ago

detectiveli commented 4 years ago

Dear contributor,

Great work, and thank you for open-sourcing it.

When I trained your model from scratch with --resume none, without modifying anything else, I got the following results on Flickr30k with the final best model:

Average i2t Recall: 84.3
Image to text: 69.8 89.6 93.4 1.0 4.2
Average t2i Recall: 75.2
Text to image: 54.9 81.9 88.8 1.0 7.6
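For reference, the summary lines above appear to follow the VSE++-style evaluation printout, where each "Image to text" / "Text to image" line lists R@1, R@5, R@10, median rank and mean rank, and the "Average ... Recall" line is the mean of the three recall values. The sketch below assumes that column order (the class name RetrievalMetrics is hypothetical, not part of the CVSE code) and recomputes the averages plus rsum, the sum of all six recall values that VSE++-style code commonly uses for picking the best checkpoint; whether CVSE selects its best model exactly this way is an assumption.

```python
from typing import NamedTuple


class RetrievalMetrics(NamedTuple):
    """One evaluation line, assuming the order R@1, R@5, R@10, MedR, MeanR."""
    r1: float     # Recall@1 (%)
    r5: float     # Recall@5 (%)
    r10: float    # Recall@10 (%)
    medr: float   # median rank of the first correct match
    meanr: float  # mean rank of the first correct match

    @property
    def average_recall(self) -> float:
        # Mean of R@1, R@5 and R@10, matching the "Average ... Recall" line.
        return (self.r1 + self.r5 + self.r10) / 3


# Values copied from the Flickr30k results reported above.
i2t = RetrievalMetrics(69.8, 89.6, 93.4, 1.0, 4.2)
t2i = RetrievalMetrics(54.9, 81.9, 88.8, 1.0, 7.6)

# rsum: sum of the six recall values (first three fields of each line).
rsum = sum(i2t[:3]) + sum(t2i[:3])

print(f"i2t average recall: {i2t.average_recall:.1f}")
print(f"t2i average recall: {t2i.average_recall:.1f}")
print(f"rsum: {rsum:.1f}")
```

Recomputing the averages this way is a quick check that a reported line was parsed correctly before comparing it with the numbers in the paper.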

Are these the expected results?

Kind regards,

BruceW91 commented 4 years ago

Hi, you can leave your e-mail address and I will send you my training .log file, which you can check and use as a reference. The CVSE model is somewhat sensitive to its hyper-parameters, so you may need to tune them slightly on your own machine.

detectiveli commented 4 years ago

Many thanks for your reply! Would it be possible for you to upload your log files for the two datasets to the /runs directory so that all my teammates can access them? Looking forward to it.

BruceW91 commented 4 years ago

Hi, I have updated the repo and uploaded the training log for the Flickr30k dataset (a TensorBoard event file) at CVSE/runs/f30k/CVSE_f30k/log/events.out.tfevents.1594087758.wanghai-host. You can use it as a reference.