YingZhangDUT / Cross-Modal-Projection-Learning

TensorFlow Implementation of Deep Cross-Modal Projection Learning
MIT License

Could not get the results in paper on MS COCO dataset #6

Open cw1091293482 opened 5 years ago

cw1091293482 commented 5 years ago

Hi,

I re-ran the code following your instructions, but I cannot reproduce results close to those reported in the paper. The results from my run are below:

Evaluating with Cosine Distance...
Text-to-Image Evaluation...
Recall@1 0.14%  Recall@5 0.64%  Recall@10 1.34%  MAP 0.14%
Image-to-Text Evaluation...
Recall@1 0.10%  Recall@5 1.00%  Recall@10 1.80%

These results were obtained on the MS-COCO dataset. Has anyone else seen similar results?
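For reference, here is a minimal NumPy sketch (not the repository's evaluation code) of what these numbers measure: Recall@K for retrieval under cosine distance. The names `txt_emb` and `img_emb` are assumptions for paired embedding matrices where row i of each belongs to the same image-caption pair.

```python
import numpy as np

def recall_at_k(txt_emb, img_emb, ks=(1, 5, 10)):
    # Cosine similarity = dot product of L2-normalized embeddings.
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    sims = txt @ img.T                      # [num_texts, num_images]
    ranks = np.argsort(-sims, axis=1)       # most similar image first
    gt = np.arange(len(txt))[:, None]       # ground-truth image index per text query
    hits = ranks == gt                      # True where the correct image appears
    # Fraction of queries whose correct image is within the top K results.
    return {k: float(hits[:, :k].any(axis=1).mean()) for k in ks}
```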

Best

YingZhangDUT commented 5 years ago

Hi, here are some suggestions for locating the problem:

  1. Check the data preparation: make sure the images and texts are aligned for training, and compare the data before and after writing the TFRecord files. See lines 314-317 in train_models.py to verify that the loaded data are correct for training (a quick inspection sketch is given after this list).

  2. Check whether the pretrained ResNet-152 weights are actually loaded for training. So far I have only run the experiments with a pretrained ResNet-152, and the model may be hard to converge without pretraining (see the second sketch after this list for a checkpoint check).

  3. Check the loss during training. If it does not decrease, the likely causes are incorrect data preparation or a missing pretrained model.

  4. Check the test results during training, e.g. test at step = 20k, 30k, 50k.
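For point 1, a minimal TF 1.x-style sketch (not the repository's code) for eyeballing whether images and captions stay paired after serialization. The record path and the feature keys `image/encoded` and `caption/text` are assumptions; match them to the keys your data-preparation script actually writes.

```python
import tensorflow as tf

record_path = "data/mscoco_train.tfrecords"  # hypothetical path

# Iterate over a handful of raw records and print the caption next to the
# image payload size, so misaligned pairs are easy to spot by eye.
for i, raw in enumerate(tf.python_io.tf_record_iterator(record_path)):
    example = tf.train.Example.FromString(raw)
    feats = example.features.feature
    caption = feats["caption/text"].bytes_list.value[0].decode("utf-8")
    image_bytes = feats["image/encoded"].bytes_list.value[0]
    print(i, len(image_bytes), "bytes |", caption)
    if i >= 4:
        break
```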
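For point 2, a minimal sketch (assumed checkpoint path) to confirm the ResNet-152 checkpoint file actually contains the variables the model expects to restore; if the listed names do not match the graph's variable names, the pretrained weights are silently not being used.

```python
import tensorflow as tf

ckpt_path = "pretrained/resnet_v1_152.ckpt"  # hypothetical path

# Inspect the checkpoint contents without building the training graph.
reader = tf.train.NewCheckpointReader(ckpt_path)
var_shapes = reader.get_variable_to_shape_map()
print("variables in checkpoint:", len(var_shapes))
for name in sorted(var_shapes)[:10]:   # peek at the first few entries
    print(name, var_shapes[name])
```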

pengzhanguestc commented 4 years ago

I got the same results as cw1091293482. Did you eventually get results close to the paper's after applying the author's suggestions?