Open cw1091293482 opened 5 years ago
Hi, here are some suggestions for locating the problem:
Check the data preparation: verify that each image and its text are aligned for training, and inspect the data before and after converting it to TFRecord. See lines 314-317 in train_models.py to check whether the loaded data are correct for training.
Check whether the pretrained ResNet-152 weights are loaded for training. So far I have only finished the experiments with pretrained ResNet-152, and it may be hard to converge without pretraining.
Check the loss during training. If the loss does not decrease, it might be due to wrong data preparation or missing pretraining.
Check the test results during training, e.g. test at step=20k, 30k, 50k.
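For the loss check above, a minimal sketch of one way to decide whether the loss is actually going down, assuming you have collected the logged loss values into a plain Python list. The function name `loss_is_decreasing`, the `window` size, and the `min_rel_drop` threshold are all hypothetical and not part of this repository:

```python
def loss_is_decreasing(losses, window=100, min_rel_drop=0.05):
    """Compare the mean loss of the first and last `window` steps.

    Hypothetical helper: returns True if the mean loss dropped by at
    least `min_rel_drop` (relative to the starting mean).
    """
    if len(losses) < 2 * window:
        raise ValueError("need at least 2 * window loss values")
    first = sum(losses[:window]) / window
    last = sum(losses[-window:]) / window
    if first == 0:
        # Cannot compute a relative drop from a zero starting loss.
        return False
    return (first - last) / abs(first) >= min_rel_drop


# Example with synthetic values (not real training logs).
falling = [10.0 - 0.01 * i for i in range(400)]
flat = [5.0] * 400
print(loss_is_decreasing(falling))  # loss trends down
print(loss_is_decreasing(flat))     # loss is stuck
```

Averaging over a window rather than comparing two single steps keeps the check from being fooled by per-batch noise. The threshold is arbitrary and would need tuning for a real run.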
I got the same results as cw1091293482. Did you eventually get results close to the paper's after applying the author's suggestions?
Hi,
I re-ran the code following your instructions, but I still cannot get results close to those presented in the paper. The results from my run are below:
Evaluating with Cosine Distance...
Text-to-Image Evaluation...
Recall@1 0.14%
Recall@5 0.64%
Recall@10 1.34%
MAP 0.14%
Image-to-Text Evaluation...
Recall@1 0.10%
Recall@5 1.00%
Recall@10 1.80%
These results were obtained on the MS-COCO dataset. Has anyone else seen similar results?
Best
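For anyone sanity-checking Recall@K figures like the ones reported above, here is a minimal pure-Python sketch of Recall@K under cosine similarity. It is not the evaluation code from this repository: the function names are hypothetical, and it assumes that query `i` has exactly one correct match, gallery item `i`. It also shows how a misaligned image/text pairing alone can drive the metric to zero.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def recall_at_k(query_embs, gallery_embs, k):
    """Fraction of queries whose match (same index) ranks in the top k.

    Assumption: query i's ground-truth match is gallery item i.
    """
    hits = 0
    for i, q in enumerate(query_embs):
        sims = [cosine_similarity(q, g) for g in gallery_embs]
        ranked = sorted(range(len(sims)), key=lambda j: sims[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(query_embs)


def chance_recall_at_k(gallery_size, k):
    """Expected Recall@K of a random ranking, one match per query."""
    return min(k, gallery_size) / gallery_size


# Toy embeddings (made up for illustration).
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
gallery = [[1.0, 0.1], [0.1, 1.0], [1.0, 0.9]]
shifted = [gallery[1], gallery[2], gallery[0]]  # pairs no longer aligned

print(recall_at_k(queries, gallery, 1))  # aligned pairs
print(recall_at_k(queries, shifted, 1))  # misaligned pairs
```

With one matching item per query and N candidates, a random ranking gives a Recall@K of about K/N, so `chance_recall_at_k` can serve as a baseline when deciding whether a trained model is doing better than chance.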