KunpengLi1994 / VSRN

PyTorch code for ICCV'19 paper "Visual Semantic Reasoning for Image-Text Matching"

A problem about evaluation #11

Closed · weiyunfei closed this issue 4 years ago

weiyunfei commented 4 years ago

Hi, Kunpeng. Thanks for your excellent paper and code. However, something has been confusing me while evaluating models trained with your code. I trained two models with your code, and my results could not reach those reported in your paper. Moreover, the pretrained models you provided do not achieve the paper's results on MS-COCO either. My environment has PyTorch 1.3 and Python 3.7 installed. I compared my training log with yours and found them very similar, so I don't know what is wrong with my code. The evaluation results on the MS-COCO 1K test set are shown below. [screenshot of evaluation results] Would you please give me some suggestions? I would greatly appreciate it.

KunpengLi1994 commented 4 years ago

Hi,

Thanks for your interest in our work! I think it is because you are using an environment setting that differs from our configuration.

The code was tested with PyTorch 0.4 and Python 2.7, along with the other libraries listed in the README and requirement.txt. It is not written for Python 3, so I suggest you try the correct environment setting.
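
As a quick sanity check (a minimal sketch; the expected versions are the ones stated above), you can print the interpreter and library versions before training or evaluating:

```python
# Quick environment check: the code is tested with Python 2.7 and PyTorch 0.4,
# so confirm these before running training or evaluation.
import sys
import torch

print("Python:", sys.version.split()[0])   # expected: 2.7.x
print("PyTorch:", torch.__version__)       # expected: 0.4.x
```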

weiyunfei commented 4 years ago

Thanks for your reply! I'll try it today. By the way, could you tell me which version of CUDA you used?

weiyunfei commented 4 years ago

I have fixed it. Something went wrong when I converted the code to Python 3 style. Thanks again for your reply. I will close this issue.

weiyunfei commented 4 years ago

Hi, Kunpeng. Sorry to bother you again. I ran into some problems when trying to train a model on Flickr30K. I'm not sure whether I should use the same settings as on MS-COCO; I used the same settings but did not get results as good as yours. Could you tell me the settings you used when training on the Flickr30K dataset, e.g. number of epochs, learning rate, and batch size? I would like to use the same settings as yours to reproduce your results.

LgQu commented 4 years ago

@weiyunfei Hi Yunfei. I also encountered this problem: the performance is lower than the numbers given in the paper. I guess there is something wrong with my Python 3 port of the code. Would you please share more details about your problem?

weiyunfei commented 4 years ago

Well, in fact I have not found the root cause. My final environment is Python 2.7 and PyTorch 1.2, and I found that only the Python version affects the results. Therefore, I suspect the issue comes from the division operator, which behaves differently in Python 2 and Python 3, but I have not located where it matters in the code. Perhaps you could set up an environment like mine. Here is my conda environment YAML; you can import it directly: https://drive.google.com/file/d/1hSY9M2MgQK95pw0TPFQi6DG3cXyLyG-v/view?usp=sharing
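
To illustrate what I mean, here is a minimal sketch (with made-up numbers, not a specific line from this repo) of how the division operator can silently diverge between Python 2 and Python 3 in ported code:

```python
# Hypothetical values only. In Python 2, "/" on two ints is floor division;
# in Python 3 it is true division, so ported code needs "//" wherever an
# integer result is required.
print(7 / 2)    # Python 2: 3        Python 3: 3.5

num_rows = 5000          # e.g. a retrieval test set with 5 captions per image
captions_per_image = 5

npts = num_rows / captions_per_image
# Python 2: 1000 (int)   Python 3: 1000.0 (float)

# Index arithmetic is where this usually bites: a float index, or a float
# passed to range(), either raises an error or changes behaviour.
npts_int = num_rows // captions_per_image   # 1000 (int) in both versions
for i in range(npts_int):
    pass  # iterate over images, each paired with 5 captions
```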

zl535320706 commented 4 years ago

@weiyunfei Sorry to disturb you, but what are your final reproduced results on the MS-COCO dataset? My test results are image-to-text R@1 68.5 and text-to-image R@1 57.5, while the author's single-model results are: model_coco_1.pth.tar, image-to-text R@1 74.0, text-to-image R@1 60.8, rsum 509.4; model_coco_2.pth.tar, image-to-text R@1 73.6, text-to-image R@1 60.7, rsum 508.3. There is still a large gap. Do I need to adjust the parameters in the namespace, or increase the number of training epochs? I have already used the same environment as the author, e.g. Python 2 and PyTorch 0.4.1.

@KunpengLi1994

weiyunfei commented 4 years ago

Hello, I got the results stated in Kunpeng's paper with the pretrained models he provided, so I suggest you check your code and environment carefully. Your results look similar to what I got when I used Python 3. You could also try the environment I shared earlier in this issue.

zl535320706 commented 4 years ago

Thank you very much, I will try it.

LgQu commented 4 years ago

Thanks a lot. I have obtained the same results as in the paper with the pretrained model. By the way, my PyTorch version is 1.0.1 and my Python version is 2.7, so the previous poor results were indeed caused by my Python version (3.5).

hh23333 commented 4 years ago

@weiyunfei Hi Yunfei. I also encountered this problem: the performance on Flickr30K is much lower than the numbers given in the paper (R@1 of 61 for text retrieval and 48 for image retrieval). Did you get similar results to mine? Have you solved this problem?

KunpengLi1994 commented 4 years ago

Hi Yunfei,

Sorry for the late reply due to deadlines on other projects. Actually, we re-organized our code when preparing the camera-ready version and at that time only re-trained models on MS-COCO, the dataset we mainly focus on.

For training on Flickr30K, we usually use a Batch Normalization layer (refer to this line) to make training more stable on this small dataset. I have updated the "model.py" code and provided the new pretrained models here. They should achieve R@1 of 71.5 for i2t (image to text) and 54.8 for t2i (text to image).
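
For reference, here is a minimal sketch of the kind of change this refers to (illustrative names, not the exact model.py): a BatchNorm1d layer applied on top of the image-feature projection, enabled for Flickr30K:

```python
# Minimal sketch, not the exact VSRN model.py: project precomputed region
# features and optionally normalize them with BatchNorm1d, which helps keep
# training stable on a smaller dataset such as Flickr30K.
import torch
import torch.nn as nn

class ImageEncoderSketch(nn.Module):
    def __init__(self, img_dim=2048, embed_size=2048, use_bn=True):
        super(ImageEncoderSketch, self).__init__()
        self.fc = nn.Linear(img_dim, embed_size)
        # Hypothetical flag: turn BN on for Flickr30K, off for MS-COCO.
        self.bn = nn.BatchNorm1d(embed_size) if use_bn else None

    def forward(self, images):
        # images: (batch, n_regions, img_dim) precomputed region features
        features = self.fc(images)
        if self.bn is not None:
            b, r, d = features.size()
            # BatchNorm1d expects (N, C); fold the region dimension into the batch.
            features = self.bn(features.view(b * r, d)).view(b, r, d)
        return features

# Usage sketch:
# encoder = ImageEncoderSketch(use_bn=True)    # Flickr30K setting
# out = encoder(torch.randn(32, 36, 2048))     # 36 regions per image
```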

weiyunfei commented 4 years ago

Hi, Kunpeng. Thanks for your reply. I will try this new code again.