232525 / PureT

Implementation of 'End-to-End Transformer Based Model for Image Captioning' [AAAI 2022]

Scheduled Sampling #19

Open shreyassks opened 1 year ago

shreyassks commented 1 year ago

I noticed that scheduled sampling is used to reduce exposure bias. Although ss_prob is updated in the training loop, I can't find where ss_prob is actually used to sample the sequence from generated words during training in the forward function of the PureT model. Am I missing something?

232525 commented 1 year ago

Sorry for the confusion; we actually have not used scheduled sampling. Our code is based on the JDAI-CV/image-captioning repo. You can find more detail on how scheduled sampling works there: https://github.com/JDAI-CV/image-captioning/blob/4aa80d26cc7b439c8d307c2647b569b9e966db04/models/att_basic_model.py#L106.
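
For anyone else reading: the general idea behind scheduled sampling is that, at each decoding step during training, each example independently feeds the model either the ground-truth token or a token sampled from the model's own previous-step distribution, with probability ss_prob of using the sample. Below is a minimal PyTorch sketch of that idea; the function name and signature are illustrative, not the actual code in JDAI-CV/image-captioning (see the link above for the real implementation).

```python
import torch

def apply_scheduled_sampling(gt_tokens, prev_logprobs, ss_prob):
    """Illustrative sketch: mix ground-truth and model-sampled input tokens.

    gt_tokens:     (batch,) ground-truth token ids for the current step
    prev_logprobs: (batch, vocab) log-probabilities from the previous step
    ss_prob:       scalar in [0, 1]; probability of using a sampled token
    """
    if ss_prob <= 0.0:
        return gt_tokens
    # Per-example Bernoulli mask: True -> replace GT token with a model sample
    use_sample = torch.rand(gt_tokens.size(0), device=gt_tokens.device) < ss_prob
    if use_sample.any():
        # Sample one token per example from the model's previous distribution
        sampled = torch.multinomial(prev_logprobs.exp(), num_samples=1).squeeze(1)
        gt_tokens = gt_tokens.clone()
        gt_tokens[use_sample] = sampled[use_sample]
    return gt_tokens
```

This is typically called inside the per-step teacher-forcing loop, and only while training; at inference the model always consumes its own predictions.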

shreyassks commented 1 year ago

Okay, thanks. I trained this model following the instructions in the README: 20 epochs under XE loss, with these test-set metrics: {'Bleu_1': 0.5621, 'Bleu_2': 0.359, 'Bleu_3': 0.2531, 'Bleu_4': 0.178, 'METEOR': 0.1586, 'ROUGE_L': 0.4250, 'CIDEr': 0.49825, 'SPICE': 0.105}

then 5 epochs with SCST loss, with these test-set metrics: {'Bleu_1': 0.5921, 'Bleu_2': 0.4098, 'Bleu_3': 0.2831, 'Bleu_4': 0.1985, 'METEOR': 0.1786, 'ROUGE_L': 0.4450, 'CIDEr': 0.5283, 'SPICE': 0.1078}

I have not changed any parameters in the config file, but I'm unable to reproduce the best-model results reported in the paper. Please let me know what could have gone wrong.

LeXueTao commented 1 year ago

Hi, your XE CIDEr score is quite low. Using my code to train on Swin features, the XE CIDEr score is almost 1.12, though that is still lower than with the ResNext101 features, which reach almost 1.5.