JDAI-CV / image-captioning

Implementation of 'X-Linear Attention Networks for Image Captioning' [CVPR 2020]

Why do you use a different xe_type for different models? #19

Closed by deepSTEM 3 years ago

deepSTEM commented 4 years ago

When comparing the performance of the different models, I noticed that you use 'CrossEntropy' when the model is xlan and 'LabelSmoothing' when the model is xtransformer. Why do you treat them differently? Is 'LabelSmoothing' not suitable for xlan?

YehLi commented 3 years ago

The performance improvement from "LabelSmoothing" for xlan is not significant. One reason might be scheduled sampling, which mitigates overfitting as well as the discrepancy between training and inference. Thus, CrossEntropy is used for xlan for simplicity.
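For readers unfamiliar with the two loss types, here is a minimal sketch of how plain cross-entropy and a label-smoothed variant differ for a single token prediction. It is not the repository's code: the function names, the `smoothing=0.1` default, and the choice to spread the smoothing mass evenly over the non-target classes are all assumptions made for illustration, and other conventions exist.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def cross_entropy(logits, target):
    """Plain cross-entropy: all target mass sits on the true class."""
    return -log_softmax(logits)[target]


def label_smoothing_loss(logits, target, smoothing=0.1):
    """Cross-entropy against a smoothed target distribution.

    Assumption: the true class gets 1 - smoothing, and the remaining
    mass is split evenly over the other classes.
    """
    log_probs = log_softmax(logits)
    off_value = smoothing / (len(logits) - 1)
    loss = 0.0
    for i, lp in enumerate(log_probs):
        q = 1.0 - smoothing if i == target else off_value
        loss -= q * lp
    return loss


if __name__ == "__main__":
    logits = [2.0, 0.5, -1.0]  # hypothetical scores over a 3-word vocabulary
    print("cross-entropy:  ", cross_entropy(logits, 0))
    print("label smoothing:", label_smoothing_loss(logits, 0))
```

With `smoothing=0.0` the second function reduces to plain cross-entropy. With a positive value it also penalizes very low probabilities on the non-target words, which is the regularizing effect discussed above.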