Closed jlxy closed 3 years ago
Hello. I noticed that you didn't apply scheduled sampling when training the xtransformer model. Could you tell me the reason? Thanks.

The transformer comes from Google's paper "Attention Is All You Need"; no scheduled sampling is used there either.

Unlike RNN-based models, the transformer is trained in parallel over all target positions, so scheduled sampling can't be applied to it directly. For more details, you can refer to the paper "Scheduled Sampling for Transformers".
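To illustrate why scheduled sampling assumes step-by-step decoding, here is a minimal sketch of the technique for an RNN-style decoder. `model_step` is a hypothetical stand-in for one decoder step; the point is that each step's input depends on a coin flip over the *previous* step's output, a dependency a transformer's single parallel forward pass over all positions does not have.

```python
import random

def scheduled_sampling_decode(ground_truth, model_step, teacher_forcing_ratio,
                              rng=None):
    """Sequentially decode, choosing at each step between the gold token
    (teacher forcing) and the model's own previous prediction.

    model_step: hypothetical function mapping the previous token to the
    next predicted token (stands in for one RNN decoder step).
    """
    rng = rng or random.Random(0)
    prev = "<bos>"
    outputs = []
    for gold in ground_truth:
        pred = model_step(prev)
        outputs.append(pred)
        # The per-step sampling decision: with probability
        # teacher_forcing_ratio feed the gold token, otherwise feed the
        # model's own prediction back in.
        prev = gold if rng.random() < teacher_forcing_ratio else pred
    return outputs
```

With `teacher_forcing_ratio=1.0` this reduces to ordinary teacher forcing; with `0.0` the decoder always consumes its own predictions. The "Scheduled Sampling for Transformers" paper works around the parallelism issue with a two-pass scheme: a first pass produces model predictions for all positions at once, and a second pass mixes them with the gold tokens.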