google-research / text-to-text-transfer-transformer

Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"
https://arxiv.org/abs/1910.10683
Apache License 2.0

clarification on T5 pretraining #771

Open dorost1234 opened 3 years ago

dorost1234 commented 3 years ago

Hi,

I am trying to reproduce the pretraining of the mT5 model. When a sentence is corrupted as:

Thank you <X> to <Y> week => <X> for inviting me <Y> your party last <Z>
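
For concreteness, this is the input/target construction I have in mind, written as a toy sketch of my understanding (the sentinel strings and the `corrupt` helper are just illustrations, not the repo's actual preprocessing):

```python
# Toy sketch of span corruption as I understand it: chosen word spans are
# replaced by sentinel tokens in the input, and the target lists each sentinel
# followed by the words it replaced, closed by a final sentinel.
def corrupt(words, spans):
    """words: list of tokens; spans: list of (start, end) index pairs to mask."""
    sentinels = ["<X>", "<Y>", "<Z>"]  # stand-ins for <extra_id_0>, <extra_id_1>, ...
    inputs, targets, last = [], [], 0
    for sentinel, (start, end) in zip(sentinels, spans):
        inputs += words[last:start] + [sentinel]
        targets += [sentinel] + words[start:end]
        last = end
    inputs += words[last:]
    targets += [sentinels[len(spans)]]  # final sentinel ends the target
    return " ".join(inputs), " ".join(targets)

words = "Thank you for inviting me to your party last week".split()
inp, tgt = corrupt(words, [(2, 5), (6, 9)])
# inp == "Thank you <X> to <Y> week"
# tgt == "<X> for inviting me <Y> your party last <Z>"
```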

Do you then compute the loss on all target tokens? BERT computes the loss only on the masked tokens. Could you clarify how this is done in T5?

Should I compute the loss as in a standard seq2seq setup, i.e. feed the first (corrupted) sequence to the encoder and have the model predict the entire second (target) sequence?
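
In other words, is the objective just ordinary teacher-forced cross-entropy over every position of the target sequence? A minimal PyTorch sketch of what I mean (the shapes, ids, and vocabulary size below are placeholders, not the real model or tokenizer):

```python
import torch
import torch.nn.functional as F

# Placeholder shapes; the real decoder would produce `logits` by teacher forcing
# on the target "<X> for inviting me <Y> your party last <Z>".
batch_size, target_len, vocab_size = 1, 10, 32000  # toy vocabulary size
logits = torch.randn(batch_size, target_len, vocab_size)             # stand-in decoder outputs
target_ids = torch.randint(0, vocab_size, (batch_size, target_len))  # stand-in target token ids

# My question: is the loss simply cross-entropy averaged over *all* target
# positions (sentinel tokens included), rather than only over masked positions
# as in BERT?
loss = F.cross_entropy(logits.reshape(-1, vocab_size), target_ids.reshape(-1))
```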

thanks

dorost1234 commented 3 years ago

@adarob @craffel thanks