Open dorost1234 opened 3 years ago
Hi,
I am trying to reproduce the pretraining of the mT5 model, where sentences are corrupted like this:

`Thank you <X> to <Y> week` => `<X> for inviting me <Y> your party last <Z>`
Do you then compute the loss on all tokens? BERT computes the loss only on the masked tokens, so could you clarify how this is done in T5?
Should I compute the loss as usual, feeding the first (corrupted) sequence to the model and having it predict the whole second (target) sequence?
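For concreteness, here is a minimal sketch of what I have in mind, using the Hugging Face `transformers` API with `MT5ForConditionalGeneration` and the `<extra_id_n>` sentinel tokens (this is my own attempt, not the official pretraining code):

```python
# Sketch of the span-corruption setup I have in mind, using Hugging Face
# transformers (my attempt, not the official T5/mT5 pretraining code).
from transformers import MT5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("google/mt5-small")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")

# Encoder input: the corrupted sentence, with each masked span replaced
# by a sentinel token (<X>/<Y> above correspond to <extra_id_0>/<extra_id_1>).
inputs = tokenizer(
    "Thank you <extra_id_0> to <extra_id_1> week.", return_tensors="pt"
)

# Decoder target: only the sentinels plus the dropped-out spans,
# not the full original sentence.
labels = tokenizer(
    "<extra_id_0> for inviting me <extra_id_1> your party last <extra_id_2>",
    return_tensors="pt",
).input_ids

# Passing `labels` makes the model compute teacher-forced cross-entropy
# over every non-padding token of the target sequence.
loss = model(input_ids=inputs.input_ids, labels=labels).loss
print(loss.item())
```

If I understand correctly, this computes the loss over every token of the target sequence, but since the target contains only the sentinels and the dropped-out spans, the model never has to predict the uncorrupted parts of the sentence. Is that the intended behavior?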
thanks
@adarob @craffel thanks