Sorry it's been a while since I posted this question. May I ask exactly which pre-training objective is used here? Suppose the sequence is "AETC" and we have masked it to A[mask]T[mask]; which of the objectives below is correct?
Thanks in advance!
Also, the HF model page mentions that "The original T5-3B model was pretrained using a span denoising objective, while this model was pre-trained with a Bart-like MLM denoising objective." I am a bit confused by this, because as far as I know BART uses a span-wise masking objective. Am I understanding this correctly?
Our model was still trained using span generation, though with a span length of 1 (this is why we said it is similar to BERT's denoising). The example you give above is nearly correct; the only change is that the sentinel token increments for every token/span you corrupt:
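To make that concrete, here is a rough sketch of the corruption, illustrative only and not our actual pre-training code; the `<extra_id_N>` sentinels and the `</s>` end-of-sequence token follow the standard T5 convention:

```python
# Minimal sketch of single-residue span corruption with incrementing T5 sentinels.
def corrupt(sequence, masked_positions):
    """Return (encoder_input, decoder_target) for a T5-style denoising example.

    sequence:          amino-acid string, e.g. "AETC"
    masked_positions:  indices of the residues to corrupt, e.g. [1, 3]
    """
    encoder_tokens, target_tokens = [], []
    sentinel_id = 0
    for i, residue in enumerate(sequence):
        if i in masked_positions:
            sentinel = f"<extra_id_{sentinel_id}>"
            sentinel_id += 1  # a fresh sentinel for every corrupted span
            encoder_tokens.append(sentinel)
            target_tokens.extend([sentinel, residue])
        else:
            encoder_tokens.append(residue)
    encoder_tokens.append("</s>")  # T5's end-of-sequence token
    target_tokens.append("</s>")
    return " ".join(encoder_tokens), " ".join(target_tokens)


print(corrupt("AETC", [1, 3]))
# ('A <extra_id_0> T <extra_id_1> </s>', '<extra_id_0> E <extra_id_1> C </s>')
```

So with a span length of 1, every corrupted residue gets its own sentinel, and the decoder learns to emit that sentinel followed by the residue it replaced.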
Thanks for the amazing work and for releasing the code and the model!
I am trying to replicate ProtT5. The paper mentions that for pre-training a BERT-like denoising objective is applied instead. So for a sequence "AETC", supposing we mask "E" and "C", I assume the masked input would look like:
A <extra_id_0> T <extra_id_0> [EOS]
My question is: what does the decoder's target output look like? Is it simply "EC [EOS]", or is it something like
<extra_id_0> E <extra_id_0> C [EOS]
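Just to state the two options concretely (illustrative strings only, mirroring my example above rather than actual tokenizer output):

```python
masked_input = "A <extra_id_0> T <extra_id_0> [EOS]"

# option 1: the target contains only the masked residues
target_option_1 = "EC [EOS]"

# option 2: the target interleaves the sentinel tokens with the masked residues,
# as in the original T5 span-corruption objective
target_option_2 = "<extra_id_0> E <extra_id_0> C [EOS]"
```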
Thanks in advance for your help!