Sorry it's been a while since I posted this question. May I ask exactly which pre-training objective is used here? Suppose the sequence is "AETC" and we have masked it to A[mask]T[mask]; which of the objectives below is correct?
Thanks in advance!
Also, the HF model page mentions that "The original T5-3B model was pretrained using a span denoising objective, while this model was pre-trained with a Bart-like MLM denoising objective." I am a bit confused by this, because as far as I know BART uses a span-wise masking objective. Am I understanding this correctly?
Our model was still trained using span generation, though with a span length of 1 (this is why we said it is similar to BERT's denoising). The example you give above is nearly correct; the only change is that the sentinel token increments for every token/span you corrupt:
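To make that concrete, here is a rough sketch of the corruption, illustrative only and not our actual pre-training code; the `<extra_id_N>` sentinels and the `</s>` end-of-sequence token follow the standard T5 convention:

```python
# Minimal sketch of single-residue span corruption with incrementing T5 sentinels.
def corrupt(sequence, masked_positions):
    """Return (encoder_input, decoder_target) for a T5-style denoising example.

    sequence:          amino-acid string, e.g. "AETC"
    masked_positions:  indices of the residues to corrupt, e.g. [1, 3]
    """
    encoder_tokens, target_tokens = [], []
    sentinel_id = 0
    for i, residue in enumerate(sequence):
        if i in masked_positions:
            sentinel = f"<extra_id_{sentinel_id}>"
            sentinel_id += 1  # a fresh sentinel for every corrupted span
            encoder_tokens.append(sentinel)
            target_tokens.extend([sentinel, residue])
        else:
            encoder_tokens.append(residue)
    encoder_tokens.append("</s>")  # T5's end-of-sequence token
    target_tokens.append("</s>")
    return " ".join(encoder_tokens), " ".join(target_tokens)


print(corrupt("AETC", [1, 3]))
# ('A <extra_id_0> T <extra_id_1> </s>', '<extra_id_0> E <extra_id_1> C </s>')
```

So with a span length of 1, every corrupted residue gets its own sentinel, and the decoder learns to emit that sentinel followed by the residue it replaced.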
Thanks for the amazing work and for releasing the code and the model!
I am trying to replicate ProtT5. The paper mentions that for pre-training a BERT-like denoising objective is applied instead. So for a sequence "AETC", supposing we mask "E" and "C", I assume the masked input would look like:
A <extra_id_0> T <extra_id_0> [EOS]
My question is: what does the decoder's target output look like? Is it simply "EC [EOS]", or is it something like
<extra_id_0> E <extra_id_0> C [EOS]
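Just to state the two options concretely (illustrative strings only, mirroring my example above rather than actual tokenizer output):

```python
masked_input = "A <extra_id_0> T <extra_id_0> [EOS]"

# option 1: the target contains only the masked residues
target_option_1 = "EC [EOS]"

# option 2: the target interleaves the sentinel tokens with the masked residues,
# as in the original T5 span-corruption objective
target_option_2 = "<extra_id_0> E <extra_id_0> C [EOS]"
```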
Thanks in advance for your help!