agemagician / ProtTrans

ProtTrans provides state-of-the-art pretrained language models for proteins. ProtTrans was trained on thousands of GPUs from Summit and hundreds of Google TPUs using transformer models.
Academic Free License v3.0

ProtT5 model generate #147

Open BSharmi opened 3 months ago

BSharmi commented 3 months ago

Hi there!

Was ProtT5 trained to predict just the masked positions or the full sequence? When I use the `generate` function with masked sequences, I notice the model returns the full sequence. Is that the default behavior for a model trained on an MLM task?

Thank you very much!

mheinzinger commented 2 months ago

Hi :) yes, you are spot on: ProtT5 was trained to reconstruct the full sequence. Here is an example: https://github.com/agemagician/ProtTrans/issues/137#issuecomment-1817576165
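
For anyone landing here later, below is a minimal sketch of what such a reconstruction call might look like with the Hugging Face `transformers` API. The checkpoint name (`Rostlab/prot_t5_xl_uniref50`) and the input preprocessing follow the ProtTrans README; treating `<extra_id_0>` as the mask token is an assumption made for illustration, so refer to the linked example in #137 for the authoritative version.

```python
import re
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"

# Checkpoint name as in the ProtTrans README.
tokenizer = T5Tokenizer.from_pretrained("Rostlab/prot_t5_xl_uniref50", do_lower_case=False)
model = T5ForConditionalGeneration.from_pretrained("Rostlab/prot_t5_xl_uniref50").to(device)
model.eval()

# ProtT5 expects space-separated single-letter amino acids, with rare
# residues (U, Z, O, B) mapped to X.
seq = " ".join(list("MKTAYIAKQRQISFVK"))
seq = re.sub(r"[UZOB]", "X", seq)

# Replace one residue with a T5 sentinel token to "mask" it. Whether
# <extra_id_0> matches the exact masking scheme used during pre-training
# is an assumption here.
masked = seq.replace("Y", "<extra_id_0>", 1)

inputs = tokenizer(masked, return_tensors="pt").to(device)

# generate() decodes autoregressively. Because ProtT5 was trained to
# reconstruct the whole input, the decoded output is the full sequence,
# not just the masked position.
with torch.no_grad():
    out = model.generate(inputs["input_ids"], max_length=inputs["input_ids"].shape[1] + 1)

print(tokenizer.decode(out[0], skip_special_tokens=True))
```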

BSharmi commented 2 months ago

Thank you!!