Model names in "Models Availability" and "Original downstream Predictions" Tables

agemagician / ProtTrans

ProtTrans is providing state of the art pretrained language models for proteins. ProtTrans was trained on thousands of GPUs from Summit and hundreds of Google TPUs using Transformers Models.

Academic Free License v3.0

1.13k stars 153 forks source link

Model names in "Models Availability" and "Original downstream Predictions" Tables #83

Closed balvisio closed 2 years ago

balvisio commented 2 years ago

Hi all, In the "Models Availability" the following models are listed: ProtT5-XL-UniRef50, ProtT5-XL-BFD, ProtT5-XXL-UniRef50, ProtT5-XXL-BFD, etc... In the "Original downstream Predictions" tables, the models are named: ProtT5-XL-UniRef50, ProtT5-XL-BFD, ProtTXL, ProtTXL-BFD, etc...

I am wondering, do the ProtTXL, ProtTXL-BFD in the "Original downstream Predictions" table correspond to the ProtT5-XXL-UniRef50, ProtT5-XXL-BFD in the "Models Availability"? Or do they correspond to ProtT5-XL-UniRef50, ProtT5-XL-BFD respectively?

Thank you!

mheinzinger commented 2 years ago

Hi,

yeah, we should probably improve model naming in the future... I absolutely see where the confusion comes from, sorry for that. So in brief:

ProtTXL is a (Transformer-XL)[https://arxiv.org/abs/1901.02860] based model (one of the first Transformers we trained). So this one is completely unrelated to any of the ProtT5-based models (neither the ProtT5-XL, nor the ProtT5-XXL model).
ProtT5-XXL is currently missing in the tables for Original downstream Predictions and I'll add it soon'ish If sth remained unclear, feel free to ask!

Final note: in our hands, ProtT5-XXL did not perform as well as ProtT5-XL despite their size differences (3B vs 11B). Most likely this is due to the fact that the larger model saw less samples during pre-training (one epoch took much longer for the large model). So we usually recommend to only use the ProtT5-XL-UniRef50 model (and very specifically, the encoder-side thereof). If ran in half-precision, we usually got fast, reliable predictions for all our downstream tasks, usually, on-par or above ESM-1b.

balvisio commented 2 years ago

Thank you very much for the clarification and updating the stats @mheinzinger ! Regarding your comment, IIUC ProtT5-XL was trained for a greater number of epochs than ProtT5-XXL? Was the training time-limited? Also, I would be interested in the hardware (e.g. #gpus, model) used to train them. Is there any public information about that?

Thanks again!

mheinzinger commented 2 years ago

Yes, ProtT5-XL saw more samples during training as it could process more samples/second due its smaller size (still: ProtT5-XL has 3B parameters, so not really small; ProtT5-XXL had 11B). We did simply use the compute that we had available (which was limited to a certain extent) which is why we could not afford to train the XXL-version for exactly the same number of steps as the XL version. For Hardware-details I would point you towards our manuscript: https://ieeexplore.ieee.org/document/9477085 We trained on TPU-Pod v3 (for more information, see Table 2 in the ProtTrans-paper). In SOM you'll also find more information on the exact training/hardware-setup.

balvisio commented 2 years ago

Thank you for the detailed info. Very interesting work! Couldn't find the SOM in the link above, is it already published?

mheinzinger commented 2 years ago

Yeah, good point. For some reason IEEE does an incredible job in hiding this. You need to click on "Media" and then you get a link to SOM: https://ieeexplore.ieee.org/ielx7/34/4359286/9477085/supp1-3095381.pdf?arnumber=9477085

balvisio commented 2 years ago

😂 Thank you!